Table of Contents
Programming with the SCO OpenServer system shell
    Shell command language
        Filename generation
        Special characters
        Input and output redirection
        Executing, stopping and restarting processes
    Command language exercises
        Answers
    Shell programming
        Shell programs
        Variables
        Shell programming constructs
        Functions
        Debugging programs
    Modifying your login environment
        Adding commands to your .profile
        Setting terminal options
        Using shell variables
    Shell programming exercises
        Answers
    Summary of shell command language
        The vocabulary of shell command language
        Shell programming constructs
Programming with awk
    Basic awk
        Program structure
        Usage
        Fields
        Printing
        Formatted printing
        Simple patterns
        Simple actions
        A handful of useful one-liners
        Error messages
    Patterns
        BEGIN and END
        Relational expressions
        Extended regular expressions
        Combinations of patterns
        Pattern ranges
    Actions
        Built-in variables
        Arithmetic
        Strings and string functions
        Field variables
        Number or string?
        Control flow statements
        Arrays
        User-defined functions
        Some lexical conventions
    Output
        The print statement
        Output separators
        The printf statement
        Output to files
        Output to pipes
    Input
        Files and pipes
        Input separators
        Multi-line records
        The getline function
        Command-line arguments
    Using awk with other commands and the shell
        The system function
        Cooperation with the shell
    Example applications
        Generating reports
        Word frequencies
        Accumulation
        Random choice
        History facility
        Form-letter generation
    awk summary
        Command line
        Patterns
        Control flow statements
        Input-output
        Functions
        String functions
        Arithmetic functions
        Operators (increasing precedence)
        Regular expressions (increasing precedence)
        Built-in variables
        Limits
        Initialization, comparison, and type coercion
Lexical analysis with lex
    Generating a lexical analyzer program
    Writing lex source
        The fundamentals of lex rules
        Advanced lex usage
    Using lex with yacc
    Miscellaneous
    Summary of source format
Parsing with yacc
    Basic specifications
        Actions
        Lexical analysis
    Parser operation
    Ambiguity and conflicts
    Precedence
    Error handling
    The yacc environment
    Hints for preparing specifications
        Input style
        Left recursion
        Lexical tie-ins
        Reserved words
    Advanced topics
        Simulating error and accept in actions
        Accessing values in enclosing rules
        Support for arbitrary value types
        yacc input syntax
    A simple example
    An advanced example
Managing file interactions with make
    Basic features
        Parallel make
    Description files and substitutions
        Comments
        Continuation lines
        Macro definitions
        General form
        Dependency information
        Executable commands
        Extensions of $*, $@, and $<
        Output translations
        Recursive makefiles
    Suffixes and transformation rules
        Implicit rules
        Archive libraries
        Source code control system file names
        The null suffix
        Included files
        SCCS makefiles
        Dynamic dependency parameters
    The make command
    Environment variables
    Suggestions and warnings
    Internal rules
Tracking versions with SCCS
    Basic usage
        Terminology
        Creating an SCCS file with admin
        Retrieving a file with get
        Recording changes with delta
        More on get
        The help command
    Delta numbering
    SCCS command conventions
        x.files and z.files
        Error messages
    SCCS commands
        get
        delta
        admin
        prs
        sact
        help
        rmdel
        cdc
        what
        sccsdiff
        comb
        val
    SCCS files
        Protection
        Formatting
        Auditing
Packaging your software applications
    Contents of a package
        Required components
        Optional package information files
        Optional installation scripts
    Quick steps to packaging
    Quick steps to network installation
        Network installation from the command line
        Network installation from the graphical interface
    The structural life cycle of a package
    The package creation tools
        pkgmk
        pkgtrans
        pkgproto
    The installation tools
    The package information files
        pkginfo
        prototype
        compver
        copyright
        depend
        space
        pkgmap
    The installation scripts
        Script processing
        Installation parameters
        Getting package information for a script
        Exit codes for scripts
        The request script
        The class action script
        The special system classes
        The procedure script
    Basic steps of packaging
        1. Assigning a package abbreviation
        2. Defining a package instance
        3. Placing objects into classes
        4. Making package objects relocatable
        5. Writing your installation scripts
        6. Defining package dependencies
        7. Writing a copyright message
        8. Creating the pkginfo file
        9. Creating the prototype file
        10. Distributing packages over multiple volumes
        11. Creating a package with pkgmk
        12. Creating a package with pkgtrans
    Set packaging
        Set installation
        Set removal
        Set information display
        The setsize file
        The setsizecvt command
    Quick reference to packaging procedures
    Case studies of package installation
        1. Selective installation
        2. Device driver installation
        3. Create an installation database
        4. Define package compatibilities and dependencies
        5a. Modify an existing file using the sed class
        5b. Modify an existing file using a class action script
        5c. Modify an existing file using the build class
        6. Modify crontab files during installation
        7a. Create a Set Installation Package
        7b. Split one set into two

Programming with the SCO OpenServer system shell
This topic shows you how the SCO OpenServer system shell can help you do routine tasks. For example, it
tells you how to use the shell to manage your files, to manipulate file contents, and to group commands
together in programs.
The topic is organized in two major sections:
``Shell command language'' describes the use of the shell as a command interpreter. It tells you how
to use shell commands and characters with special meanings to manage files, redirect standard input
and output, and execute and terminate processes.
``Shell programming'' details the use of the shell as a programming language. It tells you how to
create, execute, and debug programs made up of commands, variables, and programming constructs
such as loops and case statements. Finally, it tells you how to modify your login environment.
To get the most benefit from this tutorial you should log into your SCO OpenServer system and recreate the
examples as you read the text.
Exercises are provided for the Shell Command Language and Shell Programming. Answers are listed at the
end of each section.
NOTE: Your SCO OpenServer system might not have all the commands referenced in this section. If you
cannot access a command, check with your system administrator to find out whether it is available.

Shell command language


This section introduces commands and, more importantly, special characters that let you
find and manipulate a group of files by using pattern matching
run a command in the background or at a specified time
run a group of commands sequentially
redirect standard input and output (of files and other commands)
terminate running programs
``Characters with special meanings in the shell language'' summarizes the characters that have special
meanings in the shell.
Characters with special meanings in the shell language

Character  Function
* ? [ ]    The asterisk, question mark, and brackets allow you to specify filenames by pattern matching.
&          The ampersand places commands in background mode, leaving your terminal free for other tasks.
;          The semicolon separates multiple commands on one command line.
\          The backslash turns off the meaning of special characters such as * ? [ ] & ; > < and |.
'...'      Single quotes turn off the delimiting meaning of a space and the special meaning of all special characters.
"..."      Double quotes turn off the delimiting meaning of a space and the special meaning of all special characters except $ and `.
>          The greater than sign redirects the output of a command into a file (replacing the existing contents).
<          The less than sign redirects the input for a command to come from a file.
>>         Two greater than signs redirect the output of a command to be added to the end of an existing file.
|          The vertical bar, or pipe, makes the output of one command the input of another command.
`...`      A pair of grave accents around a command embedded on a command line makes the output of the embedded command an argument on the larger command line.
$          The dollar sign retrieves the value of positional parameters and user-defined variables. It is also the default shell prompt.

Filename generation
The shell recognizes three of the special characters listed in ``Characters with special meanings in the shell
language'', the asterisk (*), the question mark (?), and the set of brackets ([ ]), as symbols for patterns that
are parts of filenames. By substituting one or more of these characters for the name (or partial name) of an
existing file (or group of files), you can reduce the amount of typing you must do to specify filenames on a
command line.
The process by which the shell interprets these characters as the full filenames they represent is known as
filename expansion. Filename expansion is a useful mechanism when you want to specify many files on a
single command line. For example, you might want to print a group of files containing records for the month
of December, all of which begin with the letters dec. By using one of these special characters to represent the
parts of the filenames that vary, you can type one print command and specify all the files that begin with dec,
thus avoiding the need to type the full names of all the desired files on the command line.
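For instance, if the December files were named dec.wk1, dec.wk2, and dec.wk3 (hypothetical names chosen
for this sketch), a single command would display them all:
pr dec*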
This section explains how to use the asterisk, question mark, and brackets for filename expansion.
Matching all characters with the asterisk
The asterisk (*) matches any string of characters, including a null (empty) string. You can use the * to specify
a full or partial filename. The * alone matches all the file and directory names in the current directory, except
those starting with a . (dot). To see the effect of the *, try it as an argument to the echo(C) command. Type:
echo *

The echo command displays its arguments on your screen. Notice that the system response to echo * is a
listing of all the filenames in your current directory.
``Summary of filename generation characters'' summarizes the syntax and capabilities of these special characters.
CAUTION: The * is a character that matches everything. For example, if you type rm * you will erase all the
files in your current directory. Be very careful how you use the asterisk!
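A cautious habit is to preview what a pattern will match by passing it to echo first. The following sketch
assumes a directory holding some scratch files that end in .tmp:
$ echo *.tmp
draft.tmp notes.tmp
$ rm *.tmp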
For another example, say you have written several reports and have named them report, report1, report1a,
report1b.01, report25, and report316. By typing report1* you can refer to all files that are part of report1,
collectively. To find out how many reports you have written, you can use the ls command to list all files that
begin with the string report, as shown in the following example.
$ ls report*
report report1 report1a report1b.01 report25 report316
$

The * matches any characters after the string report, including no letters at all. Notice that * matches the files
in numerical and alphabetical order. A quick and easy way to display the contents of your report files in order
on your screen is by typing the following command:
pr report*

Now try another exercise. Suppose you have a current directory called appraisals that contains files called
Andrew_Adams, Paul_Lang, Jane_Peters, and Fran_Smith. Choose a character that all the filenames in your
directory have in common, such as a lowercase ``a.'' Then request a listing of those files by referring to that
character. For example, if you choose a lowercase ``a,'' type the following command line:
ls *a*

The system responds by printing the names of all the files in your current directory that contain a lowercase
``a.''
The ``*'' can represent characters in any part of a filename. For example, if you know the first and last letters
are the same in several files, you can request a list of them on that basis. If, for example, you had a directory
containing files named FATE, FE, FADED_LINE, F123E, Fig3.4E, FIRE_LANE, FINE_LINE,
FREE_ENTRY, and FAST_LANE, you could use this command to obtain a list of files starting with ``F''
and ending with ``E.'' For such a request, your command line might look like this:
ls F*E

The system response will be a list of filenames that begin with F, end with E, and are in the following order:
F123E
FADED_LINE
FAST_LANE
FATE
FE
FINE_LINE
FIRE_LANE
Fig3.4E

The order is determined by the collating sequences of the language being used, in this case, English: (1)
numbers, (2) uppercase letters, (3) lowercase letters.
The ``*'' is even more powerful; it can help you find all files named memo in any directory one level below the
current directory:
ls */memo

Matching one character with the question mark
The question mark (?) matches any single character of a filename except a leading period (.). Let us suppose
you have written several chapters in a book that has 12 chapters, and you want a list of those you have
finished through Chapter 9. If your directory contains the following files:
chapter1
chapter2
chapter5
chapter9
chapter11

use the ls command with the ``?'' to list all chapters that begin with the string ``chapter'' and end with any
single character, as shown below:
$ ls chapter?
chapter1 chapter2 chapter5 chapter9
$

The system responds by printing a list of all filenames that match.


Although ``?'' matches any one character, you can use it more than once in a filename. To list the rest of the
chapters in your book, type:
ls chapter??

Of course, if you want to list all the chapters in the current directory, use the ``*'' (asterisk):
ls chapter*

Matching one of a set with brackets


Use brackets ([ ]) when you want the shell to match any one of several possible characters that may appear in
one position in the filename. Suppose your directory contains the following files: cat, fat, mat, rat. If you
include [crf] as part of a filename pattern, the shell will look for filenames that have the letter ``c,'' ``r,'' or ``f''
in the specified position, as the following example shows.
$ ls [crf]at
cat fat rat
$

This command displays all filenames that begin with the letter ``c,'' ``r,'' or ``f,'' and end with the letters ``at.''
Characters that can be grouped within brackets in this way are collectively called a ``character class.''
Brackets can also be used to specify a range of characters, whether numbers or letters. Suppose you have a
directory containing the following files: chapter1, chapter2, chapter3, chapter4, chapter5 and chapter6. If
you specify
chapter[1-5]

the shell will match the files named chapter1 through chapter5. This is an easy way to handle only a few
chapters at a time.
Try the pr command with an argument in brackets:
$ pr chapter[2-4]

This command displays the contents of chapter2, chapter3, and chapter4, in that order, on your terminal.
A character class may also specify a range of letters. If you specify [A-Z], the shell will look only for
uppercase letters; if [a-z], only lowercase letters.
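For example, in the appraisals directory described earlier, a command line like the following would list only
the filenames that begin with an uppercase letter (a sketch; the output shown assumes just those four files):
$ ls [A-Z]*
Andrew_Adams Fran_Smith Jane_Peters Paul_Lang
$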
The functions of these special characters are summarized in ``Summary of filename generation characters''.
Try to use them on the files in your current directory.
Summary of filename generation characters
Character Function
*         Match any string of characters (including an empty, or null string) except a leading period.
?         Match any single character, except a leading period.
[xyz]     Match one of the characters specified within the brackets.
[a-z]     Match one of the range of characters specified.

Special characters
The shell language has other special characters that perform a variety of useful functions. Some of these
additional special characters are discussed in this section; others are described in ``Input and output
redirection''.
Running a command in background with the ampersand
Some shell commands take a long time to execute. The ampersand (``&'') is used to execute commands in
background mode, thus freeing your terminal for other tasks. The general format for running a command in
background mode is
command &

NOTE: You should not run interactive shell commands, such as read, in the background.
In the example below, the shell is performing a long search in background mode. Specifically, the grep(C)
command is searching for the string ``delinquent'' in the file accounts. Notice the ``&'' is the last character of
the command line:
$ grep delinquent accounts &
21940
$

When you run a command in the background, the SCO OpenServer system outputs a process number; 21940
is the process number associated with the grep command in the example. You can use this number to
terminate the execution of a background command. (Stopping the execution of processes is discussed in
``Executing, stopping and restarting processes''.) The prompt on the last line means that the terminal is free
and waiting for your commands; grep has started running in background mode.
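If you change your mind about the search, you can stop it by giving that process number to the kill
command, previewing what is covered in ``Executing, stopping and restarting processes'' (a sketch; 21940 is
the number reported above):
$ kill 21940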
Running a command in background mode affects only the availability of your terminal; it does not affect the
output of the command. Whether or not a command is run in background, it prints its output on your terminal
screen, unless you redirect it to a file. (See ``Redirecting output with the > sign'' for details.)
If you want a command to continue running in background after you log out, you can execute it with the
nohup(C) command. (This is discussed in ``Using the nohup command''.)
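For example, to keep the long search shown earlier running after you log out, you could enter it this way (a
sketch; by default nohup saves the command's output in a file called nohup.out unless you redirect it):
$ nohup grep delinquent accounts &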
Executing commands sequentially with the semicolon
You can type two or more commands on one line as long as each is separated by a semicolon (``;'') or an
ampersand (``&''), as follows:
command1; command2; command3

The SCO OpenServer system executes the commands in the order that they appear in the line and prints all
output on the screen. This process is called sequential execution.
Try this exercise to see how the ``;'' works. First, type:
cd; pwd; ls

The shell executes these commands sequentially:


1. cd changes your location to your login directory
2. pwd prints the full pathname of your current directory
3. ls lists the files in your current directory
If you want to save the system responses to these commands (or prevent them from appearing on your
screen), you can redirect them to a file. See ``Input and output redirection'' for instructions.
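As a preview of that section, redirection combines naturally with sequential execution. The following sketch
saves the responses of pwd and ls in a file (session.log is an arbitrary name chosen here):
cd; pwd > session.log; ls >> session.log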
Turning off special meanings with the backslash
The shell interprets the backslash (\) as an escape character that allows you to turn off any special meaning of
the character immediately after it. To see how this works, try the following exercise. Create a two-line file
called trial that contains the following text:
The all * game

was held in Summit.

Use the grep command to search for the asterisk in the file, as shown in the following example:
$ grep \* trial
The all * game
$

The grep command finds the * in the text and displays the line in which it appears. Without the \ (backslash),
the * would be expanded by the shell to match all filenames in the current directory.

Turning off special meanings with quotation marks
Another way to escape the meaning of a special character is to use quotation marks. Single quotes (' . . . ')
turn off the special meaning of any character except single quotes. Double quotes (" . . . ") turn off the
special meaning of all characters except double quotes, the ``$'' and the ` (grave accent), which retain their
special meanings within double quotes. An advantage of using quotes is that numerous special characters can
be enclosed in the quotes; this can be more concise than using the backslash.
For example, if your file named trial also contained the line:
He really wondered why? Why???

you could use the grep command to match the line with the three question marks as follows:
$ grep '???' trial
He really wondered why? Why???
$

If you had instead entered the command


grep ??? trial

the three question marks without single quotes would have been used as special characters that match all
filenames in the current directory, if any, that consist of three characters. As another example, if the file
trial contained the line
trial

then
grep ????? trial

would find the string trial in the file trial.


Turning off the meaning of a space with quotes
Quotes, like backslashes, are commonly used as escape characters for turning off the special meaning of the
blank space. The shell interprets a space on a command line as a delimiter between the arguments of a
command. Both single and double quotes allow you to escape that meaning.
For example, to locate two or more words that appear together in text, make the words a single argument (to
the grep command) by enclosing them in quotes. To find the two words ``The all'' in your file trial, enter the
following command line:
$ grep 'The all' trial
The all * game
$

grep finds the string The all and prints the line that contains it. What would happen if you did not put quotes
around that string?
The ability to escape the special meaning of a space is especially helpful when you're using the banner(C)
command. This command prints a message across a terminal screen in large, poster-size letters.
To execute banner, specify a message consisting of one or more arguments (in this case, usually words),
separated on the command line by spaces. banner will use these spaces to delimit the arguments and print
each argument on a separate line.
To print more than one argument on the same line, enclose the words, together, in double quotes. For
example, to print a birthday greeting, type:
banner happy birthday to you

The command prints your message as a four-line banner. Now print the same message as a three-line banner.
Type:
banner happy birthday "to you"

Notice that the words to and you now appear on the same line. The space between them has lost its meaning
as a delimiter.

Input and output redirection


In the SCO OpenServer system, some commands expect to receive their input only from the keyboard
(standard input) and most commands display their output at the terminal (standard output). However, the SCO
OpenServer system lets you redirect both input and output to other files and programs. With such redirection,
you can tell the shell to
take its input from a file rather than from the keyboard
send its output to a file rather than to the terminal
use a program as the source of data for another program
To redirect input and output, you use a set of operators: the less than sign (``<''), the greater than sign (``>''),
two greater than signs (``>>''), and the pipe (``|'').
Redirecting input with the < sign
To redirect input, specify a filename after a less than sign (``<'') on a command line:
command < file

When is this mechanism useful? A typical example is when you want to send someone, via the mail
command, a message or file you've already created. By default, the mail command expects input from
standard input (that is, the keyboard). But suppose you have already entered the information to be sent (to a
user with the login name jim) in a file called report. Rather than retype that information, you can simply
redirect input to mail as follows:
mail jim < report

Redirecting output with the > sign


To redirect output (from standard output) to a file, specify a filename after the greater than sign (``>'') on a
command line:
command > file
CAUTION: If you redirect output to a file that already exists, the output of your command will overwrite the
contents of the existing file.
Before redirecting the output of a command to a particular file, make sure a file by that name does not already
exist, unless you do not mind overwriting it. The shell does not allow you to have two files of the same name
in one directory. Therefore if you redirect the output of a command to a file with the same name as an existing
file, the shell will overwrite the contents of the existing file with the output of your command. Keep this in
mind when redirecting output; the shell does not warn you when it is about to overwrite a file.
To make sure that no file exists with the name you plan to use, run the ls command, specifying your proposed
filename as an argument. If a file with that name exists, ls will list it; if not, you will receive a message that
the file was not found in the current directory. For example, checking for the existence of the files temp and
junk would give you the following output:
$ ls temp
temp
$ ls junk
junk: no such file or directory
$

This means you can name your new output file junk, but you cannot name it temp unless you no longer want
the contents of the existing temp file.
Appending output to an existing file with the >> symbol
To keep from destroying an existing file, you can also use the double greater than symbol (``>>''), as follows:
command >> file
This appends the output of a command to the end of the file file. If file does not exist, it is created when you
use the ``>>'' symbol this way.
The following example shows how to append the output of the cat command (described in ``Shell
programming'') to an existing file. The cat command prints the contents of the files to the standard output. If it
has no arguments, it prints its standard input to the standard output. First, the cat command is executed on
both files without output redirection to show their respective contents. Then the contents of trial2 are added
after the last line of trial1 by executing the cat command on trial2 and redirecting the output to trial1.
$ cat trial1
This is the first line of trial1.
Hello.
This is the last line of trial1.
$
$ cat trial2
This is the beginning of trial2.
Hello.
This is the end of trial2.
$
$ cat trial2 >> trial1
$ cat trial1
This is the first line of trial1.
Hello.
This is the last line of trial1.
This is the beginning of trial2.
Hello.
This is the end of trial2.
$

Useful applications of output redirection


Redirecting output is useful when you do not want it to appear on your screen immediately or when you want
to save it. Output redirection is also especially useful when you run commands that perform clerical chores on
text files. Two such commands are spell and sort.
The spell command

The spell program compares every word in a file against its internal vocabulary list and prints a list of all
potential misspellings on the screen. If spell does not have a listing for a word (such as a person's name), it
will report that as a misspelling, too.
Running spell on a lengthy text file can take a long time and may produce a list of misspellings that is too
long to fit on your screen. spell prints all its output at once; if it does not fit on the screen, the command
scrolls it continuously off the top until it has all been displayed. A long list of misspellings will roll off your
screen quickly and may be difficult to read.
You can avoid this problem by redirecting the output of spell to a file. In the following example, spell
searches a file named memo and places a list of misspelled words in a file named misspell:
$ spell memo > misspell

See the spell(C) manual page for all available options and an explanation of the capabilities of each.
The sort command

The sort command arranges the lines of a specified file in alphabetical or numerical order. Because users
generally want to keep a file that has been alphabetized, output redirection greatly enhances the value of the
sort command.
Be careful to choose a new name for the file that will receive the output of the sort command (the
alphabetized list). When sort is executed, the shell first empties the file that will accept the redirected output.
Then it performs the sort and places the output in the blank file. If you type
sort list > list

the shell will empty list and then sort nothing into list.
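A safe way to replace a file with its sorted version is to sort into a new, temporary name and then rename it
(a sketch; list.sorted is an arbitrary name chosen here):
$ sort list > list.sorted
$ mv list.sorted list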
Combining background mode and output redirection
Running a command in background does not affect the output of the command; unless it is redirected, output
is always printed on the terminal screen. If you are using your terminal to perform other tasks while a
command runs in background, you will be interrupted when the command displays its output on your screen.
However, if you redirect that output to a file, you can work undisturbed, except when an error occurs.

For example, in ``Special characters'', you learned how to execute the grep command in background with
``&''. Now suppose you want to find occurrences of the word ``test'' in a file named schedule. Run the grep
command in background and redirect its output to a file called testfile:
$ grep test schedule > testfile &

You can then use your terminal for other work and examine testfile when you have finished.
Redirecting output to a command with the pipe
The ``|'' character is called a pipe. Pipes are powerful tools that allow you to take the output of one command
and use it as input for another command without creating temporary files. A multiple command line created in
this way is called a pipeline.
The general format for a pipeline is:
command1 | command2 | command3 . . .

The output of command1 is used as the input of command2. The output of command2 is then used as the
input for command3.
To understand the efficiency and power of a pipeline, consider the contrast between two methods that achieve
the same results.
To use the input/output redirection method, run one command and redirect its output to a temporary
file. Then run a second command that takes the contents of the temporary file as its input. Finally,
remove the temporary file after the second command has finished running.
To use the pipeline method, run one command and pipe its output directly into a second command.
For example, suppose you want to mail a happy birthday message in a banner to the owner of the login david.
Doing this without a pipeline is a three-step procedure. You must:
1. Enter the banner command and redirect its output to a temporary file:
banner happy birthday > message.tmp

2. Enter the mail command using message.tmp as its input:


mail david < message.tmp

3. Remove the temporary file:


rm message.tmp

However, by using a pipeline you can do this in one step:


banner happy birthday | mail david

A pipeline using the cut and date commands


The cut and date commands provide a good example of how pipelines can increase the versatility of
individual commands. The cut command allows you to extract part of each line in a file. It looks for
characters in a specified part of the line and prints them. To specify a position in a line, use the -c option and
identify the part of the file you want by the numbers of the spaces it occupies on the line, counting from the
left-hand margin.
For example, suppose you want to display only the dates from a file called birthdays. The file contains the
following list:
Anne     12/26
Klaus    7/4
Mary     10/18
Peter    11/9
Nandy    4/23
Sam      8/12

The birthdays appear between the ninth and thirteenth spaces on each line. To display them, type:
cut -c9-13 birthdays

The output is shown below:


12/26
7/4
10/18
11/9
4/23
8/12

The cut command is usually executed on a file; however, piping makes it possible to run this command on the
output of other commands, too. This is useful if you want only part of the information generated by another
command. For example, you may want to have the time printed. The date command prints the day of the
week, date, and time, as follows:
$ date
Tue Dec 24 13:12:32 EST 1991
$

Notice that the time is given between spaces 12 and 19 of the line. You can display the time (without the date)
by piping the output of date into cut, specifying spaces 12-19 with the -c option. Your command line and its
output will look like this:
$ date | cut -c12-19
13:14:56
$

See the date(C) manual page for all available options and an explanation of the capabilities of each.
Substituting output for an argument
The output of most commands may be captured and used as arguments on a command line. Do this by
enclosing the command in grave accents (` . . . `) and placing it on the command line in the position where the
output should be treated as arguments. This is known as command substitution.
For example, you can substitute the output of the date and cut pipeline command used previously for the
argument in a banner printout by typing the following command line:

$ banner `date | cut -c12-19`

Notice the results: the system prints a banner with the current time.
``Shell programming'' shows you how you can also use the output of a command line as the value of a
variable.

Executing, stopping and restarting processes


This section discusses how to:
run commands at a later time using the at and batch commands
run commands at regular intervals using the crontab command
obtain the status of running processes
terminate active processes
restart a stopped process
move processes between running in the foreground and the background
keep background processes running after you have logged out
Running commands at a later time with the batch and at commands
The batch and at commands allow you to specify a command or sequence of commands to be run at a later
time. With the batch command, the system determines when the commands run; with the at command, you
determine when the commands run. Both commands expect input from standard input (the terminal); the list
of commands entered as input from the terminal must be ended by pressing <CTRL-d> (control-d).
The batch command is useful if you are running a process or shell program that uses a large amount of system
time. The batch command submits a batch job (containing the commands to be executed) to the system. The
job is put in a queue, and runs when the system load falls to an acceptable level. This frees the system to
respond rapidly to other input and is a courtesy to other users.
The general format for batch is:
batch
first command
.
.
.
last command
<CTRL-d>

If there is only one command to be run with batch, you can enter it as follows:
batch command_line

<CTRL-d>

The next example uses batch to execute the grep command at a convenient time. Here grep searches all files
in the current directory for the string dollar and redirects the output to the file dol.file.
$ batch
grep dollar * > dol.file
<CTRL-d>
job 155223141.b at Tue Dec 3 11:14:54 1991
$

After you submit a job with batch, the system responds with a job number, date, and time. This job number is
not the same as the process number that the system generates when you run a command in the background.
The at command allows you to specify an exact time to execute the commands. The general format for the at
command is:
at time
first command
.
.
.
last command
<CTRL-d>

The time argument consists of the time of day and, if the date is not today, the date.
The following example shows how to use the at command to mail a happy birthday banner to the user with
the login name emily on her birthday:
$ at 8:15am Feb 27
banner happy birthday | mail emily
<CTRL-d>
job 453400603.a at Sat Feb 23 08:15:00 1991
$

Notice that the at command, like the batch command, responds with the job number, date, and time.
If you decide you do not want to execute the commands currently waiting in a batch or at job queue, you can
erase those jobs by using the -r option of the at command with the job number. (You can save the job number
by redirecting the output of at when you submit the job.) The general format is
at -r job_number

Try erasing the previous at job for the happy birthday banner. Type:
at -r 453400603.a

If you have forgotten the job number, the at -l command will give you a list of the current jobs in the batch
or at queue, as the following screen shows:
$ at -l
user = mylogin 168302040.a at Mon Nov 25 13:00:00 1991
user = mylogin 453400603.a at Sun Dec 08 08:15:00 1991
$

Notice that the system displays the job number and the time the job will run.
Using the at command, mail yourself the file memo at noon, to tell you it is lunch time. (You must redirect the
file into mail unless you use a ``here document,'' described in ``Shell programming''.) Then try the at
command with the -l option.

$ at 12:00pm
mail mylogin < memo
<CTRL-d>
job 263131754.a at Jun 25 12:00:00 1991
$ at -l
user = mylogin 263131754.a at Jun 25 12:00:00 1991
$

Executing processes at regular intervals


The crontab(C) command lets you execute routine jobs (called ``cron'' jobs) on a regular basis. For example,
it could be used periodically to back up your files, or to clean up tmp and log files. To submit a cron job,
details of the job must be added to a cronfile. This is a normal file, but its contents are formatted in a special
way:
Minutes   Hours   Day_of_Month   Month   Day_of_week   Command
Fields are separated by spaces or tabs. The file cannot have blank lines. The cronfile parameters are as
follows:
Field            Allowable values
Minutes          0-59
Hours            0-23
Day of month     1-31
Month of year    1-12
Day of week      0-6 (0=Sunday)
Command          any non-interactive command

A field can be a number, a range of numbers (for example 10-20), a list of numbers separated by commas, or
an asterisk (all values). For example, an asterisk in the ``Hours'' field means ``every hour''; an asterisk in the
``Month'' field means ``every month''.
Let us assume that you want to write a cronfile to issue reminders and perform regular tasks:
You need to attend a meeting at 10 A.M. every Monday, and you want to remind yourself of this at
9:45 A.M. on Monday mornings.
You want to find and remove any old files beginning with ``#'' in your home directory at 4:30 P.M. on
the first day of every month.
You want to echo the date and time to your terminal at 9:00 A.M. Monday to Friday.
Create a file that looks like the following:
45 9 * * 1 echo "Weekly status meeting" > /dev/tty06
30 16 1 * * find $HOME -name '#*' -atime +3 -exec rm -f {} \;
0 9 * * 1-5 echo `date` > /dev/tty06

When you have created your file (called cronfile), submit it by typing the following:
$ crontab cronfile

If you want to edit an existing cron job, the cronfile should be edited and resubmitted.
To display the current cron job, type crontab -l. Redirect the output to a file and edit it, then resubmit the
new file. This replaces the old cron job.
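For example, the following command lines save the current cron job in a file, let you edit it (the vi editor is
used here only for illustration), and resubmit the edited file:
$ crontab -l > cronfile
$ vi cronfile
$ crontab cronfile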
To remove the current cron job, type crontab -r. If you submit a second cronfile before the first one is
executed, the first one will be overwritten by the second.
As with at and batch, access to crontab can be turned on and off by the root user by adding user names to
/usr/lib/cron/cron.allow and /usr/lib/cron/cron.deny respectively.
Obtaining the status of running processes
The ps command gives you the status of all the processes currently being run. For example, you can use the
ps command to show the status of all the processes you run in the background mode using ``&'' (described in
``Special characters'').
The next section, ``Terminating active processes'', discusses how you can use the PID (process identification)
number to stop a command from executing. A PID is a unique number that the SCO OpenServer system
assigns to each active process.
In the following example, grep is run in the background, and then the ps command is issued. The system
responds with the process identification (PID) and the terminal identification (TTY) number. It also gives the
cumulative execution time for each process (TIME), and the name of the command that is being executed
(COMMAND).
$ grep word * > temp &
28223
$ ps
  PID TTY    TIME COMMAND
28124 tty10  0:00 sh
28223 tty10  0:04 grep
28224 tty10  0:04 ps
$

Notice that the system reports a PID number for the grep command, as well as for the other processes that are
running: the ps command itself, and the sh (shell) command that runs throughout the time you are logged in.
(The shell program sh interprets shell commands, that is, passes them on to the computer.)
See the ps(C) manual page for all available options and an explanation of the capabilities of each.
You can suspend and restart programs if your login has been configured for job control. See your system
administrator to have your login set up to include job control. The jobs command also gives you a listing of
current background processes, running or stopped. However, in addition to the PID, the jobs command gives
you a number called the ``job identifier'' (JID) and the original command typed to initiate the job (job_name).
You need to know the JID of a process whenever you want to restart a stopped job or resume a background
process in foreground. The JID is printed on the screen when you enter a command to start or stop a process.
To obtain information about your stopped or background jobs, type:
jobs

The system will respond by displaying information such as the following:


[JID] Stopped(signal) job_name
or
[JID] + Running job_name

Terminating active processes


The kill command terminates active shell processes in background mode and the stop command temporarily
suspends the process if job control is active. The general format for these commands is:
kill PID

or
stop %JID

Note that you cannot terminate background processes by pressing the <BREAK> or <DELETE> key. The
following example shows how you can terminate the grep command that you started executing in background
mode in the previous example.
$ kill 28223
[JID] + Terminated job_name
$

Notice that the system responds with a message and a ``$'' prompt, showing that the process has been killed. If
the system cannot find the PID number you specify, it responds with an error message:
kill: 28223: No such process

To suspend a foreground process in the job shell (only when job control is active), type:
<CTRL-z>

A message appears on the screen resembling the following:


[JID] Stopped(user) job_name
See the kill(C) manual page for all available options and an explanation of the capabilities of each.
Restarting a stopped process
When job control is active you can restart a suspended process. To restart a process with the stop command,
you must first determine the JID by using the jobs command. You can then use the JID with the following
commands:
fg %JID Resume a stopped or background job in foreground.
bg %JID Restart a stopped job in background.
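For example, the following session suspends a foreground grep command and then restarts it in background
(the JID of 1 shown here is only an illustration):
$ grep word * > temp
<CTRL-z>
[1] Stopped(user) grep word * > temp
$ bg %1
[1] + Running grep word * > temp
$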

Using the nohup command
All processes, except the at and batch requests, are killed when you log out. If you want a background
process to continue running after you log out, you must use the nohup command to submit that background
command.
To execute the nohup command, use the following format:
nohup command &

Notice that you place the nohup command before the command you intend to run as a background process.
For example, suppose you want the grep command to search all the files in your current directory for the
string word and redirect the output to a file called word.list, and you want to log out immediately afterward.
Type the command line as follows:
nohup grep word * > word.list &

You can terminate the nohup command by using the kill command. Now that you have mastered these basic
shell commands and notations, use them in your shell programs! The exercises in the following section will
help you practice using the shell command language. Answers to the exercises appear at the end of the
section.

Command language exercises


1-1.
What happens if you use an * (asterisk) at the beginning of a filename? Try to list some of the files in
a directory using the * with the last letter of one of your filenames. What happens?
1-2.
Try to enter the following two commands:
cat [0-9]*

echo *

1-3.
Is it acceptable to use a ``?'' at the beginning or in the middle of a pattern? Try it.
1-4.
Do you have any files that begin with a number? Can you list them without listing the other files in
your directory? Can you list only those files that begin with a lowercase letter between a and m?
(Hint: Use a range of numbers or letters in [ ]).
1-5.
Is it acceptable to place a command in background mode on a line that is executing several other
commands sequentially? Try it. What happens? (Hint: Use ``;'' and ``&.'') Can the command in
background mode be placed in any position on the command line? Try placing it in various positions.
Experiment with each new character that you learn to see the full power of the character.
1-6.
Redirect the output of pwd and ls into a file by using the following command line:
cd; pwd > trial; ls >> trial

Remember, if you want to redirect both commands to the same file, you have to use the ``>>''
(append) sign for the second redirection. If you do not, you will wipe out the information from the
pwd command.
1-7.
Instead of cutting the time out of the date response, try redirecting only the date, without the time,
into banner. What is the only part you need to change in the following command line?
banner `date | cut -c12-19`

Answers
1-1.
The * at the beginning of a filename refers to all files that end in that filename, including that
filename.
$ ls *t
cat
123t
new.t
t
$
1-2.
The command cat [0-9]* produces the following output:
1memo
100data
9
05name

The command echo * produces a list of all the files in the current directory.
1-3.
You can place ``?'' in any position in a filename.
1-4.
The command ls [0-9]* lists only those files that start with a number.
The command ls [a-m]* lists only those files that start with the letters ``a'' through ``m.''
1-5.
If you placed the sequential command line in the background mode, the immediate system response
was the PID number for the job.
No, the & (ampersand) must be placed at the end of the command line.
1-6.
The command line would be:
cd; pwd > junk; ls >> junk

1-7.
Change the -c option of the command line to read:
banner `date | cut -c1-10`

Shell programming
You can use the shell to create programs, that is, new commands. Such programs are also called shell procedures.
This section tells you how to create and execute shell programs using commands, variables, positional
parameters, return codes, and basic programming control structures.
The examples of shell programs in this section are shown two ways. First, the cat command is used in a
screen to display the contents of a file containing a shell program:
$ cat testfile
first_command
.
.
.
last_command
$

Second, the results of executing the shell program appear after a command line:
$ testfile
program_output
$

You should be familiar with an editor before you try to create shell programs.

Shell programs
We will begin by creating a simple shell program that will do the following tasks, in order:
print the current directory
list the contents of that directory
display this message on your terminal:
This is the end of the shell program.

Create a file called dl (short for directory list) using your editor of choice, and enter the following:
pwd
ls
echo This is the end of the shell program.

Now write and quit the file. You have just created a shell program! You can cat the file to display its contents,
as the following screen shows:
$ cat dl
pwd
ls
echo This is the end of the shell program.
$

Executing a shell program


One way to execute a shell program is to use the sh command. Type:
sh dl

The dl command is executed by sh, and the pathname of the current directory is printed first, then the list of
files in the current directory, and finally, the comment This is the end of the shell program.
The sh command provides a good way to test your shell program to make sure it works.
If dl is a useful command, you can use the chmod command to make it an executable file; then you can type
dl by itself to execute the command it contains. The following example shows how to use the chmod
command to make a file executable and then run the ls -l command to verify the changes you have made in
the permissions.
$ chmod u+x dl
$ ls -l
total 2
-rw-------  1 login login  3661 Nov  2 10:28 mbox
-rwx------  1 login login    48 Nov 15 10:50 dl
$

Notice that chmod turns on permission to execute (+x) for the user (u). Now dl is an executable program. Try
to execute it. Type:
dl

You get the same results as before, when you entered sh dl to execute it.
Creating a bin directory for executable files
To make your shell programs accessible from all your directories, you can make a bin directory from your
login directory and move the shell files to your bin.
You must also set your shell variable PATH to include your bin directory:
PATH=$PATH:$HOME/bin

See ``Variables'' and ``Using shell variables'' for more information about PATH.
The following example reminds you which commands are necessary. In this example, dl is in the login
directory. Type these command lines:
cd
mkdir bin
mv dl bin/dl

Move to the bin directory and type the ls -l command. Does dl still have execute permission?
Now move to a directory other than the login directory, and type the following command:
dl

What happened?
It is possible to give the bin directory another name; if you do so, you must change your shell variable PATH
again.
Warnings about naming shell programs
You can give your shell program any appropriate filename; however, you should not give your program the
same name as a system command. Depending on your path, the system may execute your command instead of
the system command. For example, if you had named your dl program mv, each time you tried to move a file,
the system might have executed your directory list program instead of mv.
Another problem can occur if you name the dl file ls, and then try to execute the file. You would create an
infinite loop, since your program executes the ls command. After some time, the system would give you the
following error message:
Too many processes, cannot fork

What happened? You typed in your new command, ls. The shell read and executed the pwd command. Then
it read the ls command in your program and tried to execute your ls command. This formed an infinite loop.
For this reason, the SCO OpenServer system limits the number of times an infinite loop can execute. One way
to prevent such looping is to give the pathname for the system ls command, /usr/bin/ls, when you write your
own shell program.
The following ls shell program would work:
$ cat ls
pwd
/usr/bin/ls
echo This is the end of the shell program

If you name your command ls, then you can only execute the system ls command by using its full pathname,
/usr/bin/ls.

Variables
Variables are the basic data objects that, in addition to files, shell programs manipulate. Here we discuss three
types of variables and how to use them:
positional parameters
special parameters
named variables

Positional parameters
A positional parameter is a variable within a shell program; its value is set from an argument specified on the
command line that invokes the program. Positional parameters are numbered and are referred to with a
preceding ``$'': $1, $2, $3, and so on.
A shell program may reference up to nine positional parameters. If a shell program is invoked with a
command line that appears like this:
shell.prog pp1 pp2 pp3 pp4 pp5 pp6 pp7 pp8 pp9

then positional parameter $1 within the program is assigned the value pp1, positional parameter $2 within the
program is assigned the value pp2, and so on, at the time the shell program is invoked.
To practice positional parameter substitution, create a file called pp (short for positional parameters).
(Remember, the directory in which these example files reside must be in $PATH.) Then enter the echo
commands shown in the following screen. Enter the command lines so that running the cat command on your
completed file will produce the following output:
$ cat pp
echo The first positional parameter is: $1
echo The second positional parameter is: $2
echo The third positional parameter is: $3
echo The fourth positional parameter is: $4
$

If you execute this shell program with the arguments one, two, three, and four, you will obtain the following
results (but first you must make the shell program pp executable using the chmod command):
$ chmod u+x pp
$
$ pp one two three four
The first positional parameter is: one
The second positional parameter is: two
The third positional parameter is: three
The fourth positional parameter is: four
$

Another example of a shell program is bbday, which mails a greeting to the login entered in the command
line. The bbday program contains one line:
banner happy birthday | mail $1

Try sending yourself a birthday greeting. If your login name is sue, your command line will be:
bbday sue

The who command lists all users currently logged in on the system. How can you make a simple shell
program called whoson, that will tell you if the owner of a particular login is currently working on the
system?
Type the following command line into a file called whoson:
who | grep $1

The who command lists all current system users, and grep searches that output for a line with the string
contained as a value in the positional parameter $1.
Now try using your login as the argument for the new program whoson. For example, suppose your login is
sue. When you issue the whoson command, the shell program substitutes sue for the parameter $1 in your
program and executes as if it were:
who | grep sue

The output appears on your screen as follows:


$ whoson sue
sue        tty26        Jan 24 13:35
$

If the owner of the specified login is not currently working on the system, grep fails, and whoson prints no
output.
The shell allows a command line to contain at least 128 arguments; however, a shell program is restricted to
referencing only nine positional parameters, $1 through $9, at a given time. You can work around this
restriction by using the shift command. See sh(C) for details. The special parameter ``$*'' (described in the
next section) can also be used to access the values of all command line arguments.
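The following sketch suggests how the shift workaround operates; it uses the while loop and test command
described later in this topic, and the special parameter ``$#'' described in the next section:
# print every argument, however many there are;
# each shift discards $1 and renumbers the rest
while test $# -ne 0
do
echo $1
shift
done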
Special parameters

``$#''
This parameter, when referenced in a shell program, contains the number of arguments with which the
shell program was invoked. Its value can be used anywhere in the shell program.
Enter the command line, shown in the following screen, in the executable shell program called get.num. Then
run the cat command on the file:
$ cat get.num
echo The number of arguments is: $#
$

The program simply displays the number of arguments with which it is invoked. For example:
$ get.num test out this program
The number of arguments is: 4
$

You can write a simple shell program to demonstrate ``$*''. Create a shell program called show.param that
will echo all the parameters. Use the command line shown in the following completed file:
$ cat show.param
echo The parameters for this command are: $*
$

The program show.param will echo all the arguments you give the command. Make show.param executable
and try it using these parameters:
Hello. How are you?
$ show.param Hello. How are you?
The parameters for this command are: Hello. How are you?
$

Notice that show.param echoes Hello. How are you? Now try show.param using more than nine
arguments:
$ show.param a b c d e f g h i j
The parameters for this command are: a b c d e f g h i j
$

Once again, show.param echoes all the arguments you give. The ``$*'' parameter can be useful if you use
filename expansion to specify arguments to the shell command.
Use the filename expansion feature with your show.param command. For example, suppose you have three
files in your directory named for the first three chapters of a book. The show.param command prints a list of
all those files.
$ show.param chap?
The parameters for this command are: chap1 chap2 chap3
$

Named variables
Another form of variable that you can use in a shell program is a named variable. You assign values to named
variables yourself. The format for assigning a value to a named variable is:
named_variable=value

Notice that there are no spaces on either side of the equals (=) sign.
In the following example, var1 is a named variable, and myname is the value or character string assigned to
that variable:
var1=myname

A ``$'' is used in front of a variable name in a shell program to reference the value of that variable. Using the
example above, the reference $var1 tells the shell to substitute the value myname (assigned to var1), for any
occurrence of the character string $var1.
The first character of a variable name must be a letter or an underscore. The rest of the name can consist of
letters, underscores, and digits. Like shell program filenames, variable names should not be shell command
names. Also, the shell reserves some variable names that you should not use for your variables. The following
list provides brief descriptions of some of the most important of these reserved shell variable names.
CDPATH defines the search path for the cd command.
HOME is the default variable for the cd command (home directory).
IFS defines the internal field separators (normally <Space>, <TAB>, and <Return>).
LOGNAME is your login name.
MAIL names the file that contains your electronic mail.
PATH determines the search path used by the shell to find commands.
PS1 defines the primary prompt (default is ``$'').
PS2 defines the secondary prompt (default is >).
TERM identifies your terminal type. It is important to set this variable if you are editing with vi.
TERMINFO identifies the directory to be searched for information about your terminal, for
example, its screen size.
TZ defines the time zone (default is EST5EDT).
Many of these variables are explained in ``Modifying your login environment''.
You can see the value of these variables in your shell in two ways. First, you can type
echo $variable_name

The system outputs the value of variable_name. Second, you can use the env(C) command to print out the
value of all defined variables in the shell. To do this, type env on a line by itself; the system outputs a list of
the variable names and values.
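For example (the values shown here are hypothetical):
$ echo $HOME
/usr/mylogin
$ env
HOME=/usr/mylogin
LOGNAME=mylogin
PATH=/bin:/usr/bin:/usr/mylogin/bin
TERM=ansi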
Assigning a value to a variable
You can set the TERM variable by entering the following command line:
TERM=terminal_name
export TERM

This is the simplest way to assign a value to a variable. However, there are several other ways to do this:
Use the read command to assign input to the variable.
Redirect the output of a command into a variable by using command substitution with grave accents (`
. . . `).
Assign a positional parameter to the variable.
The following sections discuss each of these methods in detail.
Using the read command

The read command used within a shell program allows you to prompt the user of the program for the values
of variables. The general format for the read command is:
read variable

The values assigned by read to variable will be substituted for ``$''variable wherever it is used in the
program. If a program executes the echo command just before the read command, the program can display
directions such as Type in . . . . The read command will wait until you type a character string, followed by
<Return>, and then make that string the value of the variable.
The following example shows how to write a simple shell program called num.please to keep track of your
telephone numbers. This program uses the following commands for the purposes specified.

echo
Prompt you for a person's last name.

read
Assign the input value to the variable name.
grep
Search the file list for this variable.
Your finished program should look like the one displayed here:
$ cat num.please
echo Type in the last name:
read name
grep $name $HOME/list
$

Create a file called list that contains several last names and telephone numbers. Then try running num.please.
The next example is a program called mknum, which creates a list. mknum includes the following
commands for the purposes shown.
echo prompts for a person's name.
read assigns the person's name to the variable name.
echo asks for the person's number.
read assigns the telephone number to the variable num.
echo adds the values of the variables name and num to the file list.
If you want the output of the echo command to be added to the end of list, you must use >> to redirect it. If
you use >, list will contain only the last telephone number you added.
Running the cat command on mknum displays the contents of the program. When your program looks like
this, you will be ready to make it executable (with the chmod command):
$ cat mknum
echo Type in name
read name
echo Type in number
read num
echo $name $num >> list
$ chmod u+x mknum
$

Try out the new programs for your telephone list. In the next example, mknum creates a new listing for Mr.
Niceguy. Then num.please gives you Mr. Niceguy's telephone number:
$ mknum
Type in name
Mr. Niceguy
Type in number
668-0007
$ num.please
Type in last name
Niceguy
Mr. Niceguy 668-0007
$

Notice that the variable name accepts both Mr. and Niceguy as the value.

Substituting command output for the value of a variable

You can substitute the output of a command for the value of a variable by using command substitution in the
following format:
variable=`command`

The output from command becomes the value of variable.


In one of the previous examples on piping, the date command was piped into the cut command to get the
correct time. That command line was the following:
date | cut -c12-19

You can put this in a simple shell program called t that gives you the time.
$ cat t
time=`date | cut -c12-16`
echo The time is: $time
$

Remember, there are no spaces on either side of the equal sign. Make the file executable, and you will have a
program that gives you the time:
$ chmod u+x t
$ t
The time is: 10:36
$

Assigning values with positional parameters

You can assign a positional parameter to a named parameter by using the following format:
var1=$1

The next example is a simple program called simp.p that assigns a positional parameter to a variable. By
running the cat command on simp.p, you can see the contents of this program:
$ cat simp.p
var1=$1
echo $var1
$

Of course, you can also assign to a variable the output of a command that uses positional parameters, as
follows:
person=`who | grep $1`

In the next example, the program log.time keeps track of your whoson program results. The output of whoson
is assigned to the variable person, and added to the file login.file with the echo command. The last echo
displays the value of $person, which is the same as the output from the whoson command:
$ cat log.time
person=`who | grep $1`
echo $person >> $HOME/login.file
echo $person
$

If you execute log.time specifying maryann as the argument, the system responds as follows:
$ log.time maryann
maryann    tty61    Apr 11 10:26
$

Shell programming constructs


The shell programming language has several constructs that give added flexibility to your programs:
Comments let you document the function of a program.
The here document allows you to include, within the shell program itself, lines to be redirected as
input to some command in the shell program.
The exit command lets you terminate a program at a point other than the end of the program and use
return codes.
The looping constructs, for and while, allow a program to iterate through groups of commands in a
loop.
The conditional control commands, if and case, execute a group of commands only if a particular set
of conditions is met.
The break command allows a program to exit unconditionally from a loop.
Comments
When you place comments in a shell program, the shell ignores all text on a line following a word that begins
with a ``#'' (pound) sign. If the ``#'' sign appears at the beginning of a line, the comment uses the entire line; if
it appears after a command, the command is executed but the remainder of the line is ignored. The end of a
line always ends a comment. The general format for a comment line is
#comment

For example, a program that contains the following lines will ignore them when it is executed:
# This program sends a generic birthday greeting.
# This program needs a login as
# the positional parameter.

Comments are useful for documenting the function of a program and should be included in any program you
write.
Here documents
A here document allows you to place into a shell program lines that are redirected to be the input to a
command in that program. By using a here document, you can provide input to a command in a shell program
without using a separate file. The notation consists of the redirection symbol ``<<'' and a delimiter that
specifies the beginning and end of the lines of input. The delimiter can be one character or a string of
characters; the ``!'' is often used.
``Format of a here document'' shows the general format for a here document.

command <<delimiter
 . . . input lines . . .
delimiter


Format of a here document
In the next example, the program gbday uses a here document to send a generic birthday greeting by
redirecting lines of input into the mail command:
$ cat gbday
mail $1 <<!
Best wishes to you on your birthday.
!
$

When you use this command, you must specify the recipient's login as the argument to the command. The
input included with the use of the here document is:
Best wishes to you on your birthday.

For example, to send this greeting to the owner of login mary, type:
$ gbday mary

User mary will receive your greeting the next time she reads her mail messages:
$ mail
From mylogin Mon May 14 14:31 CDT 1991
Best wishes to you on your birthday.
$

Using ed in a shell program


The here document offers a convenient and useful way to use ed in a shell script. For example, suppose you
want to make a shell program that will enter the ed editor, make a global substitution to a file, write the file,
and then quit ed. The following screen shows the contents of a program called ch.text which does these tasks.
$ cat ch.text
echo Type in the filename.
read file1
echo Type in the exact text to be changed.
read old_text
echo Type in the exact new text to replace the above.
read new_text
ed - $file1 <<!
g/$old_text/s//$new_text/g
w
q
!
$

Notice the - (minus) option to the ed command. This option prevents the character count from being
displayed on the screen. Notice, also, the format of the ed command for global substitution:
g/old_text/s//new_text/g

The program uses three variables: file1, old_text, and new_text. When the program is run, it uses the read
command to obtain the values of these variables. The variables provide the following information:

file1
the name of the file to be edited
old_text
the exact text to be changed
new_text
the new text
Once the variables are entered in the program, the here document redirects the global substitution, the write
command, and the quit command into the ed command. Try the new ch.text command. The following screen
shows sample responses to the program prompts:
$ ch.text
Type in the filename.
memo
Type in the exact text to be changed.
Dear John:
Type in the exact new text to replace the above.
To whom it may concern:
$ cat memo
To whom it may concern:
$

Notice that by running the cat command on the changed file, you could examine the results of the global
substitution.
The stream editor sed can also be used in shell programming.
Return codes
Most shell commands issue return codes that show whether the command executed properly. By convention,
if the value returned is 0 (zero), then the command executed properly; any other value shows that it did not.
The return code is not printed automatically, but is available as the value of the shell special parameter ``$?''.
Checking return codes

After executing a command interactively, you can see its return code by typing
echo $?

Consider the following example:


$ cat hi
This is file hi.
$ echo $?
0
$ cat hello
cat: cannot open hello
$ echo $?
2
$

In the first case, the file hi exists in your directory and has read permission for you. The cat command behaves
as expected and outputs the contents of the file. It exits with a return code of 0, which you can see using the
parameter $?. In the second case, the file either does not exist or does not have read permission for you. The
cat command prints a diagnostic message and exits with a return code of 2.
Using return codes with the exit command

A shell program normally terminates when the last command in the file is executed. However, you can use
the exit command to terminate a program at some other point. Perhaps more importantly, you can also use the
exit command to issue return codes for a shell program.
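For example, a short program could pass the return code of grep along as its own; the following sketch
assumes that the variables word and file are assigned values earlier in the program:
# exit with the return code of the grep command
grep $word $file > /dev/null
exit $?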
Loop constructs: for and while
In the previous examples in this section, the commands in shell programs have been executed in sequence.
The for and while looping constructs allow a program to execute a command or sequence of commands
several times.
The for loop

The for loop executes a sequence of commands once for each member of a list. It has the format shown in
``Format of the for loop construct''.
for variable
in a_list_of_values
do
command_1
command_2
.
.
.
last_command
done

Format of the for loop construct


For each iteration of the loop, the next member of the list is assigned to the variable given in the for clause.
References to that variable may be made anywhere in the commands within the do clause.
It is easier to read a shell program if the looping constructs are visually clear. Because the shell ignores spaces
at the beginning of lines, each section of a command can be indented as it was in the above format. Also, if
you indent each command section, you can easily check to make sure each do has a corresponding done at the
end of the loop.
The variable can be any name you choose. For example, if you call it var, then the values given in the list
after the keyword in will be assigned in turn to var; references within the command list to $var will make the
value available. If the in clause is omitted, the values for var will be the complete set of arguments given to
the command and available in the special parameter ``$*''. The command list between the keywords do and
done will be executed once for each value.
When the commands have been executed for the last value in the list, the program will execute the next line
below done. If there is no line, the program will end.
The easiest way to understand a shell programming construct is to try an example. Create a program that will
move files to another directory. Include the following commands for the purposes shown.

echo
Prompt the user for a pathname to the new directory.
read
Assign the pathname to the variable path.
for variable
Call the variable file; it can be referenced as $file in the command sequence.
in list_of_values
Supply a list of values. If the in clause is omitted, the list of values is assumed to be ``$*'' (all the
arguments entered on the command line).
do command_sequence
Provide a command sequence. The construct for this program will be:
do
mv $file $path/$file
done

The following screen shows the text for the shell program mv.file:
$ cat mv.file
echo Please type in the directory path
read path
for file
in memo1 memo2 memo3
do
mv $file $path/$file
done
$

In this program the values for the variable file are already in the program. To change the files each time the
program is invoked, assign the values using positional parameters or the read command. When positional
parameters are used, the in keyword is not needed, as the next screen shows:
$ cat mv.file
echo type in the directory path
read path
for file
do
mv $file $path/$file
done
$

You can move several files at once with this command by specifying a list of filenames as arguments to the
command. (This can be done most easily using the filename expansion mechanism described earlier).
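For example, with the second version of mv.file shown above, a session might look like the following (the
directory name letters is hypothetical):
$ mv.file memo?
type in the directory path
letters
$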
The while loop

Another loop construct, the while loop, uses two groups of commands. It will continue executing the
sequence of commands in the second group, the do . . . done list, as long as the final command in the first
group, the while list, returns a status of 0 (true), meaning the statements after the do can be executed.
The general format of the while loop is shown in ``Format of the while loop construct''.
while
command_1
.
.
.
last_command
do
command_1
.
.
.
last_command
done

Format of the while loop construct


For example, a program called enter.name uses a while loop to enter a list of names into a file. The program
consists of the following command lines:
$ cat enter.name
while
read x
do
echo $x>>xfile
done
$

With some added refinements, the program becomes:


$ cat enter.name
echo "Please type in each person's name and then a <Return>"
echo "Please end the list of names with a CTRLd"
while read x
do
echo $x>>xfile
done
echo xfile contains the following names:
cat xfile
$

Notice that, after the loop is completed, the program executes the commands below the done.
You used special characters in the first two echo command lines, so you must use quotes to turn off the
special meaning. The next screen shows the results of enter.name:
$ enter.name
Please type in each person's name and then a <Return>
Please end the list of names with a CTRL-d

Mary Lou

Janice

<CTRL-d>
xfile contains the following names:
Mary Lou
Janice
$

Notice that after the loop completes, the program prints all the names contained in xfile.
The shell's garbage can: /dev/null
The file system has a file called /dev/null where you can have the shell deposit any unwanted output.
Try /dev/null by ignoring the results of the who command. First, type in the who command. The response tells
you who is on the system. Now, try the who command, but redirect the output into /dev/null:
who > /dev/null

Notice the system responded with a prompt. The output from the who command was placed in /dev/null and
was effectively discarded.
Conditional constructs: if and case
Conditional constructs cause branches in the path of execution based on the outcome of a comparison.
if . . . then

The if command tells the shell program to execute the then sequence of commands only if the final command
in the if command list is successful. The if construct ends with the keyword fi.
The general format for the if construct is shown in ``Format of the if . . . then conditional construct''.
if
command_1
.
.
.
last_command
then
command_1
.
.
.
last_command
fi

Format of the if . . . then conditional construct


For example, a shell program called search demonstrates the use of the if . . . then construct. The search
program uses the grep command to search for a word in a file. If grep is successful, the program echoes that
the word is found in the file. Copy the search program (shown on the following screen) and try it yourself:
$ cat search
echo Type in the word and the filename.
read word file
if grep $word $file
then echo $word is in $file
fi
$

Notice that the read command assigns values to two variables. The first characters you type, up to a space, are
assigned to word. The rest of the characters, including embedded spaces, are assigned to file.
A problem with this program is the unwanted display of output from the grep command. If you want to
dispose of the system response to the grep command in your program, use the file /dev/null, changing the if
command line to the following:
if grep $word $file > /dev/null

Now execute your search program. It should respond only with the message specified after the echo
command.
if . . . then . . . else

The if . . . then construction can also issue an alternate set of commands with else, when the if command
sequence is false. It has the general format shown in ``Format of the if
. . . then . . . else conditional construct''.
if
command_1
.
.
.
last_command
then
command_1
.
.
.
last_command
else
command_1
.
.
.
last_command
fi

Format of the if . . . then . . . else conditional construct


You can now improve your search command so it will tell you when it cannot find a word, as well as when it
can. The following screen shows how your improved program will look:
$ cat search
echo Type in the word and the filename.
read word file
if
grep $word $file >/dev/null
then
echo $word is in $file
else
echo $word is NOT in $file
fi
$

The test command for loops

The test command, which checks to see if certain conditions are true, is a useful command for conditional
constructs. If the condition is true, the loop will continue. If the condition is false, the loop will end and the
next command will be executed. Some of the useful options for the test command are:
test -r file          true if the file exists and is readable
test -w file          true if the file exists and has write permission
test -x file          true if the file exists and is executable
test -s file          true if the file exists and has at least one character
test var1 -eq var2    true if var1 equals var2
test var1 -ne var2    true if var1 does not equal var2

You may want to create a shell program to move all the executable files in the current directory to your bin
directory. You can use the test -x command to select the executable files. Review the example of the for
construct that occurs in the mv.file program, shown in the following screen:
$ cat mv.file
echo type in the directory path
read path
for file
do
mv $file $path/$file
done
$

Create a program called mv.ex that includes an if test -x statement in the do . . . done loop to move
executable files only. Your program will be as follows:
$ cat mv.ex
echo type in the directory path
read path
for file
do
if test -x $file
then
mv $file $path/$file
fi
done
$

The directory path is the path from the current directory to the bin directory. However, if you use the value for
the shell variable HOME, you will not need to type in the path each time. $HOME gives the path to the login
directory. $HOME/bin gives the path to your bin.
In the following example, mv.ex does not prompt you to type in the directory name, and therefore, does not
read the path variable:
$ cat mv.ex
for file
do
if test -x $file
then
mv $file $HOME/bin/$file
fi
done
$

Test the command, using all the files in the current directory, specified with the * special character as the
command argument. The command lines shown in the following example execute the command from the
current directory, then change to bin and list the files in that directory. All executable files should be
there.
$ mv.ex *
$ cd; cd bin; ls
list_of_executable_files
$

case . . . esac

The case . . . esac construction has a multiple choice format that allows you to choose one of several patterns
and then execute a list of commands for that pattern. The pattern statements must begin with the keyword in,
and a ``)'' must be placed after the last character of each pattern. The command sequence for each pattern is
ended with ``;;''. The case construction must be ended with esac (the letters of the word case reversed).
The general format for the case construction is shown in ``The case . . . esac conditional construct'':
case word
in
pattern_1)
command_line_1
. . .
last_command_line
;;
pattern_2)
command_line_1
. . .
last_command_line
;;
pattern_3)
command_line_1
. . .
last_command_line
;;
*)
command_1
. . .
last_command
;;
esac

The case . . . esac conditional construct
The case construction tries to match the word following the word case with the pattern in the first pattern
section. If a match exists, the program executes the command lines after the first pattern and up to the
corresponding ``;;''.
If the first pattern is not matched, the program proceeds to the second pattern. Once a pattern is matched, the
program does not try to match any more of the patterns, but goes to the command following esac.
The * used as a pattern matches any word, and so allows you to give a set of commands to be executed if no
other pattern matches. To do this, it must be placed as the last possible pattern in the case construct, so that
the other patterns are checked first. This helps you detect incorrect or unexpected input.
The patterns that can be specified in the pattern part of each section may use the special characters *, ``?'', and
``[]'' for filename expansion, as described earlier in this topic. This provides useful flexibility.
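For example, a fragment such as the following (a sketch using a hypothetical variable answer, not part of the
set.term program shown below) accepts several related responses with a single pattern:
case $answer
in
[Yy]*)
echo affirmative
;;
*)
echo negative
;;
esac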
The set.term program contains a good example of the case . . . esac construction. This program sets the shell
variable TERM according to the type of terminal you are using. It uses the following command line:
TERM=terminal_name

In the following example, assume the terminal is a Teletype 4420, Teletype 5410, or Teletype 5420.
The set.term program first checks to see whether the value of term is 4420. If it is, the program makes T4 the
value of TERM, and terminates. If the value of term is not 4420, the program checks for other possible
values: 5410 and 5420. It executes the commands under the first pattern it finds, and then goes to the first
command after the esac command.
The pattern *, meaning everything else, is included at the end of the terminal patterns. It warns that you do not
have a pattern for the terminal specified and it allows you to exit the case construct:
$ cat set.term
echo If you have a TTY 4420 type in 4420
echo If you have a TTY 5410 type in 5410
echo If you have a TTY 5420 type in 5420
read term
case $term
in
4420)
TERM=T4
;;
5410)
TERM=T5
;;
5420)
TERM=T7
;;
*)
echo not a correct terminal type
;;
esac
export TERM
echo end of program
$

Notice the use of the export command in the preceding screen. You use export to make a variable available
within your environment and to other shell procedures. What would happen if you placed the * pattern first?
The set.term program would never assign a value to TERM, since it would always match the first pattern *,
which means everything.

Unconditional control statements: break and continue


The break command unconditionally stops the execution of any loop in which it is encountered, and goes to
the next command after the done, fi, or esac statement. If no commands follow that statement, the program
ends.
In the example for set.term, you could have used the break command instead of echo to leave the program,
as the next example shows:
$ cat set.term
echo If you have a TTY 4420 type in 4420
echo If you have a TTY 5410 type in 5410
echo If you have a TTY 5420 type in 5420
read term
case $term
in
4420)
TERM=T4
;;
5410)
TERM=T5
;;
5420)
TERM=T7
;;
*)
break
;;
esac
export TERM
echo end of program
$

The continue command causes the program to go immediately to the next iteration of a while or for loop
without executing the remaining commands in the loop.
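For example, the following sketch uses continue to skip any files that are already executable, reporting only
the others:
for file
do
# skip executable files, report the rest
if test -x $file
then
continue
fi
echo $file is not executable
done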

Functions
Functions provide a convenient and efficient means of coding and executing simple programs (more complex
programs should remain as shell programs). Functions are similar to shell programs, except that they are
stored in memory and therefore execute faster than a program, and that they operate only in the current shell
process.
Defining a function
There are two formats that can be used in defining a function:
Format 1
name () { command; command; ... command; }

In this format, name is the name of the function. The parentheses are an indication to the shell that a function is
being defined. The body of the function (which is delimited by the curly braces) contains the commands to be
executed. Each command is separated by a semicolon and a space. The last command ends with a
semicolon, and the curly braces are separated from the body of the function by a space.
Format 2
name ()
> {
> command
> command
> command
> }

In this format, name () is the same as in format 1. However, upon pressing the <RETURN> key, a ``>''
prompt will replace your regular shell prompt. The body of the function is coded at this point, starting with the
left curly brace. After the last command has been entered, the body of the function is closed with a right curly
brace. It is not necessary to use semicolons in this format.
Just as the exit statement is used within shell programs, the return statement is provided for use within
functions. This statement will terminate the function, but not the shell program that called the function. The
format of the return statement is:
return n

where n is the return status of the function. If n is omitted, or if a return statement is not coded within the
function, then the return status is that of the last command executed within the function.
Once the function has been defined, you can display it by using the shell set statement (without arguments)
which displays all of your current environment variable settings. At the end of the variable list, any functions
you have defined will be displayed.
If you find it necessary to remove a function during a session, the unset command can be used.
The format is:
unset function

where function is the name of the function to be removed.
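
For instance, the following brief session (the function name greet is hypothetical) defines a function using
format 1, executes it, and then removes it:
$ greet () { echo Hello $1; }
$ greet world
Hello world
$ unset greet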


Executing a function
To execute a function, enter the name of the function at your regular shell prompt. Any arguments listed after
the name of the function replace the positional parameters coded within the function (the same as in any other
shell program).
After a function has executed, you can display the return status by issuing the following:
echo $?

Examples
The following defines a function that displays login information for a particular user (notice that format 1 is
used in this case):
whoon () { who | grep $1; }

The next example searches for a file in the current directory. Notice that format 2 is used in this case. Also,
the return statement is used. A return status of 1 indicates that the search did not find the file in question (a
message is also displayed to that effect). A return status of 0 indicates that the file exists.
isthere ()
{
if [ ! -f $1 ]
then
echo "$1 was not created"
return 1
fi
return 0
}
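
A sketch of how isthere might be exercised at the prompt (it assumes a file named memo1 exists in the
current directory and no.such.file does not):
$ isthere memo1
$ echo $?
0
$ isthere no.such.file
no.such.file was not created
$ echo $?
1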

Debugging programs
At times you may need to debug a program to find and correct errors. Two options to the sh command can
help you debug a program:

sh -v shell_program_name
Print the shell input lines as they are read by the system.
sh -x shell_program_name
Print commands and their arguments as they are executed.
To try these two options, create a shell program that has an error in it. For example, create a file called bug
that contains the following list of commands:
$ cat bug
today=`date`
echo enter person
read person
mail $1
$person
When you log off come into my office please.
$today.
MLH
$

Notice that today equals the output of the date command, which must be enclosed in grave accents for
command substitution to occur.
The mail message sent to Tom at login tommy ($1) should look like the following screen:
$ mail
From mlh Wed Apr 10 11:36 CST 1991
Tom

When you log off come into my office please.
Wed Apr 10 11:36:32 CST 1991
MLH
?
.

When you execute bug, you have to press the <BREAK> or <DELETE> key to end the program.
To debug this program, try executing bug using sh -v. This will print the lines of the file as they are read by
the system, as shown below:
$ sh -v bug tommy
today=`date`
echo enter person
enter person
read person
tom
mail $1

Notice the output stops on the mail command, since there is a problem with mail. You must use the here
document to redirect input into mail.
Before you fix the bug program, try executing it with sh -x, which prints the commands and their arguments
as they are read by the system.
$ sh -x bug tommy
+ date
today=Wed Apr 10 11:07:23 CST 1991
+ echo enter person
enter person
+ read person
tom
+ mail tommy
$

Once again, the program stops at the mail command. Notice that the substitutions for the variables have been
made and are displayed.
The corrected bug program is as follows:
$ cat bug
today=`date`
echo enter person
read person
mail $1 <<!
$person
When you log off come into my office please.
$today
MLH
!
$

The tee command is helpful for debugging pipelines. While simply passing its standard input to its standard
output, it also saves a copy of its input into the file whose name is given as an argument.
The general format of the tee command is:

command_1 | tee saverfile | command_2

saverfile is the file in which the output of command_1 is saved for you to study.
For example, suppose you want to check on the output of the grep command in the following command line:
who | grep $1 | cut -c1-9

You can use tee to copy the output of grep into a file called check, without disturbing the rest of the pipeline.
who | grep $1 | tee check | cut -c1-9

The file check contains a copy of the grep output:


$ who | grep mlhmo | tee check | cut -c1-9
mlhmo
$ cat check
mlhmo      tty61        Apr 10 11:30
$

Modifying your login environment


The SCO OpenServer system lets you modify your login environment in several ways. For example, users
frequently want to change the default values of the erase and line kill characters, <CTRL-h> and ``@'',
respectively.
When you log in, the shell first examines a file in your login directory named .profile (pronounced ``dot
profile''). This file contains commands that control your shell environment.
Because the .profile is a shell script, it can be edited and changed to suit your needs. On some systems you can
edit this file yourself, whereas on others, the system administrator must do this for you. To see whether you
have a .profile in your home directory, type:
ls -al $HOME

If you can edit the file yourself, you may want to be cautious the first few times. Before making any changes
to your .profile, make a copy of it in another file called safe.profile. Type:
cp .profile safe.profile

You can add commands to your .profile just as you add commands to any other shell program. You can also
set some terminal options with the stty command, and set some shell variables.

Adding commands to your .profile


Practice adding commands to your .profile. Edit the file and add the following echo command to the last line
of the file:
echo Good Morning! I am ready to work for you.

Write and quit the editor.


Whenever you make changes to your .profile and you want to initiate them in the current work session, you
may cause the commands in .profile to be executed directly, using the ``.'' (dot) shell command. The shell
reinitializes your environment by executing the commands in your .profile. Try this now. Type:
. .profile

The system should respond with the following:


Good Morning! I am ready to work for you.
$

Setting terminal options


The stty command can make your shell environment more convenient. You can use these options with stty:
tabs and echoe.

stty tabs
This option preserves tabs when you are printing. It expands the tab setting to eight spaces, which is
the default. The number of spaces for each tab can be changed. (See stty(C) for details.)
stty echoe
If you have a terminal with a screen, this option erases characters from the screen as you erase them
with the <BACKSPACE> key.
If you want to use these options for the stty command, you can create those command lines in your .profile
just as you would create them in a shell program. If you use the tail command, which displays the last few
lines of a file, you can see the results of adding those three command lines to your .profile:
$ tail -3 .profile
echo Good Morning! I am ready to work for you
stty tabs
stty echoe
$

Using shell variables


Several of the variables reserved by the shell are used in your .profile. You can display the current value for
any shell variable by entering the following command:
echo $variable_name

Four of the most basic of these variables are discussed next.

HOME
This variable gives the pathname of your login directory. Use the cd command to go to your login
directory and type:
pwd

What was the system response? Now type:
echo $HOME

Was the system response the same as the response to pwd?


$HOME is the default argument for the cd command. If you do not specify a directory, cd will move
you to $HOME.
LANG
For many commands, this variable gives the language (such as French, German, and so on) in which
messages from the system are displayed on your screen. It also specifies the language and cultural
conventions the commands will use to process and sort characters, display the date and time, and
interpret numeric and monetary values. The default language is English. If you prefer to work in
another language, and if your system supports non-English usage, you can specify the desired
language with this variable by assigning an appropriate value to it. For example, for German usage,
you might enter
LANG=de1[utsche]

Ask your system administrator which languages are available on your computer, and what values you
must assign to LANG to access them. Not all system commands support non-English usage. Check
intro(C) for the ones that do. For details of LANG usage, see environ(5).
PATH
This variable gives the search path for finding and executing commands. To see the current values for
your PATH variable type:
echo $PATH

The system will respond with your current PATH value.


$ echo $PATH
:/mylogin/bin:/bin:/usr/bin
$

The colon ( ``:'' ) is a delimiter between pathnames in the string assigned to the $PATH variable.
When nothing is specified before a ``:'', the current directory is understood. Notice how, in the last
example, the system looks for commands in the current directory first, then in /mylogin/bin, then in
/bin, and finally in /usr/bin.
If you are working on a project with several other people, you may want to set up a group bin, a
directory of special shell programs used only by your project members. The path might be named
/project1/bin. Edit your .profile, and add :/project1/bin to the end of your PATH, as in the next
example.
PATH="$PATH:/project1/bin"

TERM
This variable tells the shell what kind of terminal you are using. To assign a value to it, you must
execute the following three commands in this order:
TERM=terminal_name
export TERM
tput init

The first two lines, together, are necessary to tell the computer what type of terminal you are using.
The last line, containing the tput command, tells the terminal that the computer is expecting to
communicate with the type of terminal specified in the TERM variable. Therefore, this command
must always be entered after the variable has been exported.
If you do not want to specify the TERM variable each time you log in, add these three command lines
to your .profile; they will be executed automatically whenever you log in.
If you log in on more than one type of terminal, it would also be useful to have your set.term
command in your .profile.
PS1
This variable sets the primary shell prompt string (the default is the ``$'' sign). You can change your
prompt by changing the PS1 variable in your .profile.
Try the following example. Note that to use a multiword prompt, you must enclose the phrase in
quotes. Type the following variable assignment in your .profile.
PS1="Your command is my wish"

Now execute your .profile (with the . command) and watch for your new prompt sign.
$ . .profile
Your command is my wish

The ``$'' sign is gone forever, or at least until you delete the PS1 variable from your .profile.

Shell programming exercises


2-1.
Create a shell program called time from the following command line:
banner `date | cut -c12-19`

2-2.
Write a shell program that gives only the date in a banner display. Be careful not to give your program
the same name as a SCO OpenServer system command.
2-3.
Write a shell program that sends a note to several people on your system.
2-4.
Redirect the date command without the time into a file.
2-5.
Echo the phrase ``Dear colleague'' in the same file as the previous exercise, without erasing the date.
2-6.
Using the above exercises, write a shell program that sends a memo to the same people on your
system mentioned in Exercise 2-3. Include in your memo:
lines at the top that include the current date and the words ``Dear colleague''
the body of the memo (stored in an existing file)
a closing statement
2-7.
How can you read variables into the mv.file program?
2-8.
Use a for loop to move a list of files in the current directory to another directory. How can you move
all your files to another directory?
2-9.
How can you change the program search, so that it searches through several files?
Hint:
for file
in $*
2-10.
Set the stty options for your environment.
2-11.
Change your prompt to the word Hello.
2-12.
Check the settings of the variables $HOME, $TERM, and $PATH in your environment.

Answers
2-1.

$ cat time
banner `date | cut -c12-19`
$
$ chmod u+x time
2-2.
$ cat mydate
banner `date | cut -c1-10`
$

2-3.

$ cat tofriends
echo Type in the name of the file containing the note.
read note
mail janice marylou bryan < $note
$

Or, if you used parameters for the logins (instead of the logins themselves) your program may have
looked like this:
$ cat tofriends
echo Type in the name of the file containing the note.
read note
mail $* < $note
$

2-4.

date | cut -c1-10 > file1


2-5.

echo Dear colleague >> file1


2-6.

$ cat send.memo
date | cut -c1-10 > memo1
echo Dear colleague >> memo1
cat memo >> memo1
echo A memo from M. L. Kelly >> memo1
mail janice marylou bryan < memo1
$
2-7.

$ cat mv.file
echo type in the directory path
read path
echo type in filenames, end with CTRL-d
while
read file
do
mv $file $path/$file
done
echo all done
$
2-8.

$ cat mv.file
echo Please type in directory path
read path
for file in $*
do
mv $file $path/$file
done
$
The command line for moving all files in the current directory is:
$ mv.file *

2-9.
See the hint provided with exercise 2-9.
$ cat search
for file
in $*
do
if grep $word $file >/dev/null
then echo $word is in $file
else echo $word is NOT in $file
fi
done
$
2-10.
Add the following lines to your .profile:
stty tabs
stty erase
stty echoe

2-11.
Add the following command lines to your .profile:
PS1=Hello
export PS1

2-12.
Enter the following commands to check the values of the HOME, TERM, and PATH variables in
your home environment:
$ echo $HOME
$ echo $TERM
$ echo $PATH

Summary of shell command language


This appendix is a summary of the shell command language and programming constructs. The first section
reviews metacharacters, special characters, input and output redirection, variables, and process control. The
second section contains models of the shell programming constructs.

The vocabulary of shell command language


The following sections list metacharacters, special characters, input and output redirection, variables, and
process control.
Special characters in the shell

*?[]
Metacharacters; used to provide a shortcut to referencing filenames, through pattern matching.
&
Executes commands in the background mode.
;
Sequentially executes several commands typed on one line, each pair separated by ;.
\
Turns off the meaning of the immediately following special character.
'...'
Enclosing single quotes turn off the special meaning of all characters except single quotes.
"..."
Enclosing double quotes turn off the special meaning of all characters except $, single quotes, and
double quotes.

Redirecting input and output

<
Redirects the contents of a file into a command.
>
Redirects the output of a command into a new file, or replaces the contents of an existing file with the
output.
>>
Redirects the output of a command so that it is appended to the end of a file.
|
Directs the output of one command so that it becomes the input of the next command.

`command`
Substitutes the output of the enclosed command in place of `command`.

Executing and terminating processes

batch
Submits the following commands to be processed at a time when the system load is at an acceptable
level. <CTRL-d> ends the batch command.
at
Submits the following commands to be executed at a specified time. <CTRL-d> ends the at
command.
at -l
Reports which jobs are currently in the at or batch queue.
at -r
Removes the at or batch job from the queue.
ps
Reports the status of the shell processes.
kill PID
Terminates the shell process with the specified process ID (PID).
nohup command list &
Continues background processes after logging out.

Making a file accessible to the shell

chmod u+x filename


Gives the user permission to execute the file (useful for shell program files).
mv filename $HOME/bin/filename
Moves your file to the bin directory in your home directory. This bin holds executable shell programs
that you want to be accessible. Make sure the PATH variable in your .profile file specifies this bin. If
it does, the shell will search in $HOME/bin for your file when you try to execute it. If your PATH
variable does not include your bin, the shell will not know where to find your file and your attempt to
execute it will fail.
filename
The name of a file that contains a shell program becomes the command that you type to run that shell
program.
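
Putting these steps together, a typical installation session might look like the following sketch (myprog is a
hypothetical program name):
$ chmod u+x myprog
$ mv myprog $HOME/bin/myprog
$ myprog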
Variables

positional parameter
A numbered variable used within a shell program to reference values automatically assigned by the
shell from the arguments of the command line invoking the shell program.
echo
A command used to print the value of a variable on your terminal.

``$#''
A special parameter that contains the number of arguments with which the shell program has been
executed.
``$*''
A special parameter that contains the values of all arguments with which the shell program has been
executed.

named variable
A variable to which the user can give a name and assign values.

Variables used in the system

HOME
Denotes your home directory; the default variable for the cd command.
PATH
Defines the path your login shell follows to find commands.
MAIL
Gives the name of the file containing your electronic mail.
PS1, PS2
Defines the primary and secondary prompt strings, respectively.
TERM
Defines the type of terminal.
LOGNAME
Login name of the user.
IFS
Defines the internal field separators (normally the space, the tab, and the carriage return).
TERMINFO
Allows you to request that the curses and terminfo subroutines search a specified directory tree before
searching the default directory for your terminal type.
TZ
Sets and maintains the local time zone.

Shell programming constructs


The following sections contain models of the shell programming constructs.
Here document
command <<!
input lines
!

For loop
for variable
in
this list of values
do
command 1
command 2
.
.
.
last command
done

While loop
while command list
do
command 1
command 2
.
.
.
last command
done

If...then
if
this command list is successful
then
command 1
command 2
.
.
.
last command
fi

If...then...else
if
this command list is successful
then
command 1
command 2
.
.
.
last command
else
command 1
command 2
.
.
.
last command
fi

Case construction
case word in
pattern1)
command 1
. . .
last command
;;
pattern2)
command 1
. . .
last command
;;
.
.
.
last pattern)
command 1
. . .
last command
;;
esac

Break and continue statements


A break statement forces the program to leave the loop and execute the command following the end of the
loop; a continue statement skips the remaining commands and begins the next iteration of the loop.

Programming with awk


This topic describes a programming language that enables you to handle easily the tasks associated with data
processing and information retrieval. With awk, you can tabulate survey results stored in a file, print various
reports summarizing these results, generate form letters, count the occurrences of a string in a file, or reformat
a data file used for one application package so it can be used for another application package.
The name awk is an acronym formed from the initials of its developers. The name awk denotes both the
language and the SCO OpenServer system command you use to run an awk program.
awk is an easy language to learn. It automatically does many things that in other languages you have to
program yourself. As a result, many useful awk programs are only one or two lines long. Because awk
programs are usually smaller than equivalent programs in other languages, and because they are interpreted,
not compiled, awk is also a good language for prototyping.
The first part of this topic introduces you to the basics of awk and is intended to make it easy for you to start
writing and running your own awk programs. The rest of the topic describes the complete language and is
somewhat less tutorial. If you are an experienced awk user, you will find the skeletal summary of the
language at the end of the topic particularly useful.
You should be familiar with the SCO OpenServer system and shell programming to use awk. Although you
do not need other programming experience, some knowledge of the C programming language is beneficial
because many constructs found in awk are also found in C.

Basic awk
This section provides enough information for you to write and run some of your own programs. Each topic
presented in this section is discussed in more detail in later sections.

Program structure
The basic operation of awk is to scan a set of input lines one after another, searching for lines that match any
of a set of patterns or conditions you specify. For each pattern, you can specify an action; this action is
performed on each line that matches the pattern. Accordingly, an awk program is a sequence of
pattern-action statements, as ``awk program structure and example'' shows.
Structure:

pattern { action }
pattern { action }
. . .


Example:
$1 == "address" { print $2, $3 }
awk program structure and example
The example in the figure is a typical awk program, consisting of one pattern-action statement. The program
prints the second and third fields of each input line whose first field is address. In general, awk programs
work by matching each line of input against each of the patterns in turn. For each pattern that matches, the
associated action (which may involve multiple steps) is executed. Then the next line is read and the matching
starts again. This process typically continues until all the input has been read.
Either the pattern or the action in a pattern-action statement may be omitted. If there is no action with a
pattern, as in
$1 == "name"

the matching line is printed. If there is no pattern with an action, as in


{ print $1, $2 }

the action is performed for every input line. Since patterns and actions are both optional, actions are enclosed
in braces to distinguish them from patterns.
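
Combining the two fragments above gives a complete, if tiny, awk program; this sketch simply pairs them
(a # begins a comment in awk):
$1 == "name"        # pattern only: each matching line is printed
{ print $1, $2 }    # action only: performed for every input line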

Usage
You can run an awk program two ways. First, you can enter the command
$ awk 'pattern-action statements' optional list of input files<Return>

to execute the pattern-action statements on the set of named input files. For example, you could say
$ awk '{ print $1, $2 }' file1 file2<Return>

Notice that the pattern-action statements are enclosed in single quotes. This protects characters like $ from
being interpreted by the shell and also allows the program to be longer than one line.
If no files are mentioned on the command line, awk reads from the standard input. You can also specify that
input comes from the standard input by using the hyphen (-) as one of the input files. For example,
$ awk '{ print $3, $4 }' file1 -<Return>

says to read input first from file1 and then from the standard input.
The arrangement above is convenient when the awk program is short (a few lines). If the program is long, it is
often more convenient to put it into a separate file and use the -f option to fetch it:
$ awk -f program_file optional list of input files<Return>

For example, the following command line says to fetch and execute myprogram on input from the file file1:
$ awk -f myprogram file1<Return>

Fields
Normally, awk reads its input one line, or record, at a time; a record is, by default, a sequence of characters
ending with a newline. Then awk splits each record into fields, where, by default, a field is a string of
nonblank, nontab characters.

As input for many of the awk programs in this topic's sections, we use a file called countries, which contains
information about the ten largest countries in the world. (See ``The sample input file countries''.)
Each record contains the name of a country, its area in thousands of square miles, its population in millions,
and the continent on which it is located. (Data are from 1978; the U.S.S.R. has been arbitrarily placed in
Asia.) The white space between fields is a tab in the original input; a single blank separates North and South
from America .
The sample input file countries
USSR        8650    262    Asia
Canada      3852     24    North America
China       3692    866    Asia
USA         3615    219    North America
Brazil      3286    116    South America
Australia   2968     14    Australia
India       1269    637    Asia
Argentina   1072     26    South America
Sudan        968     19    Africa
Algeria      920     18    Africa

This file is typical of the kind of data awk is good at processing: a mixture of words and numbers separated
into fields by blanks and tabs.
The number of fields in a record is determined by the field separator. Fields are normally separated by
sequences of blanks and/or tabs, so that the first record of countries would have four fields, the second five,
and so on. It is possible to set the field separator to just tab, so each line would have four fields, matching the
meaning of the data; we will show how to do this shortly. For the time being, we will use the default: fields
separated by blanks and/or tabs. The first field within a line is called $1, the second $2, and so forth. The
entire record is called $0.
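
As a small sketch of field references, this one-line program prints the continent followed by the country
name for each record of countries:
{ print $4, $1 }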

Printing
If the pattern in a pattern-action statement is omitted, the action is executed for all input lines. The simplest
action is to print each line; you can accomplish this with an awk program consisting of a single print
statement
{ print }

so the command line


awk '{ print }' countries

prints each line of countries, copying the file to the standard output. The print statement can also be used to
print parts of a record; for instance, the program
{ print $1, $3 }

prints the first and third fields of each record. Thus


awk '{ print $1, $3 }' countries

produces as output the following sequence of lines:


USSR 262
Canada 24
China 866
USA 219
Brazil 116
Australia 14
India 637
Argentina 26
Sudan 19
Algeria 18

When printed, items separated by a comma in the print statement are separated by the output field separator
which, by default, is a single blank. Each line printed is terminated by the output record separator which, by
default, is a newline.
NOTE: In the remainder of this topic, we only show awk programs, without the command line that invokes
them. Each complete program can be run either by enclosing it in quotes as the first argument of the awk
command, or by putting it in a file and invoking awk with the -f flag, as discussed in ``Usage''. In addition,
if no input is mentioned, the input is assumed to be the file countries.

Formatted printing
For more carefully formatted output, awk provides a C-like printf statement
printf format, expr[1], expr[2], . . ., expr[n]

which prints the expr[i]'s according to the specification in the string format. For example, the awk program
{ printf "%10s %6d\n", $1, $3 }

prints the first field (``$1'') as a string of 10 characters (right-justified), then a space, then the third field
(``$3'') as a decimal number in a six-character field, then a newline (\n). With input from the file countries,
this program prints an aligned table:
      USSR    262
    Canada     24
     China    866
       USA    219
    Brazil    116
 Australia     14
     India    637
 Argentina     26
     Sudan     19
   Algeria     18

With printf, no output separators or newlines are produced automatically; you must create them yourself by
using \n in the format specification. ``The printf statement'' contains a full description of printf.

Simple patterns
You can select specific records for printing or other processing by using simple patterns. awk has three kinds
of patterns. First, you can use patterns called relational expressions that make comparisons. For example, the
operator == tests for equality. To print the lines for which the fourth field equals the string Asia, we can use
the program consisting of the single pattern
$4 == "Asia"

With the file countries as input, this program yields


USSR    8650    262    Asia
China   3692    866    Asia
India   1269    637    Asia

The complete set of comparisons is >, >=, <, <=, == (equal to) and != (not equal to). These comparisons can
be used to test both numbers and strings. For example, suppose we want to print only countries with a
population greater than 100 million. The program
$3 > 100

is all that is needed. It prints all lines in which the third field exceeds 100. (Remember that the third field in
the file countries is the population in millions.)
Second, you can use patterns called extended regular expressions that search for specified characters to select
records. The simplest form of an extended regular expression is a string of characters enclosed in slashes:
/US/

This program prints each line that contains the (adjacent) letters US anywhere; with the file countries as input,
it prints
USSR    8650    262    Asia
USA     3615    219    North America

We will have a lot more to say about extended regular expressions later in this topic.
Third, you can use two special patterns, BEGIN and END, that match before the first record has been read
and after the last record has been processed. This program uses BEGIN to print a title:
BEGIN     { print "Countries of Asia:" }
/Asia/    { print "    ", $1 }

The output is
Countries of Asia:
     USSR
     China
     India

Simple actions
We have already seen the simplest action of an awk program: printing each input line. Now let us consider
how you can use built-in and user-defined variables and functions for other simple actions in a program.
Built-in variables
Besides reading the input and splitting it into fields, awk counts the number of records read and the number of
fields within the current record; you can use these counts in your awk programs. The variable NR is the
number of the current record, and NF is the number of fields in the record. So the program
{ print NR, NF }

prints the number of each line and how many fields it has, while
{ print NR, $0 }

prints each record preceded by its record number.


User-defined variables
Besides providing built-in variables like NF and NR, awk lets you define your own variables, which you can
use for storing data, doing arithmetic, and the like. To illustrate, consider computing the total population and
the average population represented by the data in the file countries:
{ sum = sum + $3 }
END { print "Total population is", sum, "million"
print "Average population of", NR, "countries is",
sum/NR }

NOTE: awk initializes sum to zero before using it.

The first action accumulates the population from the third field; the second action, which is executed after the
last input, prints the sum and average:
Total population is 2201 million
Average population of 10 countries is 220.1

Functions
Built-in functions of awk handle common arithmetic and string operations for you. For example, one of the
arithmetic functions computes square roots; a string function substitutes one string for another. awk also lets
you define your own functions. Functions are described in detail in ``Actions''.

A handful of useful one-liners


Although awk can be used to write large programs of some complexity, many programs are not much more
complicated than what we've seen so far. Here is a collection of other short programs that you may find useful
and instructive. Although these programs are not explained here, new constructs they may contain are
discussed later in this topic.
Print last field of each input line:
{ print $NF }

Print 10th input line:


NR == 10

Print last input line:


    { line = $0 }
END { print line }

Print input lines that do not have four fields:


NF != 4 { print $0, "does not have 4 fields" }

Print input lines with more than four fields:


NF > 4

Print input lines with last field more than 4:


$NF > 4

Print total number of input lines:


END { print NR }

Print total number of fields:


    { nf = nf + NF }
END { print nf }

Print total number of input characters:


    { nc = nc + length($0) }
END { print nc + NR }

(Adding NR includes in the total the number of newlines.)


Print the total number of lines that contain the string Asia:
/Asia/ { nlines++ }
END    { print nlines }

(nlines++ has the same effect as nlines = nlines + 1.)


Error messages
If you make an error in your awk program, you generally get an error message. For example, trying to run the
program
$3 < 200 { print ( $1 }

generates the error messages


awk: syntax error at source line 1
context is
$3 < 200 { print ( >>> $1 } <<<
awk: illegal statement at source line 1
1 extra (

Some errors may be detected while your program is running. For example, if you try to divide a number by
zero, awk stops processing and reports the input record number (NR) and the line number in the program.

Patterns
In a pattern-action statement, the pattern is an expression that selects the records for which the associated
action is executed. This section describes the kinds of expressions that may be used as patterns.

BEGIN and END


BEGIN and END are two special patterns that give you a way to control initialization and wrap-up in an awk
program. BEGIN matches before the first input record is read, so any statements in the action part of a
BEGIN are done once, before the awk command starts to read its first input record. The pattern END matches
the end of the input, after the last record has been processed.
The following awk program uses BEGIN to set the field separator to tab (\t) and to put column headings on
the output. The field separator is stored in a built-in variable called FS. Although FS can be reset at any time,
usually the only sensible place is in a BEGIN section, before any input has been read. The second printf
statement of the program, which is executed for each input line, formats the output into a table, neatly aligned
under the column headings. The END action prints the totals. (Notice that a long line can be continued after a
comma.)
BEGIN { FS = "\t"
        printf "%10s %6s %5s   %s\n",
               "COUNTRY", "AREA", "POP", "CONTINENT" }
      { printf "%10s %6d %5d   %s\n", $1, $2, $3, $4
        area = area + $2; pop = pop + $3 }
END   { printf "\n%10s %6d %5d\n", "TOTAL", area, pop }

With the file countries as input, this program produces


   COUNTRY   AREA   POP   CONTINENT
      USSR   8650   262   Asia
    Canada   3852    24   North America
     China   3692   866   Asia
       USA   3615   219   North America
    Brazil   3286   116   South America
 Australia   2968    14   Australia
     India   1269   637   Asia
 Argentina   1072    26   South America
     Sudan    968    19   Africa
   Algeria    920    18   Africa

     TOTAL  30292  2201

Relational expressions
An awk pattern can be any expression involving comparisons between strings of characters or numbers. awk
has six relational operators, and two extended regular expression matching operators, ~ (tilde) and !~, which
are discussed in the next section, for making comparisons. ``awk comparison operators'' lists these operators
and their meanings.
awk comparison operators
Operator   Meaning
<          less than
<=         less than or equal to
==         equal to
!=         not equal to
>=         greater than or equal to
>          greater than
~          matches
!~         does not match

In a comparison, if both operands are numeric, a numeric comparison is made; otherwise, the operands are
compared as strings. (Every value might be either a number or a string; usually awk can tell what is intended.
``Number or string?'' contains more information about this.) Thus, the pattern $3>100 selects lines where the
third field exceeds 100, and the program
$1 >= "S"

selects lines that begin with the letters S through Z, namely,


USSR    8650    262    Asia
USA     3615    219    North America
Sudan    968     19    Africa

In the absence of any other information, awk treats fields as strings, so the program
$1 == $4

compares the first and fourth fields as strings of characters, and with the file countries as input, prints the
single line for which this test succeeds:
Australia    2968    14    Australia

If both fields appear to be numbers, the comparisons are done numerically.


Extended regular expressions


awk provides more powerful patterns for searching for strings of characters than the comparisons illustrated
in the previous section. These patterns are called extended regular expressions, and are like those in grep(C) and
lex(CP). The simplest extended regular expression is a string of characters enclosed in slashes, like
/Asia/

This program prints all input records that contain the substring Asia. (If a record contains Asia as part of a
larger string like Asian or Pan-Asiatic, it is also printed.) In general, if re is an extended regular expression,
then the pattern
/re/

matches any line that contains a substring specified by the extended regular expression re.
To restrict a match to a specific field, you use the matching operators ~ (matches) and !~ (does not match).
The program
$4 ~ /Asia/ { print $1 }

prints the first field of all lines in which the fourth field matches Asia, while the program
$4 !~ /Asia/ { print $1 }

prints the first field of all lines in which the fourth field does not match Asia.
In extended regular expressions, the symbols
\ ^ $ . [] * + ? () | {}

are metacharacters with special meanings like the metacharacters in the SCO OpenServer shell. For example,
the metacharacters ^ and $ match the beginning and end, respectively, of a string, and the metacharacter .
(dot) matches any single character. Thus,
/^.$/

matches all records that contain exactly one character.


A group of characters enclosed in square brackets matches any one of the enclosed characters; for example,
/[ABC]/ matches records containing any one of A, B, or C anywhere. Ranges of letters or digits can be
abbreviated within square brackets: /[a-zA-Z]/ matches any single letter in the default locale.
If the first character after the [ is a ^, this complements the class so it matches any character not in the set:
/[^a-zA-Z]/ matches any nonletter. The character + means ``one or more.'' Thus, the program
$2 !~ /^[0-9]+$/

prints all records in which the second field is not a string of one or more digits (^ for beginning of string,
[0-9]+ for one or more digits, and $ for end of string). Programs of this type are often used for data
validation.


awk also accepts the newer square bracket constructs. These constructs permit programs to be sensitive to the
current locale. For example, instead of using [^a-zA-Z] to mean ``nonletter'' and [0-9] to mean ``digit'' as
above, you can use [^[:alpha:]] and [[:digit:]], which are more descriptive and more portable. See grep(C)
for more details.
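For example, the data-validation one-liner shown earlier could be rewritten in the locale-aware style; this
sketch behaves identically in the default locale:
$2 !~ /^[[:digit:]]+$/
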
Parentheses () are used for grouping and the character | is used for alternatives. The program
/(apple|cherry) (pie|tart)/

matches lines containing any one of the four substrings apple pie,
apple tart, cherry pie, or cherry tart.
Extended regular expressions provide a more general form of repetition via the ``interval'' operator. This
operator is of the form {low,high}, with the high limit optional. The three operators ?, * and + are equivalent,
respectively, to the interval constructs {0,1}, {0,} and {1,}. To denote an exact number of matches, use the form
{count}.
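As a sketch of the interval operator, the following pattern selects records whose second field is a string of
exactly five digits:
$2 ~ /^[0-9]{5}$/
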
To turn off the special meaning of a metacharacter, precede it by a \ (backslash). Thus, the program
/b\$/

prints all lines containing b followed by a dollar sign.


In addition to recognizing metacharacters, awk recognizes the following C programming language escape
sequences within regular expressions and strings:
\b       backspace
\f       formfeed
\n       newline
\r       carriage return
\t       tab
\ddd     octal value ddd
\"       quotation mark
\c       any other character c literally
\xhhh    hexadecimal value hhh

For example, to print all lines containing a tab, use the program
/\t/

awk interprets any string or variable on the right side of a ~ or !~ as an extended regular expression. For
example, we could have written the program
$2 !~ /^[0-9]+$/

as
BEGIN          { digits = "^[0-9]+$" }
$2 !~ digits


Suppose you want to search for a string of characters like ^[0-9]+$. When a literal quoted string like
"^[0-9]+$" is used as an extended regular expression, one extra level of backslashes is needed to protect
metacharacters. This is because one level of backslashes is removed when a string is originally parsed. If a
backslash is needed in front of a character to turn off its special meaning in an extended regular expression,
then that backslash needs a preceding backslash to protect it in a string.
For example, suppose you want to match strings containing b followed by a dollar sign. The extended regular
expression for this pattern is b\$. If you want to create a string to represent this extended regular expression,
you must add one more backslash: "b\\$". The two extended regular expressions on each of the following
lines are equivalent:
x ~ "b\\$"    x ~ /b\$/
x ~ "b\$"     x ~ /b$/
x ~ "b$"      x ~ /b$/
x ~ "\\t"     x ~ /\t/

The precise form of extended regular expressions and the substrings they match is in ``awk extended regular
expressions''. The unary operators *, +, ? and intervals have the highest precedence, then concatenation, and
then alternation |. All operators are left associative. r stands for any extended regular expression.
awk extended regular expressions
Expression     Matches
c              any non-metacharacter c
\c             character c literally
^              beginning of string
$              end of string
.              any character but newline
[s]            any character in set s
[^s]           any character not in set s
r*             zero or more r's
r+             one or more r's
r?             zero or one r
r{low,high}    at least low but not more than high r's
(r)            r
r[1] r[2]      r[1] then r[2] (concatenation)
r[1]|r[2]      r[1] or r[2] (alternation)

Combinations of patterns
A compound pattern combines simpler patterns with parentheses and the logical operators || (or), && (and),
and ! (not). For example, suppose you want to print all countries in Asia with a population of more than 500
million. The following program does this by selecting all lines in which the fourth field is Asia and the third
field exceeds 500:
$4 == "Asia" && $3 > 500


The program
$4 == "Asia" || $4 == "Africa"

selects lines with Asia or Africa as the fourth field. Another way to write the latter query is to use an extended
regular expression with the alternation operator | :
$4 ~ /^(Asia|Africa)$/

The negation operator ! has the highest precedence, then &&, and finally ||. The operators && and || evaluate
their operands from left to right; evaluation stops as soon as truth or falsehood is determined.

Pattern ranges
A pattern range consists of two patterns separated by a comma, as in
pat[1], pat[2]    { . . . }

In this case, the action is performed for each line between an occurrence of pat[1] and the next occurrence of
pat[2] (inclusive). For example, the pattern
/Canada/, /Brazil/

matches lines starting with the first line that contains the string Canada up through the next occurrence of the
string Brazil:
Canada 3852 24 North America
China 3692 866 Asia
USA 3615 219 North America
Brazil 3286 116 South America
Similarly, since FNR is the number of the current record in the current input file (and FILENAME is the name
of the current input file), the program
FNR == 1, FNR == 5 { print FILENAME, $0 }

prints the first five records of each input file with the name of the current input file prepended.

Actions
In a pattern-action statement, the action determines what is to be done with the input records that the pattern
selects. Actions frequently are simple printing or assignment statements, but they may also be a combination
of one or more statements. This section describes the statements that can make up actions.

Built-in variables
``awk built-in variables'' lists the built-in variables that awk maintains. You have already learned some of
these; others appear in this and later sections.
awk built-in variables

Variable    Meaning                                        Default
ARGC        number of command-line arguments
ARGV        array of command-line arguments
FILENAME    name of current input file
FNR         record number in current file
FS          input field separator                          blank&tab
NF          number of fields in current record
NR          number of records read so far
OFMT        output format for numbers                      %.6g
OFS         output field separator                         blank
ORS         output record separator                        newline
RS          input record separator                         newline
RSTART      index of first character matched by match
RLENGTH     length of string matched by match
SUBSEP      subscript separator                            "\034"

Arithmetic
Actions can use conventional arithmetic expressions to compute numeric values. As a simple example,
suppose you want to print the population density for each country in the file countries. Since the second field
is the area in thousands of square miles and the third field is the population in millions, the expression
1000 * $3 / $2 gives the population density in people per square mile. The program
{ printf "%10s %6.1f\n", $1, 1000 * $3 / $2 }

when applied to the file countries, prints the name of each country and its population density:
      USSR    30.3
    Canada     6.2
     China   234.6
       USA    60.6
    Brazil    35.3
 Australia     4.7
     India   502.0
 Argentina    24.3
     Sudan    19.6
   Algeria    19.6

Arithmetic is done internally in floating point. The arithmetic operators are +, -, *, /, % (remainder) and ^
(exponentiation; ** is a synonym). Arithmetic expressions can be created by applying these operators to
constants, variables, field names, array elements, functions, and other expressions, all of which are discussed
later. Note that awk recognizes and produces scientific (exponential) notation: 1e6, 1E6, 10e5, and 1000000
are numerically equal.
awk has assignment statements like those found in the C programming language. The simplest form is the
assignment statement
v = e


where v is a variable or field name, and e is an expression. For example, to
compute the number of Asian countries and their total population, you could write
$4 == "Asia"    { pop = pop + $3; n = n + 1 }
END             { print "population of", n,
                        "Asian countries in millions is", pop }

Applied to countries, this program produces


population of 3 Asian countries in millions is 1765

The action associated with the pattern $4 == "Asia" contains two assignment statements, one to accumulate
population and the other to count countries. The variables are not explicitly initialized, yet everything works
properly because awk initializes each variable with the string value "" and the numeric value 0.
The assignments in the previous program can be written more concisely using the operators += and ++:
$4 == "Asia" { pop += $3; ++n }

The operator += is borrowed from the C programming language; therefore,
pop += $3

has the same effect as


pop = pop + $3

but the += operator is shorter and runs faster. The same is true of the ++ operator, which adds one to a
variable.
The abbreviated assignment operators are +=, -=, *=, /=, %=, and ^=. Their meanings are similar:
v op= e
has the same effect as
v = v op e.
The increment operators are ++ and --. As in C, they may be used as prefix (++x) or postfix (x++) operators.
If x is 1, then i=++x increments x, then sets i to 2, while i=x++ sets i to 1, then increments x. An analogous
interpretation applies to prefix and postfix --.
Assignment and increment and decrement operators may all be used in arithmetic expressions.
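The difference between the prefix and postfix forms can be seen in this small self-contained sketch:
BEGIN {
    x = 1; i = ++x    # x is incremented first: x is 2, i is 2
    x = 1; j = x++    # j gets the old value: j is 1, then x is 2
    print i, j        # prints 2 1
}
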
We use default initialization to advantage in the following program, which finds the country with the largest
population:
maxpop < $3    { maxpop = $3; country = $1 }
END            { print country, maxpop }

Note, however, that this program would not be correct if all values of $3 were negative.


awk provides the built-in arithmetic functions listed in ``awk built-in arithmetic functions''.
awk built-in arithmetic functions

Function     Value returned
atan2(y,x)   arctangent of y/x in the range -π to π
cos(x)       cosine of x, with x in radians
exp(x)       exponential function of x
int(x)       integer part of x truncated towards 0
log(x)       natural logarithm of x
rand()       random number between 0 and 1
sin(x)       sine of x, with x in radians
sqrt(x)      square root of x
srand(x)     x is new seed for rand

x and y are arbitrary expressions. The function rand returns a pseudorandom floating point number in the
range (0,1), and srand(x) can be used to set the seed of the generator. If srand has no argument, the seed is
derived from the time of day.
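For example, a one-line sketch that prints a pseudo-random integer from 1 to 10 for each input record (the
range 10 is arbitrary):
{ print int(10 * rand()) + 1 }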

Strings and string functions


A string constant is created by enclosing a sequence of characters inside quotation marks, as in ``"abc"'' or
``"hello, everyone"''. String constants may contain the C programming language escape sequences for special
characters listed in ``Extended regular expressions''.
String expressions are created by concatenating constants, variables, field names, array elements, functions,
and other expressions. The program
{ print NR ":" $0 }

prints each record preceded by its record number and a colon, with no blanks. The three strings representing
the record number, the colon, and the record are concatenated and the resulting string is printed. The
concatenation operator has no explicit representation other than juxtaposition.
awk provides the built-in string functions shown in ``awk built-in string functions''. In this table, r represents
an extended regular expression (either as a string or as /r/), s and t string expressions, and n and p integers.
awk built-in string functions

Function                Description
gsub(r,s)               substitute s for r globally in current record, return number of substitutions
gsub(r,s,t)             substitute s for r globally in string t, return number of substitutions
index(s,t)              return position of string t in s, 0 if not present
length(s)               return length of s
match(s,r)              return the position in s where r occurs, 0 if not present
split(s,a)              split s into array a on FS, return number of fields
split(s,a,r)            split s into array a on r, return number of fields
sprintf(fmt,expr-list)  return expr-list formatted according to format string fmt
sub(r,s)                substitute s for first r in current record, return number of substitutions
sub(r,s,t)              substitute s for first r in t, return number of substitutions
substr(s,p)             return substring of s starting at position p
substr(s,p,n)           return substring of s of length n starting at position p
tolower(s)              return a string in which each uppercase character in s is replaced by a lowercase character
toupper(s)              return a string in which each lowercase character in s is replaced by an uppercase character

The functions sub and gsub are patterned after the substitute command in the text editor ed(C). The function
gsub(r,s,t) replaces successive occurrences of substrings matched by the extended regular expression r with
the replacement string s in the target string t. (As in ed, the leftmost match is used, and is made as long as
possible.) It returns the number of substitutions made. The function gsub(r,s) is a synonym for gsub(r,s,$0).
For example, the program
{ gsub(/USA/, "United States"); print }

transcribes its input, replacing occurrences of USA by United States. The sub functions are similar, except
that they only replace the first matching substring in the target string.
The function index(s,t) returns the leftmost position where the string t begins
in s, or zero if t does not occur in s. The first character in a string is at position 1. For example,
index("banana", "an")

returns 2.
The length function returns the number of characters in its argument string; thus,
{ print length($0), $0 }

prints each record, preceded by its length. ($0 does not include the input record separator.) The program
length($1) > max    { max = length($1); name = $1 }
END                 { print name }

when applied to the file countries, prints the longest country name:
Australia.
The match(s,r) function returns the position in string s where extended regular expression r occurs, or 0 if it
does not occur. This function also sets two builtin variables RSTART and RLENGTH. RSTART is set to
the starting position of the match in the string; this is the same value as the returned value. RLENGTH is set
to the length of the matched string. (If a match does not occur, RSTART is 0, and RLENGTH is -1.) For
example, the following program finds the first occurrence of the letter i followed by at most one character
followed by the letter a in a record:
{ if (match($0, /i.?a/))
print RSTART, RLENGTH, $0 }


It produces the following output on the file countries:
17 2    USSR        8650    262    Asia
26 3    Canada      3852     24    North America
 3 3    China       3692    866    Asia
24 3    USA         3615    219    North America
27 3    Brazil      3286    116    South America
 8 2    Australia   2968     14    Australia
 4 2    India       1269    637    Asia
 7 3    Argentina   1072     26    South America
17 3    Sudan        968     19    Africa
 6 2    Algeria      920     18    Africa

NOTE: match matches the leftmost longest matching string. For example, with the record
AsiaaaAsiaaaaan
as input, the program
{ if (match($0, /a+/)) print RSTART, RLENGTH, $0 }
matches the first string of a's and sets RSTART to 4 and RLENGTH to 3.

The function sprintf(format, expr[1], expr[2], . . ., expr[n]) returns (without printing) a string containing
expr[1], expr[2], . . ., expr[n] formatted according to the printf specifications in the string format. ``The printf
statement'' contains a complete specification of the format conventions. The statement
x = sprintf("%10s %6d", $1, $2)

assigns to x the string produced by formatting the values of $1 and $2 as a ten-character string and a decimal
number in a field of width at least six; x may be used in any subsequent computation.
The function substr(s,p,n) returns the substring of s that begins at position p and is at most n characters long.
If substr(s,p) is used, the substring goes to the end of s; that is, it consists of the suffix of s beginning at
position p. For example, we could abbreviate the country names in countries to their first three characters by
invoking the program
{ $1 = substr($1, 1, 3); print }

on this file to produce


USS    8650    262    Asia
Can    3852     24    North America
Chi    3692    866    Asia
USA    3615    219    North America
Bra    3286    116    South America
Aus    2968     14    Australia
Ind    1269    637    Asia
Arg    1072     26    South America
Sud     968     19    Africa
Alg     920     18    Africa

Note that setting $1 in the program forces awk to recompute $0 and, therefore, the fields are separated by
blanks (the default value of OFS), not by tabs.
Strings are stuck together (concatenated) merely by writing them one after another in an expression. For
example, when invoked on the file countries,
       { s = s substr($1, 1, 3) " " }
END    { print s }

prints
USS Can Chi USA Bra Aus Ind Arg Sud Alg

by building s up a piece at a time from an initially empty string.

Field variables
The fields of the current record can be referred to by the field variables $1, $2, . . ., $NF. Field variables share
all of the properties of other variables: they may be used in arithmetic or string operations, and they may
have values assigned to them. So, for example, you can divide the second field of the file countries by 1000 to
convert the area from thousands to millions of square miles:
{ $2 /= 1000; print }

or assign a new string to a field:


BEGIN                  { FS = OFS = "\t" }
$4 == "North America"  { $4 = "NA" }
$4 == "South America"  { $4 = "SA" }
                       { print }

The BEGIN action in this program resets the input field separator FS and the output field separator OFS to a
tab. Notice that the print in the fourth line of the program prints the value of $0 after it has been modified by
previous assignments.
Fields can be accessed by expressions. For example, $(NF-1) is the second to last field of the current record.
The parentheses are needed: the value of $NF-1 is 1 less than the value in the last field.
A field variable referring to a nonexistent field, for example, $(NF+1), has as its initial value the empty string.
A new field can be created, however, by assigning a value to it. For example, the following program invoked
on the file countries creates a fifth field giving the population density:
BEGIN { FS = OFS = "\t" }
      { $5 = 1000 * $3 / $2; print }

The number of fields can vary from record to record, but usually the implementation limit is 100 fields per
record.

Number or string?
Variables, fields and expressions can have both a numeric value and a string value. They take on numeric or
string values according to context. For example, in the context of an arithmetic expression like


pop += $3

pop and $3 must be treated numerically, so their values will be coerced to numeric type if necessary.
In a string context like
print $1 ":" $2

$1 and $2 must be strings to be concatenated, so they will be coerced if necessary.


In an assignment v = e or v op= e, the type of v becomes the type of e. In an ambiguous context like
$1 == $2

the type of the comparison depends on whether the fields are numeric or string, and this can only be
determined when the program runs; it may well differ from record to record.
In comparisons, if both operands are numeric, the comparison is numeric; otherwise, operands are coerced to
strings, and the comparison is made on the string values. All field variables are of type string; in addition,
each field that contains only a number is also considered numeric. This determination is done at run time. For
example, the comparison ``$1 == $2'' will succeed on any pair of the inputs

1      1.0      +1      0.1e+1      10E-1      001      .01E2

but will fail on the inputs

(null)   0
(null)   0.0
0a       0
1e50     1.0e50

There are two idioms for coercing an expression of one type to the other:

number ""
     concatenate a null string to a number to coerce it to type string
string + 0
     add zero to a string to coerce it to type numeric
Thus, to force a string comparison between two fields, use
$1 "" == $2 ""

The numeric value of a string is the value of any prefix of the string that looks numeric; thus the value of
12.34x is 12.34, while the value of x12.34 is zero. The string value of an arithmetic expression is computed by
formatting the string with the output format conversion OFMT.
Uninitialized variables have numeric value 0 and string value "". Nonexistent fields and fields that are
explicitly null have only the string value ""; they are not numeric.
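The following sketch (not part of the original text) applies both idioms to force the comparison type explicitly; the messages are illustrative only:

$1 + 0 == $2 + 0   { print "equal as numbers:", $0 }
$1 "" == $2 ""     { print "equal as strings:", $0 }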

Control flow statements


awk provides if-else, while, do-while, and for statements, and statement grouping with braces, as in the C
programming language.
The if statement syntax is
if (expression) statement[1] else statement[2]
The expression acting as the conditional has no restrictions; it can include the relational operators <, <=, >,
>=, ==, and !=; the extended regular expression matching operators ~ and !~; the logical operators ||, &&, and
!; juxtaposition for concatenation; and parentheses for grouping.
In the if statement, awk first evaluates the expression. If it is nonzero and nonnull, statement[1] is
executed; otherwise statement[2] is executed. The else part is optional.
A single statement can always be replaced by a statement list enclosed in braces. The statements in the
statement list are terminated by newlines or semicolons.
Rewriting the maximum population program from the ``Arithmetic functions'' section with an if statement
results in
{ if (maxpop < $3) {
      maxpop = $3
      country = $1
  }
}
END { print country, maxpop }

The while statement is exactly that of the C programming language:


while (expression) statement

The expression is evaluated; if it is nonzero and nonnull the statement is executed and the expression is
tested again. The cycle repeats as long as the expression is nonzero. For example, to print all input fields one
per line,
{
    i = 1
    while (i <= NF) {
        print $i
        i++
    }
}

The for statement is like that of the C programming language:


for (expression[1]; expression; expression[2]) statement

It has the same effect as


expression[1]
while (expression) {
    statement
    expression[2]
}

so
{ for (i = 1; i <= NF; i++)
      print $i }

does the same job as the while example shown above. An alternate version of the for statement is described in
the next section.
The do statement has the form
do statement while (expression)

The statement is executed repeatedly until the value of the expression becomes zero. Because the test takes
place after the execution of the statement (at the bottom of the loop), it is always executed at least once. As a
result, the do statement is used much less often than while or for, which test for completion at the top of the
loop.
The following example of a do statement prints all lines except those between start and stop.
/start/ {
do {
getline x
} while (x !~ /stop/)
}
{ print }

The break statement causes an immediate exit from an enclosing while or for; the continue statement causes
the next iteration to begin. The next statement causes awk to skip immediately to the next record and begin
matching patterns starting from the first pattern-action statement.
The exit statement causes the program to behave as if the end of the input had occurred; no more input is read,
and the END action, if any, is executed. Within the END action,
exit expr

causes the program to return the value of expr as its exit status. If there is no expr, the exit status is zero.
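As a minimal sketch (not from the original), the following program stops reading as soon as it sees a record whose third field is negative and reports the problem through its exit status; the variable bad is hypothetical:

$3 < 0   { bad = 1; exit }
END      { if (bad) exit 2 }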

Arrays
awk provides one-dimensional arrays. Arrays and array elements need not be declared; like variables, they
spring into existence by being mentioned. An array subscript may be a number or a string.
As an example of a conventional numeric subscript, the statement
x[NR] = $0

assigns the current input line to the NRth element of the array x. In fact, it is possible in principle (though
perhaps slow) to read the entire input into an array with the awk program

    { x[NR] = $0 }
END { . . . processing . . . }


The first action merely records each input line in the array x, indexed by line number; processing is done in
the END statement.
Array elements may also be named by nonnumeric values. For example, the following program accumulates
the total population of Asia and Africa into the associative array pop. The END action prints the total
population of these two continents.
/Asia/     { pop["Asia"] += $3 }
/Africa/   { pop["Africa"] += $3 }
END        { print "Asian population in millions is", pop["Asia"]
             print "African population in millions is", pop["Africa"]
           }

On the file countries, this program generates


Asian population in millions is 1765
African population in millions is 37
In this program, if you had used pop[Asia] instead of pop["Asia"], the expression would have used the value
of the variable Asia as the subscript, and since the variable is uninitialized, the values would have been
accumulated in pop[""].
Suppose your task is to determine the total area in each continent of the file countries. Any expression can be
used as a subscript in an array reference. Thus
area[$4] += $2

uses the string in the fourth field of the current input record to index the array area and, in that entry,
accumulates the value of the second field:
BEGIN { FS = "\t" }
      { area[$4] += $2 }
END   { for (name in area)
            print name, area[name] }

Invoked on the file countries, this program produces


Africa 1888
South America 4358
North America 7467
Australia 2968
Asia 13611

This program uses a form of the for statement that iterates over all defined subscripts of an array:
for (i in array) statement

executes statement with the variable i set in turn to each value of i for which array[i] has been defined. The
loop is executed once for each defined subscript, which is chosen in a random order. Results are unpredictable
when i or array is altered during the loop.
awk does not provide multidimensional arrays, but it does permit a list of subscripts. They are combined into
a single subscript with the values separated by an unlikely string (stored in the variable SUBSEP). For
example,
for (i = 1; i <= 10; i++)
for (j = 1; j <= 10; j++)
arr[i,j] = ...

creates an array which behaves like a two-dimensional array; the subscript is the concatenation of i,
SUBSEP, and j.
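A combined subscript can be taken apart again with split. The following sketch (an illustration, not from the original) recovers the two indices of each element of the arr array created above:

END { for (k in arr) {
          split(k, idx, SUBSEP)
          print idx[1], idx[2], arr[k]
      } }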
You can determine whether a particular subscript i occurs in an array arr by testing the condition i in arr, as in
if ("Africa" in area) ...

This condition performs the test without the side effect of creating area["Africa"], which would happen if
you used
if (area["Africa"] != "") ...

Note that neither is a test of whether the array area contains an element with the value "Africa".
It is also possible to split any string into fields in the elements of an array using the builtin function split.
The function
split("s1:s2:s3", a, ":")

splits the string s1:s2:s3 into three fields, using the separator :, and stores s1 in a[1], s2 in a[2], and s3 in a[3].
The number of fields found, here three, is returned as the value of split. The third argument of split is an
extended regular expression to be used as the field separator. If the third argument is missing, FS is used as
the field separator.
An array element may be deleted with the delete statement:
delete arrayname[subscript]
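For example, a sketch (not in the original) that removes one accumulated total before printing the rest of the pop array built earlier:

END { delete pop["Asia"]
      for (c in pop)
          print c, pop[c] }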

User-defined functions
awk provides userdefined functions. A function is defined as
function name(argumentlist) {
statements
}

The definition can occur anywhere a pattern-action statement can. The argument list is a list of variable
names separated by commas; within the body of the function these variables refer to the actual parameters
when the function is called. No space must be left between the function name and the left parenthesis of the
argument list when the function is called; otherwise it looks like a concatenation. For example, the following
program defines and tests the usual recursive factorial function (of course, using some input other than the file
countries):
function fact(n) {
    if (n <= 1)
        return 1
    else
        return n * fact(n-1)
}
{ print $1 "! is " fact($1) }


Array arguments are passed by reference, as in C, so it is possible for the function to alter array elements or
create new ones. Scalar arguments are passed by value, however, so the function cannot affect their values
outside. Within a function, formal parameters are local variables, but all other variables are global. (You can
have any number of extra formal parameters that are used only as local variables.) The return statement is
optional, but the returned value is undefined if it is not included.
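The extra-parameter convention can be sketched as follows (this example is not from the original, and addup is a hypothetical name). The parameters i and sum are never passed by the caller, so they serve purely as local variables:

function addup(n,   i, sum) {
    for (i = 1; i <= n; i++)
        sum += $i
    return sum
}
{ print $0, "sum of fields is", addup(NF) }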

Some lexical conventions


Comments may be placed in awk programs; they begin with the character # and end at the end of the line, as
in
print x, y

# this is a comment

Statements in an awk program normally occupy a single line. Several statements may occur on a single line if
they are separated by semicolons. A long statement may be continued over several lines by terminating each
continued line by a backslash. (It is not possible to continue a ``...'' string.) This explicit continuation is rarely
necessary, however, since statements continue automatically if the line ends with a comma (for example, in a
print or printf statement) or after the operators && and ||.
Several pattern-action statements may appear on a single line if separated by semicolons.

Output
The print and printf statements are the two primary constructs that generate output. The print statement is
used to generate simple output; printf is used for more carefully formatted output. Like the shell, awk lets
you redirect output, so that output from print and printf can be directed to files and pipes. This section
describes the use of these two statements.

The print statement


The statement
print expr[1], expr[2], . . ., expr[n]

prints the string value of each expression separated by the output field separator followed by the output record
separator. The statement
print

is an abbreviation for
print $0

To print an empty line, use


print ""

Output separators
The output field separator and record separator are held in the builtin variables OFS and ORS. Initially,
OFS is set to a single blank and ORS to a single newline, but these values can be changed at any time. For
example, the following program prints the first and second fields of each record with a colon between the
fields and two newlines after the second field:
BEGIN { OFS = ":"; ORS = "\n\n" }
      { print $1, $2 }

Notice that
{ print $1 $2 }

prints the first and second fields with no intervening output field separator because $1 $2 is a string consisting
of the concatenation of the first two fields.

The printf statement


awk's printf statement is essentially the same as that in C except that the * format specifier is not supported.
The printf statement has the general form
printf format, expr[1], expr[2], . . ., expr[n]

where format is a string that contains both information to be printed and specifications on what conversions
are to be performed on the expressions in the argument list, as in ``awk printf conversion characters''. Each
specification begins with a %, ends with a letter that determines the conversion, and may include:

-
     Left-justify expression in its field.
width
     Pad field to this width as needed; fields that begin with a leading 0 are padded with zeros.
.prec
     Specify maximum string width or digits to right of decimal point.
``awk printf conversion characters'' lists the printf conversion characters.

awk printf conversion characters

Character   Prints expression as
c           single character
d           decimal number
e           [-]d.ddddddE[+-]dd
f           [-]ddd.dddddd
g           e or f conversion, whichever is shorter, with nonsignificant zeros suppressed
o           unsigned octal number
s           string
x           unsigned hexadecimal number
%           print a %; no argument is converted

Below are some examples of printf statements along with the corresponding output:
printf "%d", 99/2                49
printf "%e", 99/2                4.950000e+01
printf "%f", 99/2                49.500000
printf "%6.2f", 99/2              49.50
printf "%g", 99/2                49.5
printf "%o", 99                  143
printf "%06o", 99                000143
printf "%x", 99                  63
printf "|%s|", "January"         |January|
printf "|%10s|", "January"       |   January|
printf "|%-10s|", "January"      |January   |
printf "|%.3s|", "January"       |Jan|
printf "|%10.3s|", "January"     |       Jan|
printf "|%-10.3s|", "January"    |Jan       |
printf "%%"                      %

The default output format of numbers is %.6g; this can be changed by assigning a new value to OFMT.
OFMT also controls the conversion of numeric values to strings for concatenation and creation of array
subscripts.
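For instance, a minimal sketch (not from the original) that prints each country's area in millions of square miles, with the number of decimal places controlled by OFMT:

BEGIN { OFMT = "%.2f" }
      { print $1, $2 / 1000 }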

Output to files
You can print output to files instead of to the standard output by using the > and >> redirection operators. For
example, the following program invoked on the file countries prints all lines where the population (third field)
is bigger than 100 into a file called bigpop, and all other lines into a file called smallpop:
$3 > 100    { print $1, $3 >"bigpop" }
$3 <= 100   { print $1, $3 >"smallpop" }

Notice that the filenames have to be quoted; without quotes, bigpop and smallpop are merely uninitialized
variables. If the output filenames were created by an expression, they would also have to be enclosed in
parentheses:
$4 ~ /North America/ { print $1 > ("tmp" FILENAME) }

because the > operator has higher precedence than concatenation; without parentheses, the concatenation of
tmp and FILENAME would not work.
NOTE: Files are opened once in an awk program. If > is used to open a file, its original contents are
overwritten. But if >> is used to open a file, its contents are preserved and the output is appended to the file.
Once the file has been opened, the two operators have the same effect.
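For example, this one-line sketch (the filename poplog is hypothetical) appends a line for every record on each run instead of overwriting the file:

{ print $1, $3 >> "poplog" }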

Output to pipes
You can also direct printing to a pipe with a command on the other end, instead of to a file. The statement
print | "commandline"

causes the output of print to be piped into the commandline.


Although they are shown here as literal strings enclosed in quotes, the command-line and filenames can
come from variables and the return values from functions.
Suppose you want to create a list of continent-population pairs, sorted alphabetically by continent. The awk
program below accumulates the population values in the third field for each of the distinct continent names in
the fourth field in an array called pop. Then it prints each continent and its population, and pipes this output
into the sort command.
BEGIN { FS = "\t" }
      { pop[$4] += $3 }
END   { for (c in pop)
            print c ":" pop[c] | "sort" }

Invoked on the file countries, this program yields


Africa:37
Asia:1765
Australia:14
North America:243
South America:142

In all these print statements involving redirection of output, the files or pipes are identified by their names
(that is, the pipe above is literally named sort ), but they are created and opened only once in the entire run.
So, in the last example, for all c in pop, only one sort pipe is open.
There is a limit to the number of files that can be open simultaneously. The statement close(file) closes a file
or pipe; file is the string used to create it in the first place, as in
close("sort")

When opening or closing a file, different strings are different commands.

Input
The most common way to give input to an awk program is to name on the command line the file(s) that
contains the input. This is the method used in this topic; however, several other methods can be used. Each of
these is described in this section.

Files and pipes


You can provide input to an awk program by putting the input data into a file, say awkdata, and then
executing
$ awk 'program' awkdata<<Return>>

If no filenames are given, awk reads its standard input; thus, a second common arrangement is to have another
program pipe its output into awk. For example, grep(C) selects input lines containing a specified regular
expression, but it can do so faster than awk, since this is the only thing it does. We could, therefore, invoke
the pipe
$ grep 'Asia' countries | awk '. . .'<<Return>>


grep quickly finds the lines containing Asia and passes them on to the awk program for subsequent
processing.

Input separators
With the default setting of the field separator FS, input fields are separated by blanks or tabs, and leading
blanks are discarded, so each of these lines has the same first field:
field1        field2
   field1     field2
field1            field2

When the field separator is a tab, however, leading blanks are not discarded.
The field separator can be set to any extended regular expression by assigning a value to the builtin variable
FS. For example,
BEGIN { FS = ",[ \t]*|[ \t]+" }

makes into field separators every string consisting of a comma followed by blanks or tabs and every string of
blanks or tabs with no comma. FS can also be set on the command line with the -F argument:

$ awk -F'(,[ \t]*)|([ \t]+)' '. . .'<<Return>>

behaves the same as the previous example. Regular expressions used as field separators match the leftmost
longest occurrences (as in sub), but do not match null strings.

Multiline records
Records are normally separated by newlines, so that each line is a record; but this too can be changed, though
only in a limited way. If the builtin record separator variable RS is set to the empty string, as in
BEGIN

{ RS = "" }

then input records can be several lines long; a sequence of empty lines separates records. A common way to
process multiple-line records is to use
BEGIN

{ RS = ""; FS = "\n" }

to set the record separator to an empty line and the field separator to a newline. Each line is then one field.
However, the length of a record is limited; it is usually about 2500 characters. ``The getline function'' and
``Cooperation with the shell'' show other examples of processing multiline records.
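As a small sketch of this idiom (not from the original; the input is assumed to be blank-line-separated records such as the address list used later), the following program prints only the first line of each record:

BEGIN { RS = ""; FS = "\n" }
      { print $1 }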

The getline function


awk's facility for automatically breaking its input into records that are more than one line long is not adequate
for some tasks. For example, if records are not separated by blank lines, but by something more complicated,
merely setting RS to null does not work. In such cases, the program must manage the splitting of each record
into fields. Here are some suggestions.


The function getline can be used to read input either from the current input or from a file or pipe, by
redirection analogous to printf. By itself, getline fetches the next input record and performs the normal
field-splitting operations on it. It sets NF, NR, and FNR. getline returns 1 if there was a record present, 0 if
the end-of-file was encountered, and -1 if some error occurred (such as failure to open a file).
To illustrate, suppose you have input data consisting of multiline records, each of which begins with a line
beginning with START and ends with a line beginning with STOP. The following awk program processes
these multiline records, a line at a time, putting the lines of the record into consecutive entries of an array
f[1] f[2] ... f[nf]

Once the line containing STOP is encountered, the record can be processed from the data in the f array:
/^START/ {
f[nf=1] = $0
while (getline && $0 !~ /^STOP/)
f[++nf] = $0
# now process the data in f[1]...f[nf]
...
}

Notice that this code uses the fact that && evaluates its operands left to right and stops as soon as one is true.
The same job can also be done by the following program:
/^START/ && nf==0   { f[nf=1] = $0 }
nf > 1              { f[++nf] = $0 }
/^STOP/             { # now process the data in f[1]...f[nf]
                      ...
                      nf = 0
                    }

The statement
getline x

reads the next record into the variable x. No splitting is done; NF is not set. The statement
getline <"file"

reads from file instead of the current input. It has no effect on NR or FNR, but field splitting is performed and
NF is set. The statement
getline x <"file"

gets the next record from file into x; no splitting is done, and NF, NR and FNR are untouched.
If a filename is an expression, it should be in parentheses for evaluation:
while ( getline x < (ARGV[1] ARGV[2]) ) {

... }

because the < has precedence over concatenation. Without parentheses, a statement such as
getline x < "tmp" FILENAME


sets x to read the file tmp and not tmp <value of FILENAME>. Also, if you use this getline statement form,
a statement like
while ( getline x < file ) { ... }

loops forever if the file cannot be read because getline returns -1, not zero, if an error occurs. A better way to
write this test is
while ( getline x < file > 0) { ... }

You can also pipe the output of another command directly into getline. For example, the statement
while ("who" | getline)
n++

executes who and pipes its output into getline. Each iteration of the while loop reads one more line and
increments the variable n, so after the while loop terminates, n contains a count of the number of users.
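Put together as a complete program, the count can be computed entirely in a BEGIN action; this sketch (not from the original) also closes the pipe when done:

BEGIN { while ("who" | getline)
            n++
        print n, "users are logged in"
        close("who") }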
Similarly, the statement
"date" | getline d

pipes the output of date into the variable d, thus setting d to the current date. Note that, in this case, awk
leaves the pipeline (and thus the resources associated with date) open, since only one line was read from the
pipeline. An explicit close("date") will clear up these unneeded resources. Similarly, if a new invocation of
date is desired later, an explicit close("date") is also needed. Otherwise getline would try to read a second
line from the first invocation. ``getline function'' summarizes the getline function.
getline function

Form                  Sets
getline               $0, NF, NR, FNR
getline var           var, NR, FNR
getline <file         $0, NF
getline var <file     var
cmd | getline         $0, NF
cmd | getline var     var

Command-line arguments
The command-line arguments are available to an awk program: the array ARGV contains the elements
ARGV[0], . . ., ARGV[ARGC-1]; as in C, ARGC is the count. ARGV[0] is the name of the program
(generally awk); the remaining arguments are whatever was provided (excluding the program and any
optional arguments) when awk is invoked. The following command line contains an awk program that echoes
the arguments that appear after the program name:
awk '
BEGIN {
    for (i = 1; i < ARGC; i++)
        printf "%s ", ARGV[i]
    printf "\n"
}' $*


The arguments may be modified or added to; ARGC may be altered. As each input file ends, awk treats the
next nonnull element of ARGV (up to the current value of ARGC-1) as the name of the next input file.
One exception to the rule that an argument is a filename is when it is in the form
var=value
Then the variable var is set to the value value, as if by assignment. Such an argument is not treated like a
filename. If value is a string, no quotes are needed.
A better way to initialize an awk variable on the command line is to use the -v option (-v var=value) since
there is no ambiguity.
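Both forms are sketched below (the variable name n and the value 2 are illustrative only); the first takes effect when the argument is reached, the second before the program starts:

$ awk '{ print n, $1 }' n=2 countries<<Return>>
$ awk -v n=2 '{ print n, $1 }' countries<<Return>>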

Using awk with other commands and the shell


awk is most powerful when it is used in conjunction with other programs. This section discusses some of the
ways in which awk programs cooperate with other commands.

The system function


The builtin function system(commandline) executes the command commandline, which may be a string
computed by, for example, the builtin function sprintf. The value returned by system is the return status of
the command executed.
For example, the program
$1 == "#include" { gsub(/[<>"]/, "", $2);
system("cat " $2) }

calls the command cat to print the file named in the second field of every input record whose first field is
#include, after stripping any <, >, or " that might be present.

Cooperation with the shell


In all the examples thus far, the awk program was in a file and was fetched from there using the -f flag, or it
appeared on the command line enclosed in single quotes, as in
awk '{ print $1 }' . . .

Since awk uses many of the same characters as the shell does, such as $ and ", surrounding the awk program
with single quotes ensures that the shell will pass the entire program unchanged to the awk interpreter.
Now, consider writing a command addr that will search a file addresslist for name, address and telephone
information. Suppose that addresslist contains names and addresses in which a typical entry is a multiline
record such as
G. R. Emlin
600 Mountain Avenue
Murray Hill, NJ 07974
201-555-1234


Records are separated by a single blank line.
You want to search the address list by issuing commands like
addr Emlin

That is easily done by a program of the form


awk '
BEGIN     { RS = "" }
/Emlin/
' addresslist

The problem is how to get a different search pattern into the program each time it is run.
There are several ways to do this. One way is to create a file called addr that contains
awk '
BEGIN     { RS = "" }
/'"$1"'/
' addresslist

The quotes are critical here. The awk program is only one argument, even though there are two sets of quotes
because quotes do not nest. The $1 is outside the single quotes but inside the double quotes, and thus is visible
to the shell, which therefore replaces it by the pattern Emlin when the command addr Emlin is invoked. On a
SCO OpenServer system, addr can be made executable by changing its mode with the following command:
chmod +x addr

A second way to implement addr relies on the fact that the shell substitutes for $ parameters within double
quotes:
awk "
BEGIN
{ RS = \"\" }
/$1/
" addresslist

Therefore, you must protect the quotes defining RS with backslashes, so that the shell passes them on to awk
without interpretation. $1 is recognized as a parameter, however, so the shell replaces it by the pattern when
the command addr pattern is invoked.
A third way to implement addr is to use ARGV to pass the extended regular expression to an awk program
that explicitly reads through the address list with getline:
awk '
BEGIN { RS = ""
        while (getline < "addresslist")
            if ($0 ~ ARGV[1])
                print $0
      } ' $*

All processing is done in the BEGIN action.


Notice that any regular expression can be passed to addr; in particular, it is possible to retrieve by parts of an address or telephone number as well as by name.

Example applications
awk has been used in surprising ways: to implement database systems and a variety of compilers and
assemblers, in addition to the more traditional tasks of information retrieval, data manipulation, and report
generation. Invariably, the awk programs are significantly shorter than equivalent programs written in more
conventional programming languages such as Pascal or C. This section presents a few more examples to
illustrate some additional awk programs.

Generating reports
awk is especially useful for producing reports that summarize and format information. Suppose you want to
produce a report from the file countries in which the continents are listed alphabetically, and the countries on
each continent are listed after in decreasing order of population:
Africa:
        Sudan           19
        Algeria         18

Asia:
        China          866
        India          637
        USSR           262

Australia:
        Australia       14

North America:
        USA            219
        Canada          24

South America:
        Brazil         116
        Argentina       26

As with many data processing tasks, it is much easier to produce this report in several stages. First, create a
list of continent:country:population triples, in which each field is separated by a colon. This can be done with
the following program triples, which uses an array pop indexed by subscripts of the form continent:country
to store the population of a given country. The print statement in the END section of the program creates the
list of continent:country:population triples that are piped to the sort routine.
BEGIN { FS = "\t" }
      { pop[$4 ":" $1] += $3 }
END   { for (cc in pop)
            print cc ":" pop[cc] | "sort -t: +0 -1 +2nr" }

The arguments for sort deserve special mention. The -t: argument tells sort to use : as its field separator. The
+0 -1 arguments make the first field the primary sort key. In general, +i -j makes fields i+1, i+2, . . ., j the
sort key. If -j is omitted, the fields from i+1 to the end of the record are used. The +2nr argument makes the
third field, numerically decreasing, the secondary sort key (n is for numeric, r for reverse order). Invoked on
the file countries, this program produces as output
Africa:Sudan:19
Africa:Algeria:18
Asia:China:866
Asia:India:637
Asia:USSR:262
Australia:Australia:14
North America:USA:219
North America:Canada:24


South America:Brazil:116
South America:Argentina:26

This output is in the right order but the wrong format. To transform the output into the desired form, run it
through a second awk program format:
BEGIN   { FS = ":" }
{
    if ($1 != prev) {
        print "\n" $1 ":"
        prev = $1
    }
    printf "\t%10s %6d\n", $2, $3
}

This is a control-break program that prints only the first occurrence of a continent name and formats the
country-population lines associated with that continent in the desired manner. The command line

$ awk -f triples countries | awk -f format<<Return>>

gives the desired report. As this example suggests, complex data transformation and formatting tasks can
often be reduced to a few simple awk commands and sorts.

Word frequencies
Our first example illustrates associative arrays for counting. Suppose you want to count the number of times
each word appears in the input, where a word is any contiguous sequence of nonblank, nontab characters.
The following program prints the word frequencies, sorted in decreasing order.
{ for (w = 1; w <= NF; w++) count[$w]++ }
END { for (w in count) print count[w], w | "sort -nr" }

The first statement uses the array count to accumulate the number of times each word is used. Once the input
has been read, the second for loop pipes the final count along with each word into the sort command.

Accumulation
Suppose we have two files, deposits and withdrawals, of records containing a name field and an amount
field. For each name we want to print the net balance determined by subtracting the total withdrawals from the
total deposits for each name. The net balance can be computed by the following program:
awk '
FILENAME == "deposits"      { balance[$1] += $2 }
FILENAME == "withdrawals"   { balance[$1] -= $2 }
END                         { for (name in balance)
                                  print name, balance[name]
                            } ' deposits withdrawals

The first statement uses the array balance to accumulate the total amount for each name in the file deposits.
The second statement subtracts associated withdrawals from each total. If only withdrawals are associated
with a name, an entry for that name is created by the second statement. The END action prints each name with
its net balance.

Random choice
The following function prints (in order) k random elements from the first n elements of the array A. In the
program, k is the number of entries that still need to be printed, and n is the number of elements yet to be
examined. The decision of whether to print the ith element is determined by the test rand() < k/n.
function choose(A, k, n,   i) {
    for (i = 1; n > 0; i++)
        if (rand() < k/n--) {
            print A[i]
            k--
        }
}

History facility
The following awk program roughly simulates the history facility of certain shells. A line containing only =
reexecutes the last command executed. A line beginning with = cmd reexecutes the last command whose
invocation included the string cmd. Otherwise, the current line is executed.
$1 == "=" { if (NF == 1)
system(x[NR] = x[NR1])
else
for (i = NR1; i > 0; i)
if (x[i] ~ $2) {
system(x[NR] = x[i])
break
}
next }

/./ { system(x[NR] = $0) }

Form-letter generation
The following program generates form letters, using a template stored in a file called form.letter:
This is a form letter.
The first field is $1, the second $2, the third $3.
The third is $3, second is $2, and first is $1.

and replacement text of this form:


field 1|field 2|field 3
one|two|three
a|b|c

The BEGIN action stores the template in the array line; the remaining action cycles through the input data,
using gsub to replace template fields of the form $n with the corresponding data fields.
BEGIN {
    FS = "|"
    while (getline <"form.letter")
        line[++n] = $0
}
{
    for (i = 1; i <= n; i++) {
        s = line[i]
        for (j = 1; j <= NF; j++)
            gsub("\\$"j, $j, s)
        print s
    }
}
In all such examples, a prudent strategy is to start with a small version and expand it, trying out each aspect
before moving on to the next.

awk summary
The following sections summarize the features of awk.

Command line
awk program filenames
awk -f programfile filenames
awk -Fs      sets field separator to string s; -Ft sets separator to tab

Patterns
BEGIN
END
/extended regular expression/
relational expression
pattern && pattern
pattern || pattern
(pattern)
!pattern
pattern, pattern

Control flow statements


if (expr) statement [else statement]
if (subscript in array) statement [else statement]
while (expr) statement
for (expr; expr; expr) statement
for (var in array) statement
do statement while (expr)
break
continue
next
exit [expr]
return [expr]


Input-output
close(filename)               close file
getline                       set $0 from next input record; set NF, NR, FNR
getline <file                 set $0 from next record of file; set NF
getline var                   set var from next input record; set NR, FNR
getline var <file             set var from next record of file
print                         print current record
print expr-list               print expressions
print expr-list >file         print expressions on file
printf fmt, expr-list         format and print
printf fmt, expr-list >file   format and print on file
system(cmd-line)              execute command cmd-line, return status

In print and printf above, >>file appends to the file, and | command writes on a pipe. Similarly, command |
getline pipes into getline. getline returns 0 on end of file, and -1 on error.

Functions
func name(parameter list) { statement }
function name(parameter list) { statement }
functionname(expr, expr, . . .)

String functions
gsub(r,s,t)               substitute string s for each substring matching extended regular expression r in
                          string t, return number of substitutions; if t omitted, use $0
index(s,t)                return index of string t in string s, or 0 if not present
length(s)                 return length of string s
match(s,r)                return position in s where extended regular expression r occurs, or 0 if r is not
                          present
split(s,a,r)              split string s into array a on extended regular expression r, return number of fields;
                          if r omitted, FS is used in its place
sprintf(fmt, expr-list)   print expr-list according to fmt, return resulting string
sub(r,s,t)                like gsub except only the first matching substring is replaced
substr(s,i,n)             return n-char substring of s starting at i; if n omitted, use rest of s

Arithmetic functions
atan2(y,x)    arctangent of y/x in radians
cos(expr)     cosine (angle in radians)
exp(expr)     exponential
int(expr)     truncate to integer
log(expr)     natural logarithm
rand()        random number between 0 and 1
sin(expr)     sine (angle in radians)
sqrt(expr)    square root
srand(expr)   new seed for random number generator; use time of day if no expr

Operators (increasing precedence)


= += -= *= /= %= ^=   assignment
?:                    conditional expression
||                    logical OR
&&                    logical AND
~ !~                  extended regular expression match, negated match
< <= > >= != ==       relationals
blank                 string concatenation
+ -                   add, subtract
* / %                 multiply, divide, mod
+ - !                 unary plus, unary minus, logical negation
^                     exponentiation (** is a synonym)
++ --                 increment, decrement (prefix and postfix)
$                     field

Regular expressions (increasing precedence)


c             matches non-metacharacter c
\c            matches literal character c
.             matches any character but newline
^             matches beginning of line or string
$             matches end of line or string
[abc...]      character class matches any of abc...
[^abc...]     negated class matches any but abc...
r1|r2         matches either r1 or r2
r1r2          concatenation: matches r1, then r2
r+            matches one or more r's
r*            matches zero or more r's
r?            matches zero or one r's
r{low,high}   matches at least low r's but no more than high
(r)           grouping: matches r

Built-in variables

ARGC       number of command-line arguments
ARGV       array of command-line arguments (0..ARGC-1)
FILENAME   name of current input file
FNR        input record number in current file
FS         input field separator (default blank)
NF         number of fields in current input record
NR         input record number since beginning
OFMT       output format for numbers (default %.6g)
OFS        output field separator (default blank)
ORS        output record separator (default newline)
RS         input record separator (default newline)
RSTART     index of first character matched by match(); 0 if no match
RLENGTH    length of string matched by match(); -1 if no match
SUBSEP     separates multiple subscripts in array elements; default \034

Limits
Any particular implementation of awk enforces some limits. Here are typical values:
100 fields
2500 characters per input record
2500 characters per output record
1024 characters per individual field
1024 characters per printf string
400 characters maximum quoted string
400 characters in character class
15 open files
1 pipe
numbers are limited to what can be represented on the local machine,
for example, -1e38..1e+38

Initialization, comparison, and type coercion


Each variable and field can potentially be a string or a number or both at any time. When a variable is set by
the assignment

var = expr

its type is set to that of the expression. (Assignment includes +=, -=, and so on.) An arithmetic expression is
of type number, a concatenation is of type string, and so on. If the assignment is a simple copy, as in
v1 = v2

then the type of v1 becomes that of v2.


In comparisons, if both operands are numeric, the comparison is made numerically. Otherwise, operands are
coerced to string if necessary, and the
comparison is made on strings. The type of any expression can be coerced to numeric by subterfuges such as
expr + 0

and to string by
expr ""

(that is, concatenation with a null string).


Uninitialized variables have the numeric value 0 and the string value "". Accordingly, if x is uninitialized,
if (x) ...

is false, and
if (!x) ...
if (x == 0) ...
if (x == "") ...

are all true. But the following is false:


if (x == "0") ...

The type of a field is determined by context when possible; for example,


$1++

clearly implies that $1 is to be numeric, and


$1 = $1 "," $2

implies that $1 and $2 are both to be strings. Coercion is done as needed.


In contexts where types cannot be reliably determined, for example,
if ($1 == $2) ...

the type of each field is determined on input. All fields are strings; also, each field that contains only a number
is also considered numeric.



Fields that are explicitly null have the string value "" ; they are not numeric. Nonexistent fields (that is,
fields past NF) are treated this way, too.
As it is for fields, so it is for array elements created by split.
Mentioning a variable in an expression causes it to exist, with the value "" as described above. Thus, if arr[i]
does not currently exist,
if (arr[i] == "") ...

causes it to exist with the value "" so the if is satisfied. The special construction
if (i in arr) ...

determines if arr[i] exists without the side effect of creating it if it does not.


Lexical analysis with lex


lex is a software tool that lets you solve a wide class of problems drawn from text processing, code
enciphering, compiler writing, and other areas. In text processing, you might check the spelling of words for
errors; in code enciphering, you might translate certain patterns of characters into others; and in compiler
writing, you might determine what the tokens are in the program to be compiled. The task common to all
these problems is lexical analysis: recognizing different strings of characters that satisfy certain
characteristics. Hence the name lex.
You do not have to use lex to handle problems of this kind. You could write programs in a standard language
like C to handle them, too. In fact, what lex does is produce such C programs. (lex is therefore called a
program generator.) What lex offers you, once you acquire a facility with it, is typically a faster, easier way to
create programs that perform these tasks. Its weakness is that it often produces C programs that are longer
than necessary for the task at hand and that execute more slowly than they otherwise might. In many
applications this is a minor consideration, and the advantages of using lex considerably outweigh it.
lex can also be used to collect statistical data on features of an input text, such as character count, word length,
number of occurrences of a word, and so forth. In the remaining sections, we will see
how to generate a lexical analyzer program
how to write lex source
how to use lex with yacc

Generating a lexical analyzer program


lex generates a C language scanner from a source specification that you write to solve the problem at hand.
This specification consists of a list of rules indicating sequences of characters (expressions) to be
searched for in an input text, and the actions to take when an expression is found. We will show you how to
write a lex specification in the next section.
The C source code for the lexical analyzer is generated when you enter
$ lex lex.l

where lex.l is the file containing your lex specification. (The name lex.l is the favored convention, but you
may use whatever name you want. Keep in mind, though, that the .l suffix is a convention recognized by other
SCO OpenServer system tools, in particular, make.) The source code is written to an output file called lex.yy.c
by default. That file contains the definition of a function called yylex() that returns 1 whenever an expression
you have specified is found in the input text, 0 when end of file is encountered. Each call to yylex() parses one
token. When yylex() is called again, it picks up where it left off.
Note that running lex on a specification that is spread across several files
$ lex lex1.l lex2.l lex3.l

produces one lex.yy.c. Invoking lex with the -t option causes it to write its output to stdout rather than
lex.yy.c, so that it can be redirected:

$ lex -t lex.l > lex.c


Options to lex must appear between the command name and the file name argument.
The lexical analyzer code stored in lex.yy.c (or the .c file to which it was redirected) must be compiled to
generate the executable object program, or scanner, that performs the lexical analysis of an input text. The lex
library, libl.a, supplies a default main() that calls the function yylex(), so you need not supply your own
main(). The library is accessed by specifying libl with the -l option to cc:

$ cc lex.yy.c -ll

Alternatively, you may want to write your own driver. The following is similar to the library version:
extern int yylex();

int yywrap() { return(1); }


main() { while (yylex()) ; }

We will take a closer look at the function yywrap() in ``lex routines''. For now it is enough to note that when
your driver file is compiled with lex.yy.c

$ cc lex.yy.c driver.c

its main() will call yylex() at run time exactly as if the lex library had been loaded. The resulting executable
reads stdin and writes its output to stdout. The figure below shows how lex works.

Creation and use of a lexical analyzer with lex

Writing lex source


lex source consists of at most three sections: definitions, rules, and userdefined routines. The rules section is
mandatory. Sections for definitions and user routines are optional, but if present, must appear in the indicated
order:
definitions
%%
rules
%%
user routines

The fundamentals of lex rules


The mandatory rules section opens with the delimiter ``%%''. If a routines section follows, another ``%%''
delimiter ends the rules section. The ``%%'' delimiters must be entered at the beginning of a line, without
leading blanks. If there is no second delimiter, the rules section is presumed to continue to the end of the
program. Lines in the rules section that begin with white space and that appear before the first rule are copied
to the beginning of the function yylex(), immediately after the first brace. You might use this feature to
declare local variables for yylex().
Each rule consists of a specification of the pattern sought and the action(s) to take on finding it. The
specification of the pattern must be entered at the beginning of a line. The scanner writes input that does not
match a pattern directly to the output file. So the simplest lexical analyzer program is just the beginning rules
delimiter, ``%%''. It writes out the entire input to the output with no changes at all.
Regular expressions
You specify the patterns you are interested in with a notation called a regular expression. A regular expression
is formed by stringing together characters with or without operators. The simplest regular expressions are
strings of text characters with no operators at all:
apple
orange
pluto

These three regular expressions match any occurrences of those character strings in an input text. If you want
to have the scanner remove every occurrence of orange from the input text, you could specify the rule
orange    ;

Because you specified a null action on the right with the semicolon, the scanner does nothing but print out the
original input text with every occurrence of this regular expression removed, that is, without any occurrence
of the string orange at all.
Operators
Unlike orange above, most of the expressions that we want to search for cannot be specified so easily. The
expression itself might simply be too long. More commonly, the class of desired expressions is too large; it
may, in fact, be infinite. Thanks to the use of operators, summarized in ``lex operators'' below, we can form regular
expressions to signify any expression of a certain class. The + operator, for instance, means one or more
occurrences of the preceding expression, the ? means 0 or 1 occurrence(s) of the preceding expression (which
is equivalent, of course, to saying that the preceding expression is optional), and * means 0 or more
occurrences of the preceding expression. (It may at first seem odd to speak of 0 occurrences of an expression
and to need an operator to capture the idea, but it is often quite helpful. We will see an example in a moment.)
So m+ is a regular expression that matches any string of m's:

mmm
m
mmmmm

and 7* is a regular expression that matches any string of zero or more 7's:
77
77777

777

The empty third line matches because it has no 7's in it at all.


The | operator indicates alternation, so that ab|cd matches either ab or cd. The operators {} specify repetition,
so that a{1,5} looks for 1 to 5 occurrences of a. Brackets, [], indicate any one character from the string of
characters specified between the brackets. Thus, [dgka] matches a single d, g, k, or a. Note that the characters
between brackets must be adjacent, without spaces or punctuation. The ^ operator, when it appears as the first
character after the left bracket, indicates all characters in the standard set except those specified between the
brackets. (Note that |, {}, and ^ may serve other purposes as well; see below.) Ranges within a standard
alphabetic or numeric order (A through Z, a through z, 0 through 9) are specified with a hyphen. [a-z],
for instance, indicates any lowercase letter. Somewhat more interestingly,

[A-Za-z0-9*&#]

is a regular expression that matches any letter (whether upper or lowercase), any digit, an asterisk, an
ampersand, or a sharp character. Given the input text

$$$$?? ????!!!*$$ $$$$$$&+====r~~# ((

the lexical analyzer with the previous specification in one of its rules will recognize *, &, r, and #, perform on
each recognition whatever action the rule specifies (we have not indicated an action here), and print out the
rest of the text as it stands. If you want to include the hyphen character in the class, it should appear as the
first or last character in the brackets: [-A-Z] or [A-Z-].
The operators become especially powerful in combination. For example, the regular expression to recognize
an identifier in many programming languages is

[a-zA-Z][0-9a-zA-Z]*

An identifier in these languages is defined to be a letter followed by zero or more letters or digits, and that is
just what the regular expression says. The first pair of brackets matches any letter. The second, if it were not
followed by a *, would match any digit or letter. The two pairs of brackets with their enclosed characters
would then match any letter followed by a digit or a letter. But with the *, the example matches any letter
followed by any number of letters or digits. In particular, it would recognize the following as identifiers:
e
not
idenTIFIER
pH
EngineNo99
R2D2

Note that it would not recognize the following as identifiers:


not_idenTIFIER
5times
$hello

because not_idenTIFIER has an embedded underscore; 5times starts with a digit, not a letter; and $hello
starts with a special character.


A potential problem with operator characters is how we can specify them as characters to look for in a search
pattern. The last example, for instance, will not recognize text with a * in it. lex solves the problem in one of
two ways: an operator character preceded by a backslash, or characters (except backslash) enclosed in double
quotation marks, are taken literally, that is, as part of the text to be searched for. To use the backslash method
to recognize, say, a * followed by any number of digits, we can use the pattern

\*[1-9]*

To recognize a \ itself, we need two backslashes: \\. Similarly, ``"x\*x"'' matches x*x, and ``"y\"z"'' matches
y"z. Other lex operators are noted as they arise in the discussion below. lex recognizes all the C language
escape sequences.
lex operators

Expression   Description
\x           x, if x is a lex operator
"xy"         xy, even if x or y are lex operators (except \)
[xy]         x or y
[x-z]        x, y, or z
[^x]         any character but x
.            any character but newline
^x           x at the beginning of a line
<y>x         x when lex is in start condition y
x$           x at the end of a line
x?           optional x
x*           0, 1, 2, . . . instances of x
x+           1, 2, 3, . . . instances of x
x{m,n}       m through n occurrences of x
xx|yy        either xx or yy
x |          the action on x is the action for the next rule
(x)          x
x/y          x but only if followed by y
{xx}         the translation of xx from the definitions section

Actions
Once the scanner recognizes a string matching the regular expression at the start of a rule, it looks to the right
of the rule for the action to be performed. You supply the actions. Kinds of actions include recording the
token type found and its value, if any; replacing one token with another; and counting the number of instances
of a token or token type. You write these actions as program fragments in C. An action may consist of as
many statements as are needed for the job at hand. You may want to change the text in some way or simply
print a message noting that the text has been found. So, to recognize the expression Amelia Earhart and to
note such recognition, the rule
"Amelia Earhart"

printf("found Amelia");

The fundamentals of lex rules

102

Lexical analysis with lex


would do. And to replace in a text lengthy medical terms with their equivalent acronyms, a rule such as
Electroencephalogram

printf("EEG");

would be called for. To count the lines in a text, we need to recognize the ends of lines and increment a
line counter. As we have noted, lex uses the standard C escape sequences, including \n for newline. So, to
count lines we might have

\n    lineno++;

where lineno, like other C variables, is declared in the definitions section that we discuss later.
Input is ignored when the C language null statement ; is specified. So the rule

[ \t\n]    ;

causes blanks, tabs, and newlines to be ignored. Note that the alternation operator | can also be used to
indicate that the action for a rule is the action for the next rule. The previous example could have been written:

" "    |
\t     |
\n     ;

with the same result.


The scanner stores text that matches an expression in a character array called yytext[]. You can print or
manipulate the contents of this array as you like.
In fact, lex provides a macro called ECHO that is equivalent to printf("%s", yytext). We will see an
example of its use in ``Start conditions''.
Sometimes your action may consist of a long C statement, or two or more C statements, and you wish to write
it on several lines. To inform lex that the action is for one rule only, simply enclose the C code in braces. For
example, to count the total number of all digit strings in an input text, print the running total of the number of
digit strings, and print out each one as soon as it is found, your lex code might be
\+?[1-9]+    { digstrngcount++;
               printf("%d",digstrngcount);
               printf("%s", yytext);
             }

This specification matches digit strings whether they are preceded by a plus sign or not, because the ?
indicates that the preceding plus sign is optional. In addition, it will catch negative digit strings because that
portion following the minus sign will match the specification. The next section explains how to distinguish
negative from positive integers.

Advanced lex usage


lex provides a suite of features that let you process input text riddled with quite complicated patterns. These
include rules that decide what specification is relevant when more than one seems so at first; functions that
transform one matching pattern into another; and the use of definitions and subroutines. Before considering
these features, you may want to affirm your understanding thus far by examining an example that draws
together several of the points already covered:

%%
-[0-9]+                 printf("negative integer");
\+?[0-9]+               printf("positive integer");
-0.[0-9]+               printf("negative fraction, no whole number part");
rail[ \t]+road          printf("railroad is one word");
crook                   printf("Here's a crook");
function                subprogcount++;
G[a-zA-Z]*              { printf("may have a G word here:%s", yytext);
                          Gstringcount++; }

The first three rules recognize negative integers, positive integers, and negative fractions between 0 and 1.
The use of the terminating + in each specification ensures that one or more digits compose the number in
question. Each of the next three rules recognizes a specific pattern. The specification for railroad matches
cases where one or more blanks intervene between the two syllables of the word. In the cases of railroad and
crook, we could have simply printed a synonym rather than the messages stated. The rule recognizing a
function simply increments a counter. The last rule illustrates several points:
The braces specify an action sequence that extends over several lines.
Its action uses the lex array yytext[], which stores the recognized character string.
Its specification uses the * to indicate that zero or more letters may follow the G.
Some special features
Besides storing the matched input text in yytext[], the scanner automatically counts the number of characters
in a match and stores it in the variable yyleng. You may use this variable to refer to any specific character just
placed in the array yytext[]. Remember that C language array indexes start with 0, so to print out the third
digit (if there is one) in a just recognized integer, you might enter
[0-9]+          { if (yyleng > 2)
                        printf("%c", yytext[2]); }

lex follows a number of high-level rules to resolve ambiguities that may arise from the set of rules that you
write. In the following lexical analyzer example, the ``reserved word'' end could match the second rule as well
as the eighth, the one for identifiers:
begin                   return(BEGIN);
end                     return(END);
while                   return(WHILE);
if                      return(IF);
package                 return(PACKAGE);
reverse                 return(REVERSE);
loop                    return(LOOP);
[a-zA-Z][a-zA-Z0-9]*    { tokval = put_in_tabl();
                          return(IDENTIFIER); }
[0-9]+                  { tokval = put_in_tabl();
                          return(INTEGER); }
\+                      { tokval = PLUS;
                          return(ARITHOP); }
\-                      { tokval = MINUS;
                          return(ARITHOP); }
>                       { tokval = GREATER;
                          return(RELOP); }
>=                      { tokval = GREATEREQL;
                          return(RELOP); }

lex follows the rule that, where there is a match with two or more rules in a specification, the first rule is the
one whose action will be executed. By placing the rule for end and the other reserved words before the rule
for identifiers, we ensure that our reserved words will be duly recognized.
Another potential problem arises from cases where one pattern you are searching for is the prefix of another.
For instance, the last two rules in the lexical analyzer example above are designed to recognize > and >=. If
the text has the string >= at one point, you might worry that the lexical analyzer would stop as soon as it
recognized the > character and execute the rule for >, rather than read the next character and execute the rule
for >=. lex follows the rule that it matches the longest character string possible and executes the rule for that.
Here the scanner would recognize the >= and act accordingly. As a further example, this longest-match rule
would enable you to distinguish + from ++ in a C program, as the sketch below shows.
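A sketch of such a pair of rules (the printed messages are illustrative, not from the original text):

"+"             printf("found a plus");
"++"            printf("found an increment");

Because the scanner always takes the longest match, the input ++ fires the second rule even though the rule for + appears first.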
Still another potential problem exists when the analyzer must read characters beyond the string you are
seeking because you cannot be sure that you've in fact found it until you've read the additional characters.
These cases reveal the importance of trailing context. The classic example here is the DO statement in
FORTRAN. In the statement
DO 50 k = 1, 20, 1

we cannot be sure that the first 1 is the initial value of the index k until we read the first comma. Until then,
we might have the assignment statement
DO50k = 1

(Remember that FORTRAN ignores all blanks.) The way to handle this is to use the slash, /, which signifies
that what follows is trailing context, something not to be stored in yytext[], because it is not part of the pattern
itself. So the rule to recognize the FORTRAN DO statement could be
DO/([ ]*[0-9]+[ ]*[a-zA-Z0-9]+=[a-zA-Z0-9]+,)   printf("found DO");

Different versions of FORTRAN have limits on the size of identifiers, here the index name. To simplify the
example, the rule accepts an index name of any length. See ``Start conditions'' for a discussion of lex's similar
handling of prior context.
lex uses the ``$'' symbol as an operator to mark a special trailing context: the end of a line. An example
would be a rule to ignore all blanks and tabs at the end of a line:
[ \t]+$

which could also be written:

[ \t]+/\n

On the other hand, if you want to match a pattern only when it starts a line or a file, you can use the ^
operator. Suppose a text-formatting program requires that you not start a line with a blank. You might want to
check input to the program with some such rule as
^[ ]            printf("error: remove leading blank");

Note the difference in meaning when the ^ operator appears inside the left bracket, as described in
``Operators''.
lex routines
Some of your action statements themselves may require your reading another character, putting one back to
be read again a moment later, or writing a character on an output device. lex supplies three macros to handle
these tasks: input(), unput(c), and output(c), respectively. One way to ignore all characters between two
special characters, say between a pair of double quotation marks, would be to use input(), thus:
\"

while (input() != '"');

Upon finding the first double quotation mark, the scanner will simply continue reading all subsequent
characters so long as none is a double quotation mark, and not look for a match again until it finds a second
double quotation mark. (See the further examples of input() and unput(c) usage in ``User routines''.)
By default, these routines are provided as macro definitions. To handle special I/O needs, such as writing to
several files, you may use standard I/O routines in C to rewrite the functions. Note, however, that they must
be modified consistently. In particular, the character set used must be consistent in all routines, and a value of
0 returned by input() must mean end of file. The relationship between input() and unput(c) must be
maintained or the lex lookahead will not work.
If you do provide your own input(), output(c), or unput(c), you will have to write a #undef input and so on
in your definitions section first:
#undef input
#undef output
.
.
.
#define input() . . . etc.
more declarations
.
.
.

Your new routines will replace the standard ones. See ``Definitions'' for further details.
A lex library routine that you may sometimes want to redefine is yywrap(), which is called whenever the
scanner reaches end of file. If yywrap() returns 1, the scanner continues with normal wrapup on end of input.
Occasionally, however, you may want to arrange for more input to arrive from a new source. In that case,
redefine yywrap() to return 0 whenever further processing is required. The default yywrap() always returns 1.
Note that it is not possible to write a normal rule that recognizes end of file; the only access to that condition
is through yywrap(). Unless a private version of input() is supplied, a file containing nulls cannot be handled
because a value of 0 returned by input() is taken to be end of file.
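For instance, to continue scanning from a second input file, you might redefine yywrap() along the following lines in the user routines section. This is a minimal sketch: the file name next.input is purely illustrative, and yyin is the scanner's standard input file pointer.

int yywrap()
{
        static int done = 0;
        if (!done) {
                done = 1;
                yyin = fopen("next.input", "r");   /* hypothetical second source */
                if (yyin != NULL)
                        return 0;                  /* 0: more input for the scanner */
        }
        return 1;                                  /* 1: normal wrapup at end of input */
}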
There are a number of lex routines that let you handle sequences of characters to be processed in more than
one way. These include yymore(), yyless(n), and REJECT. Recall that the text that matches a given
specification is stored in the array yytext[]. In general, once the action is performed for the specification, the
characters in yytext[] are overwritten with succeeding characters in the input stream to form the next match.
The function yymore(), by contrast, ensures that the succeeding characters recognized are appended to those
already in yytext[]. This lets you do one thing and then another, when one string of characters is significant
and a longer one including the first is significant as well. Consider a language that defines a string as a set of
characters between double quotation marks and specifies that to include a double quotation mark in a string it
must be preceded by a backslash. The regular expression matching that is somewhat confusing, so it might be
preferable to write:
\"[^"]

{
if (yytext[yyleng1] == '\\')
yymore();
else
. . . normal processing
}

When faced with the string ``"abc\"def"'', the scanner will first match the characters "abc\, whereupon the call
to yymore() will cause the next part of the string "def to be tacked on the end. The double quotation mark
terminating the string should be picked up in the code labeled ``normal processing.''
The function yyless(n) lets you specify the number of matched characters on which an action is to be
performed: only the first n characters of the expression are retained in yytext[]. Subsequent processing
resumes at the (n+1)th character. Suppose you are again in the code-deciphering business and the idea is
to work with only half the characters in a sequence that ends with a certain one, say upper or lowercase Z. The
code you want might be
[a-yA-Y]+[Zz]   { yyless(yyleng/2);
                  . . . process first half of string . . . }

Finally, the function REJECT lets you more easily process strings of characters even when they overlap or
contain one another as parts. REJECT does this by immediately jumping to the next rule and its specification
without changing the contents of yytext[]. If you want to count the number of occurrences both of the regular
expression snapdragon and of its subexpression dragon in an input text, the following will do:
snapdragon      {countflowers++; REJECT;}
dragon          {countmonsters++;}

As an example of one pattern overlapping another, the following counts the number of occurrences of the
expressions comedian and diana, even where the input text has sequences such as comediana..:
comedian        {comiccount++; REJECT;}
diana           {princesscount++;}

Note that the actions here may be considerably more complicated than simply incrementing a counter. In all
cases, you declare the counters and other necessary variables in the definitions section commencing the lex
specification.
Definitions
The lex definitions section may contain any of several classes of items. The most critical are external
definitions, preprocessor statements like #include, and abbreviations. Recall that for valid lex source this
section is optional, but in most cases some of these items are necessary. Preprocessor statements and C source
code should appear between a line of the form %{ and one of the form %}. All lines between these delimiters,
including those that begin with white space, are copied to lex.yy.c immediately before the definition of
yylex(). (Lines in the definitions section that are not enclosed by the delimiters are copied to the same place
provided they begin with white space.) The definitions section is where you would normally place C
definitions of objects accessed by actions in the rules section or by routines with external linkage.

One example occurs in using lex with yacc, which generates parsers that call a lexical analyzer. In this
context, you should include the file y.tab.h, which may contain #defines for token names:
%{
#include "y.tab.h"
extern int tokval;
int lineno;
%}

After the %} that ends your #include's and declarations, you place your abbreviations for regular expressions
to be used in the rules section. The abbreviation appears on the left of the line and, separated by one or more
spaces, its definition or translation appears on the right. When you later use abbreviations in your rules, be
sure to enclose them within braces. Abbreviations avoid needless repetition in writing your specifications and
make them easier to read.
As an example, reconsider the lex source reviewed at the beginning of this section on advanced lex usage. The
use of definitions simplifies our later reference to digits, letters, and blanks. This is especially true if the
specifications appear several times:
D               [0-9]
L               [a-zA-Z]
B               [ \t]+
%%
-{D}+           printf("negative integer");
\+?{D}+         printf("positive integer");
-0.{D}+         printf("negative fraction");
G{L}*           printf("may have a G word here");
rail{B}road     printf("railroad is one word");
crook           printf("criminal");
 .
 .

Start conditions
Some problems require for their solution a greater sensitivity to prior context than is afforded by the ^
operator alone. You may want different rules to be applied to an expression depending on a prior context that
is more complex than the end of a line or the start of a file. In this situation you could set a flag to mark the
change in context that is the condition for the application of a rule, then write code to test the flag.
Alternatively, you could define for lex the different ``start conditions'' under which it is to apply each rule.
Consider this problem: copy the input to the output, except change the word magic to the word first on every
line that begins with the letter a; change magic to second on every line that begins with b; change magic to
third on every line that begins with c. Here is how the problem might be handled with a flag. Recall that
ECHO is a lex macro equivalent to printf("%s", yytext):
int flag;
%%
^a              {flag = 'a'; ECHO;}
^b              {flag = 'b'; ECHO;}
^c              {flag = 'c'; ECHO;}
\n              {flag = 0; ECHO;}
magic           {
                switch (flag)
                {
                case 'a': printf("first"); break;
                case 'b': printf("second"); break;
                case 'c': printf("third"); break;
                default: ECHO; break;
                }
                }

To handle the same problem with start conditions, each start condition must be introduced to lex in the
definitions section with a line reading
%Start name1 name2 . . .

where the conditions may be named in any order. The word Start may be abbreviated to ``S'' or ``s''. The
conditions are referenced at the head of a rule with ``<>'' brackets. So
<name1>expression

is a rule that is only recognized when the scanner is in start condition name1. To enter a start condition,
execute the action statement
BEGIN name1;

which changes the start condition to name1. To resume the normal state
BEGIN 0;

resets the initial condition of the scanner. A rule may be active in several start conditions. That is,
<name1,name2,name3>

is a valid prefix. Any rule not beginning with the <> prefix operators is always active.
The example can be written with start conditions as follows:
%Start AA BB CC
%%
^a              {ECHO; BEGIN AA;}
^b              {ECHO; BEGIN BB;}
^c              {ECHO; BEGIN CC;}
\n              {ECHO; BEGIN 0;}
<AA>magic       printf("first");
<BB>magic       printf("second");
<CC>magic       printf("third");

User routines
You may want to use your own routines in lex for much the same reason that you do so in other programming
languages. Action code that is to be used for several rules can be written once and called when needed. As
with definitions, this can simplify the writing and reading of programs. The function put_in_tabl(), to be
discussed in the next section on lex and yacc, is a good candidate for the user routines section of a lex
specification.
Another reason to place a routine in this section is to highlight some code of interest or to simplify the rules
section, even if the code is to be used for one rule only. As an example, consider the following routine to
ignore comments in a language like C where comments occur between /* and */:
%{
static skipcmnts();
%}
%%
"/*"            skipcmnts();
 .
 .              /* rest of rules */
%%
static
skipcmnts()
{
        for(;;)
        {
                while (input() != '*')
                        ;
                if (input() != '/')
                        unput(yytext[yyleng-1]);
                else return;
        }
}

There are three points of interest in this example. First, the unput(c) macro (putting back the last character
read) is necessary to avoid missing the final / if the comment ends with a **/. In this case, having eventually
read a *, the scanner finds that the next character is not the terminal / and must read some more. Second, the
expression yytext[yyleng-1] picks out that last character read. Third, this routine assumes that the comments
are not nested, which is indeed the case with the C language.

Using lex with yacc


If you work on a compiler project or develop a program to check the validity of an input language, you may
want to use the SCO OpenServer system tool yacc (see ``Parsing with yacc''). yacc generates parsers,
programs that analyze input to ensure that it is syntactically correct. lex often forms a fruitful union with yacc
in the compiler development context. Whether or not you plan to use lex with yacc, be sure to read this
section because it covers information of interest to all lex programmers.
As noted, a program uses the lex-generated scanner by repeatedly calling the function yylex(). This name is
used because a yacc-generated parser calls its lexical analyzer with this very name. To use lex to create the
lexical analyzer for a compiler, you want to end each lex action with the statement return token, where token
is a defined term whose value is an integer. The integer value of the token returned indicates to the parser
what the lexical analyzer has found. The parser, called yyparse() by yacc, then resumes control and makes
another call to the lexical analyzer when it needs another token.
In a compiler, the different values of the token indicate what, if any, reserved word of the language has been
found or whether an identifier, constant, arithmetic operator, or relational operator has been found. In the
latter cases, the analyzer must also specify the exact value of the token: what the identifier is, whether the
constant is, say, 9 or 888, whether the operator is + or -, and whether the relational operator is ``='' or ``>''.
Consider the following portion of lex source (discussed in another context earlier) for a scanner that
recognizes tokens in a "Clike" language:
begin                   return(BEGIN);
end                     return(END);
while                   return(WHILE);
if                      return(IF);
package                 return(PACKAGE);
reverse                 return(REVERSE);
loop                    return(LOOP);
[a-zA-Z][a-zA-Z0-9]*    { tokval = put_in_tabl();
                          return(IDENTIFIER); }
[0-9]+                  { tokval = put_in_tabl();
                          return(INTEGER); }
\+                      { tokval = PLUS;
                          return(ARITHOP); }
\-                      { tokval = MINUS;
                          return(ARITHOP); }
>                       { tokval = GREATER;
                          return(RELOP); }
>=                      { tokval = GREATEREQL;
                          return(RELOP); }

Despite appearances, the tokens returned, and the values assigned to tokval, are indeed integers. Good
programming style dictates that we use informative terms such as BEGIN, END, WHILE, and so forth to
signify the integers the parser understands, rather than use the integers themselves. You establish the
association by using #define statements in your parser calling routine in C. For example,
#define BEGIN 1
#define END 2
.
#define PLUS 7
.

If the need arises to change the integer for some token type, you then change the #define statement in the
parser rather than hunt through the entire program changing every occurrence of the particular integer. In
using yacc to generate your parser, insert the statement
#include "y.tab.h"

in the definitions section of your lex source. The file y.tab.h, which is created when yacc is invoked with the
-d option, provides #define statements that associate token names such as BEGIN, END, and so on with the
integers of significance to the generated parser.
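A fragment of such a y.tab.h might look like the following; the particular integers shown are illustrative, since yacc chooses the actual values:

#define BEGIN 257
#define END 258
#define WHILE 259
#define IDENTIFIER 260
#define INTEGER 261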
To indicate the reserved words in the example, the returned integer values suffice. For the other token types,
the integer value of the token type is stored in the programmer-defined variable tokval. This variable, whose
definition was an example in the definitions section, is globally defined so that the parser as well as the lexical
analyzer can access it. yacc provides the variable yylval for the same purpose.
Note that the example shows two ways to assign a value to tokval. First, a function put_in_tabl() places the
name and type of the identifier or constant in a symbol table so that the compiler can refer to it in this or a
later stage of the compilation process. More to the present point, put_in_tabl() assigns a type value to tokval
so that the parser can use the information immediately to determine the syntactic correctness of the input text.
The function put_in_tabl() would be a routine that the compiler writer might place in the user routines
section of the parser. Second, in the last few actions of the example, tokval is assigned a specific integer
indicating which arithmetic or relational operator the scanner recognized. If the variable PLUS, for instance,
is associated with the integer 7 by means of the #define statement above, then when a + is recognized, the
action assigns to tokval the value 7, which indicates the +. The scanner indicates the general class of operator
by the value it returns to the parser (that is, the integer signified by ARITHOP or RELOP).
In using lex with yacc, either may be run first. The command

$ yacc -d grammar.y

generates a parser in the file y.tab.c. As noted, the -d option creates the file y.tab.h, which contains the
#define statements that associate the yacc-assigned integer token values with the user-defined token names.
Now you can invoke lex with the command
$ lex lex.l

then compile and link the output files with the command
$ cc lex.yy.c y.tab.c -ly -ll

Note that the yacc library is loaded (via -ly) before the lex library (via -ll) to ensure that the supplied main()
will call the yacc parser.
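The main() supplied by the yacc library amounts to little more than the following sketch (not the exact library source). Supplying your own version like it in the subroutines section removes the need for -ly:

int main()
{
        return yyparse();       /* hand control to the yacc-generated parser */
}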

Miscellaneous
Recognition of expressions in an input text is performed by a deterministic finite automaton generated by lex.
The -v option prints out for you a small set of statistics describing the finite automaton. (For a detailed
account of finite automata and their importance for lex, see the Aho, Sethi, and Ullman text, Compilers:
Principles, Techniques, and Tools, Addison-Wesley, 1986.)
lex uses a table to represent its finite automaton. The maximum number of states that the finite automaton
allows is set by default to 500. If your lex source has a large number of rules or the rules are very complex,
this default value may be too small. You can enlarge the value by placing another entry in the definitions
section of your lex source as follows:
%n 700

This entry tells lex to make the table large enough to handle as many as 700 states. (The -v option will
indicate how large a number you should choose.) If you need to increase the maximum number of state
transitions beyond 2000, the designated parameter is a, thus:
%a 2800

Summary of source format


The general form of a lex source file is
definitions
%%
rules
%%
user routines
The definitions section contains any combination of
definitions of abbreviations in the form
name space translation
included code in the form
%{
C code
%}
start conditions in the form
%Start name1 name2 . . .

changes to internal array sizes in the form


%x nnn

where nnn is a decimal integer representing an array size and x selects the parameter as
follows:
p positions
n states
e tree nodes
a transitions
k packed character classes
o output array size
Lines in the rules section have the form
expression action

where the action may be continued on succeeding lines by using braces to delimit it.
The lex operator characters are

" \ [ ] ^ - ? . * + | ( ) $ / { } < >

Important lex variables, functions, and macros are

yytext[]        array of char
yyleng          int
yylex()         function
yywrap()        function
yymore()        function
yyless(n)       function
REJECT          macro
ECHO            macro
input()         macro
unput(c)        macro
output(c)       macro

Parsing with yacc


yacc provides a general tool for imposing structure on the input to a computer program. When you use yacc,
you prepare a specification that includes
a set of rules to describe the elements of the input
code to be invoked when a rule is recognized
either a definition or declaration of a low-level scanner to examine the input
yacc then turns the specification into a C language function that examines the input stream. This function,
called a parser, works by calling the low-level scanner. The scanner, called a lexical analyzer, picks up items
from the input stream. The selected items are known as tokens. Tokens are compared to the input construct
rules, called grammar rules. When one of the rules is recognized, the code you have supplied for the rule is
invoked. This code is called an action. Actions are fragments of C language code. They can return values and
make use of values returned by other actions.
The heart of the yacc specification is the collection of grammar rules. Each rule describes a construct and
gives it a name. For example, one grammar rule might be
date    :       month_name  day  ','  year   ;
where date, month_name, day, and year represent constructs of interest; presumably, month_name, day,
and year are defined in greater detail elsewhere. In the example, the comma is enclosed in single quotes. This
means that the comma is to appear literally in the input. The colon and semicolon merely serve as punctuation
in the rule and have no significance in evaluating the input. With proper definitions, the input
July 4, 1776

might be matched by the rule.


The lexical analyzer is an important part of the parsing function. This user-supplied routine reads the input
stream, recognizes the lower-level constructs, and communicates these as tokens to the parser. The lexical
analyzer recognizes constructs of the input stream as terminal symbols; the parser recognizes constructs as
nonterminal symbols. To avoid confusion, we will refer to terminal symbols as tokens.
There is considerable leeway in deciding whether to recognize constructs using the lexical analyzer or
grammar rules. For example, the rules
month_name  :  'J' 'a' 'n'     ;
month_name  :  'F' 'e' 'b'     ;
  . . .
month_name  :  'D' 'e' 'c'     ;

might be used in the above example. While the lexical analyzer only needs to recognize individual letters,
such low-level rules tend to waste time and space, and may complicate the specification beyond the ability of
yacc to deal with it. Usually, the lexical analyzer recognizes the month names and returns an indication that a
month_name is seen. In this case, month_name is a token and the detailed rules are not needed.
Literal characters such as a comma must also be passed through the lexical analyzer and are also considered
tokens.


Specification files are very flexible. It is relatively easy to add to the above example the rule
date    :       month '/' day '/' year   ;

allowing
7/4/1776

as a synonym for
July 4, 1776

on input. In most cases, this new rule could be slipped into a working system with minimal effort and little
danger of disrupting existing input.
The input being read may not conform to the specifications. With a left-to-right scan, input errors are
detected as early as is theoretically possible. Thus, not only is the chance of reading and computing with bad
input data substantially reduced, but the bad data usually can be found quickly. Error handling, provided as
part of the input specifications, permits the reentry of bad data or the continuation of the input process after
skipping over the bad data.
In some cases, yacc fails to produce a parser when given a set of specifications. For example, the
specifications may be self-contradictory, or they may require a more powerful recognition mechanism than
that available to yacc. The former cases represent design errors; the latter cases often can be corrected by
making the lexical analyzer more powerful or by rewriting some of the grammar rules. While yacc cannot
handle all possible specifications, its power compares favorably with similar systems. Moreover, the
constructs that are difficult for yacc to handle are also frequently difficult for human beings to handle. Some
users have reported
that the discipline of formulating valid yacc specifications for their input revealed errors of conception or
design early in program development.
The remainder of this topic describes the following subjects:
basic process of preparing a yacc specification
parser operation
handling ambiguities
handling operator precedences in arithmetic expressions
error detection and recovery
the operating environment and special features of the parsers yacc produces
suggestions to improve the style and efficiency of the specifications
advanced topics
In addition, there are two examples and a summary of the yacc input syntax.

Basic specifications
Names refer to either tokens or nonterminal symbols. yacc requires token names to be declared as such. While
the lexical analyzer may be included as part of the specification file, it is perhaps more in keeping with
modular design to keep it as a separate file. Like the lexical analyzer, other subroutines may be included as
well. Thus, every specification file theoretically consists of three sections: the declarations, (grammar) rules,
and subroutines. The sections are separated by double percent signs (%%; the percent sign is generally used
in yacc specifications as an escape character).
A full specification file looks like
declarations
%%
rules
%%
subroutines

when all sections are used. The declarations and subroutines sections are optional. The smallest valid yacc
specification might be
%%
S:;

Blanks, tabs, and newlines are ignored, but they may not appear in names or multi-character reserved
symbols. Comments may appear wherever a name is valid. They are enclosed in /* and */, as in the C
language.
The rules section is made up of one or more grammar rules. A grammar rule has the form
A       :       BODY

where A represents a nonterminal symbol, and BODY represents a sequence of zero or more names and
literals. The colon and the semicolon are yacc punctuation.
Names may be of any length and may be made up of letters, periods, underscores, and digits although a digit
may not be the first character of a name. Uppercase and lowercase letters are distinct. The names used in the
body of a grammar rule may represent tokens or nonterminal symbols.
A literal consists of a character enclosed in single quotes. As in the C language, the backslash is an escape
character within literals. yacc recognizes all the C language escape sequences. For a number of technical
reasons, the null character should never be used in grammar rules.
If there are several grammar rules with the same left-hand side, the vertical bar can be used to avoid rewriting
the left-hand side. In addition, the semicolon at the end of a rule is dropped before a vertical bar. Thus the
grammar rules
A       :       B  C  D   ;
A       :       E  F      ;
A       :       G         ;

can be given to yacc as


A       :       B  C  D
        |       E  F
        |       G
        ;

by using the vertical bar. It is not necessary that all grammar rules with the same left side appear together in
the grammar rules section although it makes the input more readable and easier to change.

If a nonterminal symbol matches the empty string, this can be indicated by
epsilon :  ;

The blank space following the colon is understood by yacc to be a nonterminal symbol named ``epsilon''.
Names representing tokens must be declared. This is most simply done by writing
%token  name1  name2  name3

and so on in the declarations section. Every name not defined in the declarations section is assumed to
represent a nonterminal symbol. Every nonterminal symbol must appear on the left side of at least one rule.
Of all the nonterminal symbols, the start symbol has particular importance. By default, the symbol is taken to
be the left-hand side of the first grammar rule in the rules section. It is possible and desirable to declare the
start symbol explicitly in the declarations section using the %start keyword:
%start  symbol

The end of the input to the parser is signaled by a special token, called the endmarker. The endmarker is
represented by either a zero or a negative number. If the tokens up to but not including the endmarker form a
construct that matches the start symbol, the parser function returns to its caller after the endmarker is seen
and accepts the input. If the endmarker is seen in any other context, it is an error.
It is the job of the usersupplied lexical analyzer to return the endmarker when appropriate. Usually the
endmarker represents some reasonably obvious I/O status, such as end of file or end of record.
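A hand-written yylex(), for example, commonly maps end of file onto the endmarker explicitly; a minimal illustrative sketch (not from the original text):

int yylex()
{
        int c = getchar();
        if (c == EOF)
                return 0;       /* endmarker: no more input */
        /* . . . otherwise classify c and return its token number . . . */
        return c;               /* a literal token is its own character code */
}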

Actions
With each grammar rule, you can associate actions to be performed when the rule is recognized. Actions may
return values and may obtain the values returned by previous actions. Moreover, the lexical analyzer can
return values for tokens if desired.
An action is an arbitrary C language statement and as such can do input and output, call subroutines, and alter
arrays and variables. An action is specified by one or more statements enclosed in { and }. For example,
A       :       '('  ')'
                {
                        hello( 1, "abc" );
                }

and

XXX     :       YYY  ZZZ
                {
                        (void) printf("a message\n");
                        flag = 25;
                }

are grammar rules with actions.


The $ symbol is used to facilitate communication between the actions and the parser. The pseudo-variable $$
represents the value returned by the complete action. For example, the action
{   $$ = 1;   }

returns the value of one; in fact, that is all it does.


To obtain the values returned by previous actions and the lexical analyzer, the action can use the
pseudo-variables $1, $2, . . . $n. These refer to the values returned by components 1 through n of the right
side of a rule, with the components being numbered from left to right. If the rule is

A       :       B  C  D   ;

then $2 has the value returned by C, and $3 the value returned by D. The rule
expr    :       '('  expr  ')'

provides a common example. One would expect the value returned by this rule to be the value of the expr
within the parentheses. Since the first component of the action is the literal left parenthesis, the desired logical
result can be indicated by
expr    :       '('  expr  ')'
                {
                        $$ = $2 ;
                }

By default, the value of a rule is the value of the first element in it ($1). Thus, grammar rules of the form
A       :       B   ;

frequently need not have an explicit action. In previous examples, all the actions came at the end of rules.
Sometimes, it is desirable to get control before a rule is fully parsed. yacc permits an action to be written in
the middle of a rule as well as at the end. This action is assumed to return a value accessible through the usual
$ mechanism by the actions to the right of it. In turn, it may access the values returned by the symbols to its
left. Thus, in the rule below the effect is to set x to 1 and y to the value returned by C:
A       :       B
                {
                        $$ = 1;
                }
        C
                {
                        x = $2;
                        y = $3;
                }
        ;

Actions that do not terminate a rule are handled by yacc by manufacturing a new nonterminal symbol name
and a new rule matching this name to the empty string. The interior action is the action triggered by
recognizing this added rule. yacc treats the above example as if it had been written
$ACT    :       /* empty */
                {
                        $$ = 1;
                }
        ;

A       :       B  $ACT  C
                {
                        x = $2;
                        y = $3;
                }
        ;


In many applications, output is not done directly by the actions. A data structure, such as a parse tree, is
constructed in memory and transformations are applied to it before output is generated. Parse trees are
particularly easy to construct given routines to build and maintain the tree structure desired. For example,
suppose there is a C function node written so that the call
node( L, n1, n2 )

creates a node with label L and descendants n1 and n2 and returns the index of the newly created node. Then
a parse tree can be built by supplying actions such as
expr    :       expr  '+'  expr
                {
                        $$ = node( '+', $1, $3 );
                }

in the specification.
You may define other variables to be used by the actions. Declarations and definitions can appear in the
declarations section enclosed in %{ and %}. These declarations and definitions have global scope, so they are
known to the action statements and can be made known to the lexical analyzer. For example:
%{

int variable = 0;

%}

could be placed in the declarations section making variable accessible to all of the actions. You should avoid
names beginning with yy because the yacc parser uses only such names. Note, too, that in the examples
shown thus far all the values are integers. A discussion of values of other types may be found in ``Advanced
topics''. Finally, note that in the following case
%{
int i;
printf("%}");
%}

yacc will start copying after %{ and stop copying when it encounters the first %}, the one in printf(). In
contrast, it would copy %{ in printf() if it encountered it there.

Lexical analysis
You must supply a lexical analyzer to read the input stream and communicate tokens (with values, if desired)
to the parser. The lexical analyzer is an integer-valued function called yylex(). The function returns an
integer, the token number, representing the kind of token read. If there is a value associated with that token, it
should be assigned to the external variable yylval.
The parser and the lexical analyzer must agree on these token numbers in order for communication between
them to take place. The numbers may be chosen by yacc or the user. In either case, the #define mechanism of
C language is used to allow the lexical analyzer to return these numbers symbolically. For example, suppose
that the token name DIGIT has been defined in the declarations section of the yacc specification file. The
relevant portion of the lexical analyzer might look like
int yylex()
{
        extern int yylval;
        int c;
        . . .
        c = getchar();
        . . .
        switch (c)
        {
        . . .
        case '0':
        case '1':
          . . .
        case '9':
                yylval = c - '0';
                return (DIGIT);
        . . .
        }
        . . .
}

to return the appropriate token.


The intent is to return a token number of DIGIT and a value equal to the numerical value of the digit. You put
the lexical analyzer code in the subroutines section and the declaration for DIGIT in the declarations section.
Alternatively, you can put the lexical analyzer code in a separately compiled file, provided
you invoke yacc with the -d option, which generates a file called y.tab.h that contains #define
statements for the tokens, and
you #include y.tab.h in the separately compiled lexical analyzer.
This mechanism leads to clear, easily modified lexical analyzers. The only pitfall to avoid is using any token
names in the grammar that are reserved or significant in C language or the parser. For example, the use of
token names if or while will almost certainly cause severe difficulties when the lexical analyzer is compiled.
The token name error is reserved for error handling and should not be used naively.
In the default situation, token numbers are chosen by yacc. The default token number for a literal character is
the numerical value of the character in the local character set. Other names are assigned token numbers
starting at 257.
If you prefer to assign the token numbers, the first appearance of the token name or literal in the declarations
section must be followed immediately by a nonnegative integer. This integer is taken to be the token number
of the name or literal. Names and literals not defined this way are assigned default definitions by yacc. The
potential for duplication exists here. Care must be taken to make sure that all token numbers are distinct.
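For example, declarations such as the following (the numbers here are illustrative) assign token numbers explicitly:

%token  DIGIT   300
%token  LETTER  301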
For historical reasons, the endmarker must have token number 0 or be negative. You cannot redefine this
token number. Thus, all lexical analyzers should be prepared to return 0 or a negative number as a token upon
reaching the end of their input.
As noted in ``Lexical analysis with lex'', lexical analyzers produced by lex are designed to work in close
harmony with yacc parsers. The specifications for these lexical analyzers use regular expressions instead of
grammar rules. lex can be used to produce quite complicated lexical analyzers, but there remain some
languages that do not fit any theoretical framework and whose lexical analyzers must be crafted by hand.

Parser operation


yacc turns the specification file into a C language procedure, which parses the input according to the
specification given. The algorithm used to go from the specification to the parser is complex and will not be
discussed here. The parser itself, though, is relatively simple and understanding its usage will make treatment
of error recovery and ambiguities easier.
The parser produced by yacc consists of a finite state machine with a stack. The parser is also capable of
reading and remembering the next input token, called the lookahead token. The current state is always the one
on the top of the stack. The states of the finite state machine are given small integer labels. Initially, the
machine is in state 0 (the stack contains only state 0) and no lookahead token has been read.
The machine has only four actions available: shift, reduce, accept, and error. A step of the parser is done as
follows:
1. Based on its current state, the parser decides if it needs a lookahead token to choose the action to be
taken. If it needs one and does not have one, it calls yylex() to obtain the next token.
2. Using the current state and the lookahead token if needed, the parser decides on its next action and
carries it out. This may result in states being pushed onto the stack or popped off of the stack and in
the lookahead token being processed or left alone.
The shift action is the most common action the parser takes. Whenever a shift action is taken, there is always
a lookahead token. For example, in state 56 there may be an action
IF      shift 34

which says, in state 56, if the lookahead token is IF, the current state (56) is pushed down on the stack, and
state 34 becomes the current state (on the top of the stack). The lookahead token is cleared.
The reduce action keeps the stack from growing without bounds. reduce actions are appropriate when the
parser has seen the right-hand side of a grammar rule and is prepared to announce that it has seen an instance
of the rule, replacing the right-hand side by the left-hand side. It may be necessary to consult the lookahead
token to decide whether or not to reduce. In fact, the default action (represented by .) is often a reduce action.
reduce actions are associated with individual grammar rules. Grammar rules are also given small integer
numbers, and this leads to some confusion. The action
.       reduce 18

refers to grammar rule 18, while the action


IF      shift 34

refers to state 34.


Suppose the rule
A       :       x  y  z   ;

is being reduced. The reduce action depends on the left-hand symbol (A in this case) and the number of
symbols on the right-hand side (three in this case). To reduce, first pop off the top three states from the stack.
(In general, the number of states popped equals the number of symbols on the right side of the rule.) In effect,
these states were the ones put on the stack while recognizing x, y, and z and no longer serve any useful
purpose. After popping these states, a state is uncovered, which was the state the parser was in before
beginning to process the rule. Using this uncovered state and the symbol on the left side of the rule, perform
what is in effect a shift of A. A new state is obtained, pushed onto the stack, and parsing continues. There are
significant differences between the processing of the left-hand symbol and an ordinary shift of a token,
however, so this action is called a goto action. In particular, the lookahead token is cleared by a shift but is not
affected by a goto. In any case, the uncovered state contains an entry such as
A       goto 20

causing state 20 to be pushed onto the stack and become the current state.
In effect, the reduce action turns back the clock in the parse, popping the states off the stack to go back to the
state where the right-hand side of the rule was first seen. The parser then behaves as if it had seen the left side
at that time. If the right-hand side of the rule is empty, no states are popped off the stack. The uncovered
state is in fact the current state.
The reduce action is also important in the treatment of user-supplied actions and values. When a rule is
reduced, the code supplied with the rule is executed before the stack is adjusted. In addition to the stack
holding the states, another stack running in parallel with it holds the values returned from the lexical analyzer
and the actions. When a shift takes place, the external variable yylval is copied onto the value stack. After the
return from the user code, the reduction is carried out. When the goto action is done, the external variable
yyval is copied onto the value stack. The pseudo-variables $1, $2, and so on refer to the value stack.
The other two parser actions are conceptually much simpler. The accept action indicates that the entire input
has been seen and that it matches the specification. This action appears only when the lookahead token is the
endmarker and indicates that the parser has successfully done its job. The error action, on the other hand,
represents a place where the parser can no longer continue parsing according to the specification. The input
tokens it has seen (together with the lookahead token) cannot be followed by anything that would result in a
valid input. The parser reports an error and attempts to recover the situation and resume parsing. The error
recovery (as opposed to the detection of error) will be discussed later.
Consider
%token  DING  DONG  DELL
%%
rhyme   :       sound  place
        ;
sound   :       DING  DONG
        ;
place   :       DELL
        ;

as a yacc specification. When yacc is invoked with the -v (verbose) option, a file called y.output is produced
with a human-readable description of the parser. The y.output file corresponding to the above grammar (with
some statistics stripped off the end) follows.
state 0
        $accept : _rhyme $end

        DING shift 3
        . error

        rhyme goto 1
        sound goto 2

state 1
        $accept : rhyme_$end

        $end accept
        . error

state 2
        rhyme : sound_place

        DELL shift 5
        . error

        place goto 4

state 3
        sound : DING_DONG

        DONG shift 6
        . error

state 4
        rhyme : sound place_ (1)

        . reduce 1

state 5
        place : DELL_ (3)

        . reduce 3

state 6
        sound : DING DONG_ (2)

        . reduce 2

The actions for each state are specified and there is a description of the parsing rules being
processed in each state. The _ character is used to indicate what has been seen and what is yet to come in each
rule. The following input
DING DONG DELL

can be used to track the operations of the parser. Initially, the current state is state 0. The parser needs to refer
to the input in order to decide between the actions available in state 0, so the first token, DING, is read and
becomes the lookahead token. The action in state 0 on DING is shift 3, state 3 is pushed onto the stack, and
the lookahead token is cleared. State 3 becomes the current state. The next token, DONG, is read and
becomes the lookahead token. The action in state 3 on the token DONG is shift 6, state 6 is pushed onto the
stack, and the lookahead is cleared. The stack now contains 0, 3, and 6. In state 6, without even consulting the
lookahead, the parser reduces by
sound   :       DING  DONG

which is rule 2. Two states, 6 and 3, are popped off the stack, uncovering state 0. Consulting the description of
state 0 (looking for a goto on sound),
sound   goto 2

is obtained. State 2 is pushed onto the stack and becomes the current state.
In state 2, the next token, DELL, must be read. The action is shift 5, so state 5 is pushed onto the stack, which
now has 0, 2, and 5 on it, and the lookahead token is cleared. In state 5, the only action is to reduce by rule 3.
This has one symbol on the righthand side, so one state, 5, is popped off, and state 2 is uncovered. The goto
in state 2 on place (the left side of rule 3) is state 4. Now, the stack contains 0, 2, and 4. In state 4, the only
action is to reduce by rule 1. There are two symbols on the right, so the top two states are popped off,
uncovering state 0 again. In state 0, there is a goto on rhyme causing the parser to enter state 1. In state 1, the
input is read and the endmarker is obtained, indicated by $end in the y.output file. The action in state 1 (when
the endmarker is seen) successfully ends the parse.
You might want to consider how the parser works when confronted with such incorrect strings as DING
DONG DONG, DING DONG, DING DONG DELL DELL, and so on. A few minutes spent with this and
other simple examples is repaid when problems arise in more complicated contexts.

Ambiguity and conflicts


A set of grammar rules is ambiguous if there is some input string that can be structured in two or more
different ways. For example, the grammar rule
expr    :       expr  '-'  expr

is a natural way of expressing the fact that one way of forming an arithmetic expression is to put two other
expressions together with a minus sign between them. Unfortunately, this grammar rule does not completely
specify the way that all complex inputs should be structured. For example, if the input is
expr - expr - expr

the rule allows this input to be structured as either


( expr - expr ) - expr

or as

expr - ( expr - expr )

The first is called left association, the second right association.


yacc detects such ambiguities when it is attempting to build the parser. Given the input
expr - expr - expr

consider the problem that confronts the parser. When the parser has read the second expr, the input seen
expr - expr

matches the right side of the grammar rule above. The parser could reduce the input by applying this rule.
After applying the rule, the input is reduced to expr (the left side of the rule). The parser would then read the
final part of the input

- expr

and again reduce. The effect of this is to take the left associative interpretation.
Alternatively, if the parser sees
expr - expr

it could defer the immediate application of the rule and continue reading the input until
expr - expr - expr

is seen. It could then apply the rule to the rightmost three symbols, reducing them to expr, which results in
expr - expr

being left. Now the rule can be reduced once more. The effect is to take the right associative interpretation.
Thus, having read
expr - expr

the parser can do one of two valid things, shift or reduce. It has no way of deciding between them. This is
called a shift-reduce conflict. It may also happen that the parser has a choice of two valid reductions. This is
called a reduce-reduce conflict. Note that there are never any shift-shift conflicts.
When there are shift-reduce or reduce-reduce conflicts, yacc still produces a parser. It does this by
selecting one of the valid steps wherever it has a choice. A rule describing the choice to make in a given
situation is called a disambiguating rule.
yacc invokes two default disambiguating rules:
1. In a shift-reduce conflict, the default is to do the shift.
2. In a reduce-reduce conflict, the default is to reduce by the earlier grammar rule (in the yacc
specification).
Rule 1 implies that reductions are deferred in favor of shifts when there is a choice. Rule 2 gives the user
rather crude control over the behavior of the parser in this situation, but reduce-reduce conflicts should be
avoided when possible.
Conflicts may arise because of mistakes in input or logic or because the grammar rules (while consistent)
require a more complex parser than yacc can construct. The use of actions within rules can also cause
conflicts if the action must be done before the parser can be sure which rule is being recognized. In these
cases, the application of disambiguating rules is inappropriate and leads to an incorrect parser. For this reason,
yacc always reports the number of shift-reduce and reduce-reduce conflicts resolved by rules 1 and 2
above.
In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also possible
to rewrite the grammar rules so that the same inputs are read but there are no conflicts. For this reason, most
previous parser generators have considered conflicts to be fatal errors. Our experience has suggested that this
rewriting is somewhat unnatural and produces slower parsers. Thus, yacc will produce parsers even in the
presence of conflicts.
As an example of the power of disambiguating rules, consider
stat    :       IF  '('  cond  ')'  stat
        |       IF  '('  cond  ')'  stat  ELSE  stat
        ;

which is a fragment from a programming language involving an if-then-else statement. In these rules, IF and
ELSE are tokens, cond is a nonterminal symbol describing conditional (logical) expressions, and stat is a
nonterminal symbol describing statements. The first rule will be called the simple if rule and the second the
if-else rule.
These two rules form an ambiguous construction because input of the form
IF ( C1 ) IF ( C2 ) S1 ELSE S2

can be structured according to these rules in two ways


IF ( C1 )
{
        IF ( C2 )
                S1
}
ELSE
        S2

or
IF ( C1 )
{
        IF ( C2 )
                S1
        ELSE
                S2
}

where the second interpretation is the one given in most programming languages having this construct; each
ELSE is associated with the last preceding un-ELSE'd IF. In this example, consider the situation where the
parser has seen
IF ( C1 ) IF ( C2 ) S1

and is looking at the ELSE. It can immediately reduce by the simple if rule to get
IF ( C1 ) stat

and then read the remaining input


ELSE S2

and reduce
IF ( C1 ) stat ELSE S2

by the if-else rule. This leads to the first of the above groupings of the input.
On the other hand, the ELSE may be shifted, S2 read, and then the right-hand portion of
IF ( C1 ) IF ( C2 ) S1 ELSE S2

can be reduced by the if-else rule to get


IF ( C1 ) stat

which can be reduced by the simple if rule. This leads to the second of the above groupings of the input,
which is usually the one desired.
Once again, the parser can do two valid things: there is a shift-reduce conflict. The application of
disambiguating rule 1 tells the parser to shift in this case, which leads to the desired grouping.
This shift-reduce conflict arises only when there is a particular current input symbol, ELSE, and particular
inputs, such as
IF ( C1 ) IF ( C2 ) S1

have already been seen. In general, there may be many conflicts, and each one will be associated with an input
symbol and a set of previously read inputs. The previously read inputs are characterized by the state of the
parser.
The conflict messages of yacc are best understood by examining the -v output. For example, the output
corresponding to the above conflict state might be
23: shift-reduce conflict (shift 45, reduce 18) on ELSE

state 23
        stat : IF ( cond ) stat_    (18)
        stat : IF ( cond ) stat_ELSE stat

        ELSE    shift 45
        .       reduce 18

where the first line describes the conflict, giving the state and the input symbol.
The ordinary state description gives the grammar rules active in the state and the parser actions. Recall that
the underscore marks the portion of the grammar rules that has been seen. Thus in the example, in state 23, the
parser has seen input corresponding to
IF ( cond ) stat

and the two grammar rules shown are active at this time. The parser can do two possible things. If the input
symbol is ELSE, it is possible to shift into state 45. State 45 will have, as part of its description, the line
stat : IF ( cond ) stat ELSE_stat

because the ELSE will have been shifted in this state. In state 23, the alternative action (specified by .) is to be
done if the input symbol is not mentioned explicitly in the actions. In this case, if the input symbol is not
ELSE, the parser reduces to
stat : IF '(' cond ')' stat

by grammar rule 18.


Once again, notice that the numbers following shift commands refer to other states, while the numbers
following reduce commands refer to grammar rule numbers. In the y.output file, rule numbers are printed in
parentheses after those rules that can be reduced. In most states, there is a reduce action possible, and reduce
is the default command. If you encounter unexpected shift-reduce conflicts, you will probably want to look
at the -v output to decide whether the default actions are appropriate.

Precedence
There is one common situation where the rules given above for resolving conflicts are not sufficient. This is in
the parsing of arithmetic expressions. Most of the commonly used constructions for arithmetic expressions
can be naturally described by the notion of precedence levels for operators, together with information about
left or right associativity. It turns out that ambiguous grammars with appropriate disambiguating rules can be
used to create parsers that are faster and easier to write than parsers constructed from unambiguous grammars.
The basic notion is to write grammar rules of the form

	expr : expr OP expr

and

	expr : UNARY expr
for all binary and unary operators desired. This creates a very ambiguous grammar with many parsing
conflicts. You specify as disambiguating rules the precedence or binding strength of all the operators and the
associativity of the binary operators. This information is sufficient to allow yacc to resolve the parsing
conflicts in accordance with these rules and construct a parser that realizes the desired precedences and
associativities.
The precedences and associativities are attached to tokens in the declarations section. This is done by a series
of lines beginning with the yacc keywords %left, %right, or %nonassoc, followed by a list of tokens. All of
the tokens on the same line are assumed to have the same precedence level and associativity; the lines are
listed in order of increasing precedence or binding strength. Thus
	%left  '+'  '-'
	%left  '*'  '/'

describes the precedence and associativity of the four arithmetic operators. The + and - are left associative and
have lower precedence than * and /, which are also left associative. The keyword %right is used to describe
right associative operators. The keyword %nonassoc is used to describe operators, like the operator .LT. in
FORTRAN, that may not associate with themselves. That is, because
A .LT. B .LT. C

is invalid in FORTRAN, .LT. would be described with the keyword %nonassoc in yacc.
As an example of the behavior of these declarations, the description
	%right  '='
	%left  '+'  '-'
	%left  '*'  '/'

	%%
	expr : expr '=' expr
	     | expr '+' expr
	     | expr '-' expr
	     | expr '*' expr
	     | expr '/' expr
	     | NAME
	     ;

might be used to structure the input

	a = b = c*d - e - f*g

as follows


	a = ( b = ( ((c*d) - e) - (f*g) ) )

in order to achieve the correct precedence of operators. When this mechanism is used, unary operators must,
in general, be given a precedence. Sometimes a unary operator and a binary operator have the same symbolic
representation but different precedences. An example is unary and binary minus.
Unary minus may be given the same strength as multiplication, or even higher, while binary minus has a
lower strength than multiplication. The keyword %prec changes the precedence level associated with a
particular grammar rule. %prec appears immediately after the body of the grammar rule, before the action or
closing semicolon, and is followed by a token name or literal. It causes the precedence of the grammar rule to
become that of the following token name or literal. For example, the rules
	%left  '+'  '-'
	%left  '*'  '/'

	%%
	expr : expr '+' expr
	     | expr '-' expr
	     | expr '*' expr
	     | expr '/' expr
	     | '-' expr  %prec  '*'
	     | NAME
	     ;

might be used to
give unary minus the same precedence as multiplication.
A token declared by %left, %right, and %nonassoc need not, but may, be declared by %token as well.
Precedences and associativities are used by yacc to resolve parsing conflicts. They give rise to the following
disambiguating rules:
1. Precedences and associativities are recorded for those tokens and literals that have them.
2. A precedence and associativity is associated with each grammar rule. It is the precedence and
associativity of the last token or literal in the body of the rule. If the %prec construction is used, it
overrides this default. Some grammar rules may have no precedence and associativity associated with
them.
3. When there is a reduce/reduce or shift/reduce conflict, and either the input symbol or the grammar
rule has no precedence and associativity, then the two default disambiguating rules given in the
preceding section are used, and the conflicts are reported.
4. If there is a shift/reduce conflict and both the grammar rule and the input character have precedence
and associativity associated with them, then the conflict is resolved in favor of the action (shift or
reduce) associated with the higher precedence. If precedences are equal, then associativity is used.
Left associative implies reduce; right associative implies shift; nonassociating implies error.
Conflicts resolved by precedence are not counted in the number of shift/reduce and reduce/reduce conflicts
reported by yacc. This means that mistakes in the specification of precedences may disguise errors in the input
grammar. It is a good idea to be sparing with precedences and use them in a cookbook fashion until some
experience has been gained. The y.output file is useful in deciding whether the parser is actually doing what
was intended.
To illustrate further how you might use the precedence keywords to resolve a shift/reduce conflict, we will
look at an example similar to the one described in the previous section. Consider the following C statement:
if (flag) if (anotherflag) x = 1;
else x = 2;

The problem for the parser is whether the else goes with the first or the second if. C programmers will
recognize that the else goes with the second if, contrary to what the misleading indentation suggests.
The following yacc grammar for an if-then-else construct abstracts the problem. That is, the input iises will
model the C statement shown above.
%{
#include <stdio.h>
%}
%token SIMPLE IF ELSE
%%
S
: stmnt '\n'
;
stmnt
: SIMPLE
| if_stmnt
;
if_stmnt
: IF stmnt
{ printf("simple if\n");}
| IF stmnt ELSE stmnt
{ printf("if_then_else\n");}
;
%%
int
yylex() {
int c;
c=getchar();
if (c==EOF) return 0;
else switch(c) {
case 'i': return IF;
case 's': return SIMPLE;
case 'e': return ELSE;
default: return c;
}
}

When the specification is passed to yacc, however, we get the following message:
conflicts: 1 shift/reduce

The problem is that when yacc has read iis in trying to match iises, it has two choices: recognize is as a
statement (reduce), or read some more input (shift) and eventually recognize ises as a statement.
One way to resolve the problem is to invent a new token REDUCE whose sole purpose is to give the correct
precedence for the rules:
%{
#include <stdio.h>
%}
%token SIMPLE IF
%nonassoc REDUCE
%nonassoc ELSE
%%
S
: stmnt '\n'
;
stmnt
: SIMPLE
| if_stmnt
;
if_stmnt
: IF stmnt %prec REDUCE
{ printf("simple if");}
| IF stmnt ELSE stmnt
{ printf("if_then_else");}
;



%%
...

Since the precedence associated with the second form of if_stmnt is higher now, yacc will try to match that
rule first, and no conflict will be reported.
Actually, in this simple case, the new token is not needed:
%nonassoc IF
%nonassoc ELSE

would also work. Moreover, it is not really necessary to resolve the conflict in this way, because, as we have
seen, yacc will shift by default in a shift/reduce conflict. Resolving conflicts is a good idea, though, in the
sense that you should not see diagnostic messages for correct specifications.

Error handling
Error handling is an extremely difficult area, and many of the problems are semantic ones. When an error is
found, for example, it may be necessary to reclaim parse tree storage, delete or alter symbol table entries,
and/or, typically, set switches to avoid generating any further output.
It is seldom acceptable to stop all processing when an error is found. It is more useful to continue scanning the
input to find further syntax errors. This leads to the problem of getting the parser restarted after an error. A
general class of algorithms to do this involves discarding a number of tokens from the input string and
attempting to adjust the parser so that input can continue.
To allow the user some control over this process, yacc provides the token name error. This name can be used
in grammar rules. In effect, it suggests where errors are expected and recovery might take place. The parser
pops its stack until it enters a state where the token error is valid. It then behaves as if the token error were
the current lookahead token and performs the action encountered. The lookahead token is then reset to the
token that caused the error. If no special error rules have been specified, the processing halts when an error is
detected.
In order to prevent a cascade of error messages, the parser, after detecting an error, remains in error state until
three tokens have been successfully read and shifted. If an error is detected when the parser is already in error
state, no message is given, and the input token is quietly deleted.
As an example, a rule of the form
	stat : error
means that on a syntax error the parser attempts to skip over the statement in which the error is seen. More
precisely, the parser scans ahead, looking for three tokens that might validly follow a statement, and starts
processing at the first of these. If the beginnings of statements are not sufficiently distinctive, it may make a
false start in the middle of a statement and end up reporting a second error where there is in fact no error.
Actions may be used with these special error rules. These actions might attempt to reinitialize tables, reclaim
symbol table space, and so forth.
Error rules such as the above are very general but difficult to control. Rules such as



	stat : error ';'

are somewhat easier. Here, when there is an error, the parser attempts to skip over the statement but does so
by skipping to the next semicolon. All tokens after the error and before the next semicolon cannot be shifted
and are discarded. When the semicolon is seen, this rule will be reduced and any cleanup action associated
with it performed.
Another form of error rule arises in interactive applications where it may be desirable to permit a line to be
reentered after an error. The following example
	input : error '\n'
		{
		(void) printf("Reenter last line: ");
		}
		input
		{
		$$ = $4;
		}
	      ;

is one way to do this. There is one potential difficulty with this approach. The parser must correctly process
three input tokens before it admits that it has correctly resynchronized after the error. If the reentered line
contains an error in the first two tokens, the parser deletes the offending tokens and gives no message. This is
clearly unacceptable. For this reason, there is a mechanism that can force the parser to believe that error
recovery has been accomplished. The statement
yyerrok ;

in an action resets the parser to its normal mode. The last example can be rewritten as
	input : error '\n'
		{
		yyerrok;
		(void) printf("Reenter last line: ");
		}
		input
		{
		$$ = $4;
		}
	      ;

As previously mentioned, the token seen immediately after the error symbol is the input token at which the
error was discovered. Sometimes this is inappropriate; for example, an error recovery action might take upon
itself the job of finding the correct place to resume input. In this case, the previous lookahead token must be
cleared. The statement
yyclearin ;

in an action will have this effect. For example, suppose the action after error were to call some sophisticated
resynchronization routine (supplied by the user) that attempted to advance the input to the beginning of the
next valid statement. After this routine is called, the next token returned by yylex() is presumably the first
token in a valid statement. The old invalid token must be discarded and the error state reset. A rule similar to
	stat : error
		{
		resynch();
		yyerrok;
		yyclearin;
		}
	     ;

could perform this.


These mechanisms are admittedly crude but do allow for a simple, fairly effective recovery of the parser from
many errors. Moreover, the user can get control to deal with the error actions required by other portions of the
program.

The yacc environment


You create a yacc parser with the command
$ yacc grammar.y

where grammar.y is the file containing your yacc specification. (The .y suffix is a convention recognized by
other SCO OpenServer system commands. It is not strictly necessary.) The output is a file of C language
subroutines called y.tab.c. The function produced by yacc is called yyparse(), and is integervalued. When it
is called, it in turn repeatedly calls yylex(), the lexical analyzer supplied by the user (see ``Lexical analysis''),
to obtain input tokens. Eventually, either an error is detected and yyparse() returns the value 1 (if no error
recovery is possible), or the lexical analyzer returns the endmarker token and the parser accepts. In this case,
yyparse() returns the value 0.
You must provide a certain amount of environment for this parser in order to obtain a working program. For
example, as with every C language program, a routine called main() must be defined that eventually calls
yyparse(). In addition, a routine called yyerror() is needed to print a message when a syntax error is detected.
These two routines must be supplied in one form or another by the user. To ease the initial effort of using
yacc, a library has been provided with default versions of main() and yyerror(). The library, liby, is accessed
by a -ly argument to the cc command. The source codes
main()
{
return (yyparse());
}

and
	# include <stdio.h>

	yyerror(s)
	char *s;
	{
		(void) fprintf(stderr, "%s\n", s);
	}

show the triviality of these default programs. The argument to yyerror() is a string containing an error
message, usually the string ``syntax error''. The average
application wants to do better than this. Ordinarily, the program should keep track of the input line number
and print it along with the message when a syntax error is detected. The external integer variable yychar
contains the lookahead token number at the time the error was detected. This may be of some interest in
giving better diagnostics. Since the main() routine is probably supplied by the user (to read arguments, for
instance), the yacc library is useful only in small projects or in the earliest stages of larger ones.



The external integer variable yydebug is normally set to 0. If it is set to a nonzero value, the parser will output
a verbose description of its actions including a discussion of the input symbols read and what the parser
actions are. It is possible to set this variable by using debug(CP).
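As a minimal sketch (this main() is hypothetical and would take the place of the default library version;
depending on how the parser was compiled, the trace code may need to have been enabled at compile time),
yydebug can be set before the parse begins:

	extern int yydebug;	/* defined in the yacc-generated parser */

	main()
	{
		yydebug = 1;		/* trace input symbols and parser actions */
		return (yyparse());
	}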

Hints for preparing specifications


The following sections contain miscellaneous hints on preparing efficient, easy to change, and clear
specifications. The individual subsections are more or less independent.

Input style
It is difficult to provide rules with substantial actions and still have a readable specification file. The following
are a few style hints.
1. Use all uppercase letters for token names and all lowercase letters for nonterminal names. This is
useful in debugging.
2. Put grammar rules and actions on separate lines. It makes editing easier.
3. Put all rules with the same left-hand side together. Put the left-hand side in only once and let all
following rules begin with a vertical bar.
4. Put a semicolon only after the last rule with a given left-hand side and put the semicolon on a
separate line. This allows new rules to be easily added.
5. Indent rule bodies by one tab stop and action bodies by two tab stops.
6. Put complicated actions into subroutines defined in separate files.
``A simple example'' is written following this style, as are the examples in this section (where space permits).
The central problem is to make the rules visible through the morass of action code.

Left recursion
The algorithm used by the yacc parser encourages so called left recursive grammar rules. Rules of the form
name

name

rest_of_rule

match this algorithm. Rules such as


list

:
|
;

item
list

:
|
;

item
seq item

','

item

and
seq

frequently arise when writing specifications of sequences and lists. In each of these cases, the first rule will be
reduced for the first item only; and the second rule will be reduced for the second and all succeeding items.
With right recursive rules, such as
seq

item

Hints for preparing specifications

134

Parsing with yacc


|
;

item

seq

the parser is a bit bigger; and the items are seen and reduced from right to left. More seriously, an internal
stack in the parser is in danger of overflowing if an extremely long sequence is read (although yacc can
process very large stacks). Thus, you should use left recursion wherever reasonable.
It is worth considering if a sequence with zero elements has any meaning, and if so, consider writing the
sequence specification as
seq

:
|
;

/ empty /
seq item

using an empty rule. Once again, the first rule would always be reduced exactly once before the first item was
read, and then the second rule would be reduced once for each item read. Permitting empty sequences often
leads to increased generality. However, conflicts might arise if yacc is asked to decide which empty sequence
it has seen when it has not seen enough to know!

Lexical tie-ins
Some lexical decisions depend on context. For example, the lexical analyzer might want to delete blanks
normally, but not within quoted strings, or names might be entered into a symbol table in declarations but not
in expressions. One way of handling these situations is to create a global flag that is examined by the lexical
analyzer and set by actions. For example,
	%{
	int dflag;
	%}
	... other declarations ...

	%%
	prog  : decls stats ;
	decls : /* empty */ { dflag = 1; }
	      | decls declaration ;
	stats : /* empty */ { dflag = 0; }
	      | stats statement ;
	... other rules ...

specifies a program that consists of zero or more declarations followed by zero or more
statements. The flag dflag is now 0 when reading statements and 1 when reading declarations, except for the
first token in the first statement. This token must be seen by the parser before it can tell that the declaration
section has ended and the statements have begun. In many cases, this single token exception does not affect
the lexical scan.
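To make the flow concrete, the matching fragment of a lexical analyzer might look like the following sketch
(read_name() and enter_symbol() are hypothetical routines, and NAME is assumed to be a token declared in
the grammar):

	extern int dflag;	/* set by the parser actions shown above */
	...
	if (isalpha(c)) {
		read_name(buf);			/* hypothetical: collect the identifier into buf */
		if (dflag)
			enter_symbol(buf);	/* hypothetical: record names only in declarations */
		return (NAME);
	}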
This kind of backdoor approach can be elaborated to a noxious degree. Nevertheless, it represents a way of
doing some things that are difficult, if not impossible, to do otherwise.


Reserved words
Some programming languages permit you to use words like if, which are normally reserved as label or
variable names, provided that such use does not conflict with the valid use of these names in the programming
language. This is extremely hard to do in the framework of yacc. It is difficult to pass information to the
lexical analyzer telling it this instance of if is a keyword and that instance is a variable. You can make a stab
at it using the mechanism described in the last subsection, but it is difficult.

Advanced topics
The following sections discuss a number of advanced features of yacc.

Simulating error and accept in actions


The parsing actions of error and accept can be simulated in an action by use of macros YYACCEPT and
YYERROR. The YYACCEPT macro causes yyparse() to return the value 0; YYERROR causes the parser
to behave as if the current input symbol had been a syntax error; yyerror() is called, and error recovery takes
place. These mechanisms can be used to simulate parsers with multiple endmarkers or context-sensitive
syntax checking.
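As a minimal sketch (the HALT token and the rule around it are hypothetical), YYACCEPT can make an
ordinary token behave as a second endmarker:

	stmnt : SIMPLE
	      | HALT
		{ YYACCEPT; }	/* stop parsing and return 0 when HALT is seen */
	      ;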

Accessing values in enclosing rules


An action may refer to values returned by actions to the left of the current rule. The mechanism is simply the
same as with ordinary actions, $ followed by a digit.
	sent : adj noun verb adj noun
		{ look at the sentence ... }
	     ;

	adj  : THE
		{ $$ = THE; }
	     | YOUNG
		{ $$ = YOUNG; }
	     ...
	     ;

	noun : DOG
		{ $$ = DOG; }
	     | CRONE
		{
		if ( $0 == YOUNG )
		{
			(void) printf( "what?\n" );
		}
		$$ = CRONE;
		}
	     ;
	...



In this case, the digit may be 0 or negative. In the action following the word CRONE, a check is made that
the preceding token shifted was not YOUNG. Obviously, this is only possible when a great deal is known
about what might precede the symbol noun in the input. Nevertheless, at times this mechanism prevents a
great deal of trouble especially when a few combinations are to be excluded from an otherwise regular
structure.

Support for arbitrary value types


By default, the values returned by actions and the lexical analyzer are integers. yacc can also support values of
other types including structures. In addition, yacc keeps track of the types and inserts appropriate union
member names so that the resulting parser is strictly type checked. The yacc value stack is declared to be a
union of the various types of values desired. You declare the union and associate union member names with
each token and nonterminal symbol having a value. When the value is referenced through a $$ or $n
construction, yacc will automatically insert the appropriate union name so that no unwanted conversions take
place.
There are three mechanisms used to provide for this typing. First, there is a way of defining the union. This
must be done by the user since other subroutines, notably the lexical analyzer, must know about the union
member names. Second, there is a way of associating a union member name with tokens and nonterminals.
Finally, there is a mechanism for describing the type of those few values where yacc cannot easily determine
the type.
To declare the union, you include
%union
{
body of union
}

in the declaration section. This declares the yacc value stack and the external variables yylval and yyval to
have type equal to this union. If yacc was invoked with the -d option, the union declaration is copied into the
y.tab.h file as YYSTYPE.
Once YYSTYPE is defined, the union member names must be associated with the various terminal and
nonterminal names. The construction
<name>

is used to indicate a union member name. If this follows one of the keywords %token, %left, %right, and
%nonassoc, the union member name is associated with the tokens listed. Thus, saying
	%left  <optype>  '+'  '-'

causes any reference to values returned by these two tokens to be tagged with the union member name
optype. Another keyword, %type, is used to associate union member names with nonterminals. Thus, one
might say
	%type  <nodetype>  expr  stat

to associate the union member nodetype with the nonterminal symbols expr and stat.



There remain a couple of cases where these mechanisms are insufficient. If there is an action within a rule, the
value returned by this action has no a priori type. Similarly, reference to left context values (such as $0) leaves
yacc with no easy way of knowing the type. In this case, a type can be imposed on the reference by inserting a
union member name between < and > immediately after the first $. The example below
	rule : aaa
		{ $<intval>$ = 3; }
	       bbb
		{ fun( $<intval>2, $<other>0 ); }
	     ;

shows this usage. This syntax has little to recommend it, but the situation arises rarely.
A sample specification is given in ``An advanced example''. The facilities in this subsection are not triggered
until they are used. In particular, the use of %type will turn on these mechanisms. When they are used, there
is a fairly strict level of checking. For example, use of $n or $$ to refer to something with no defined type is
diagnosed. If these facilities are not triggered, the yacc value stack is used to hold ints.

yacc input syntax


This section has a description of the yacc input syntax as a yacc specification. Context dependencies and so
forth are not considered. Ironically, although yacc accepts an LALR(1) grammar, the yacc input specification
language is most naturally specified as an LR(2) grammar; the sticky part comes when an identifier is seen in
a rule immediately following an action. If this identifier is followed by a colon, it is the start of the next rule;
otherwise, it is a continuation of the current rule, which just happens to have an action embedded in it. As
implemented, the lexical analyzer looks ahead after seeing an identifier and decides whether the next token
(skipping blanks, newlines, comments, and so on) is a colon. If so, it returns the token C_IDENTIFIER.
Otherwise, it returns IDENTIFIER. Literals (quoted strings) are also returned as IDENTIFIERs but never as
part of C_IDENTIFIERs.
	/* grammar for the input to yacc */

	/* basic entries */
	%token IDENTIFIER	/* includes identifiers and literals */
	%token C_IDENTIFIER	/* identifier (but not literal) followed by a : */
	%token NUMBER		/* [0-9]+ */

	/* reserved words: %type=>TYPE, %left=>LEFT, etc. */
	%token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION

	%token MARK		/* the %% mark */
	%token LCURL		/* the %{ mark */
	%token RCURL		/* the %} mark */

	/* ASCII character literals stand for themselves */

	%start spec

	%%
	spec  : defs MARK rules tail ;
	tail  : MARK { In this action, eat up the rest of the file }
	      | /* empty: the second MARK is optional */ ;
	defs  : /* empty */ | defs def ;
	def   : START IDENTIFIER
	      | UNION { Copy union definition to output }
	      | LCURL { Copy C code to output file } RCURL
	      | rword tag nlist ;
	rword : TOKEN | LEFT | RIGHT | NONASSOC | TYPE ;
	tag   : /* empty: union tag is optional */ | '<' IDENTIFIER '>' ;
	nlist : nmno | nlist nmno | nlist ',' nmno ;
	nmno  : IDENTIFIER		/* Note: literal invalid with %type */
	      | IDENTIFIER NUMBER	/* Note: invalid with %type */ ;

	/* rule section */
	rules : C_IDENTIFIER rbody prec | rules rule ;
	rule  : C_IDENTIFIER rbody prec | '|' rbody prec ;
	rbody : /* empty */ | rbody IDENTIFIER | rbody act ;
	act   : '{' { Copy action, translate $$, etc. } '}' ;
	prec  : /* empty */ | PREC IDENTIFIER | PREC IDENTIFIER act | prec ';' ;

A simple example
This example gives the complete yacc application for a small desk calculator; the calculator has 26 registers
labeled a through z and accepts arithmetic expressions made up of the operators +, -, *, /, %, &, |, and the
assignment operators.
If an expression at the top level is an assignment, only the assignment is done; otherwise, the expression is
printed. As in the C language, an integer that begins with 0 is assumed to be octal; otherwise, it is assumed to
be decimal.
As an example of a yacc specification, the desk calculator does a reasonable job of showing how precedence
and ambiguities are used and demonstrates simple recovery. The major oversimplifications are that the lexical
analyzer is much simpler than for most applications, and the output is produced immediately line by line. Note
the way that decimal and octal integers are read in by grammar rules. This job is probably better done by the
lexical analyzer.
%{
# include <stdio.h>
# include <ctype.h>

int regs[26]; int base;


%}
%start list
%token DIGIT LETTER


%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS	/* supplies precedence for unary minus */

%%	/* beginning of rules section */

list   : /* empty */
       | list stat '\n'
       | list error '\n'	{ yyerrok; }
       ;
stat   : expr			{ (void) printf( "%d\n", $1 ); }
       | LETTER '=' expr	{ regs[$1] = $3; }
       ;
expr   : '(' expr ')'		{ $$ = $2; }
       | expr '+' expr		{ $$ = $1 + $3; }
       | expr '-' expr		{ $$ = $1 - $3; }
       | expr '*' expr		{ $$ = $1 * $3; }
       | expr '/' expr		{ $$ = $1 / $3; }
       | expr '%' expr		{ $$ = $1 % $3; }
       | expr '&' expr		{ $$ = $1 & $3; }
       | expr '|' expr		{ $$ = $1 | $3; }
       | '-' expr %prec UMINUS	{ $$ = -$2; }
       | LETTER			{ $$ = regs[$1]; }
       | number
       ;
number : DIGIT			{ $$ = $1; base = ($1==0) ? 8 : 10; }
       | number DIGIT		{ $$ = base * $1 + $2; }
       ;

%%	/* beginning of subroutines section */

int yylex()	/* lexical analysis routine */
{
	/* return LETTER for lowercase letter, yylval = 0 through 25 */
	/* return DIGIT for digit, yylval = 0 through 9 */
	/* all other characters are returned immediately */

	int c;

	/* skip blanks */
	while ((c = getchar()) == ' ')
		;

	/* c is now nonblank */

	if (islower(c)) {
		yylval = c - 'a';
		return (LETTER);
	}
	if (isdigit(c)) {
		yylval = c - '0';
		return (DIGIT);
	}
	return (c);
}

An advanced example
This section gives an example of a grammar using some of the advanced features. The desk calculator in ``A
simple example'' is modified to provide a desk calculator that does floating point interval arithmetic. The
calculator understands floating point constants, and the arithmetic operations +, -, *, /, and unary -. It uses the
registers a through z. Moreover, it understands intervals written
(X,Y)

where X is less than or equal to Y. There are 26 interval-valued variables A through Z that may also be used.
The usage is similar to that in ``A simple example''; assignments return no value and print nothing while
expressions print the (floating or interval) value.
This example explores a number of interesting features of yacc and C. Intervals are represented by a structure
consisting of the left and right endpoint values stored as doubles. This structure is given a type name,
INTERVAL, by using typedef. The yacc value stack can also contain floating point scalars and integers
(used to index into the arrays holding the variable values). Notice that the entire strategy depends strongly on
being able to assign structures and unions in C language. In fact, many of the actions call functions that return
structures as well.
It is also worth noting the use of YYERROR to handle error conditions: division by an interval containing
0, and an interval presented in the wrong order. The error recovery mechanism of yacc is used to throw away
the rest of the offending line.


In addition to the mixing of types on the value stack, this grammar also demonstrates an interesting use of
syntax to keep track of the type (for example, scalar or interval) of intermediate expressions. Note that scalar
can be automatically promoted to an interval if the context demands an interval value. This causes a large
number of conflicts when the grammar is run through yacc: 18 shift/reduce and 26 reduce/reduce. The
problem can be seen by looking at the two input lines.
	2.5 + ( 3.5 - 4. )

and
2.5 + (3.5, 4)

Notice that the 2.5 is to be used in an interval value expression in the second example, but this fact is not
known until the comma is read. By this time, 2.5 is finished, and the parser cannot go back and change its
mind. More generally, it might be necessary to look ahead an arbitrary number of tokens to decide whether to
convert a scalar to an interval. This problem is evaded by having two rules for each binary interval-valued
operator: one when the left operand is a scalar, and one when the left operand is an interval. In the second
case, the right operand must be an interval, so the conversion will be applied automatically. Despite this
evasion, there are still many cases where the conversion may be applied or not, leading to the above conflicts.
They are resolved by listing the rules that yield scalars first in the specification file; in this way, the conflict
will be resolved in the direction of keeping scalar valued expressions scalar valued until they are forced to
become intervals.
This way of handling multiple types is instructive. If there were many kinds of expression types instead of just
two, the number of rules needed would increase dramatically and the conflicts even more dramatically. Thus,
it is better practice in a more normal programming language environment to keep the type information as part
of the value and not as part of the grammar.
Finally, a word about the lexical analysis. The only unusual feature is the treatment of floating point
constants. The C language library routine atof() is used to do the actual conversion from a character string to a
doubleprecision value. If the lexical analyzer detects an error, it responds by returning a token that is invalid
in the grammar, provoking a syntax error in the parser and thence error recovery.
%{

#include <stdio.h>
#include <ctype.h>


typedef struct interval { double lo, hi; } INTERVAL;
INTERVAL vmul(), vdiv();
double atof();
double dreg[26];
INTERVAL vreg[26];
%}
%start lines


%union {
	int ival;
	double dval;
	INTERVAL vval;
}

%token <ival> DREG VREG		/* indices into dreg, vreg arrays */
%token <dval> CONST		/* floating point constant */
%type  <dval> dexp		/* expression */
%type  <vval> vexp		/* interval expression */

/* precedence information about the operators */

%left '+' '-'
%left '*' '/'

%% /* beginning of rules section */

lines : /* empty */
      | lines line
      ;
line  : dexp '\n'		{ (void) printf("%15.8f\n", $1); }
      | vexp '\n'		{ (void) printf("(%15.8f, %15.8f)\n", $1.lo, $1.hi); }
      | DREG '=' dexp '\n'	{ dreg[$1] = $3; }
      | VREG '=' vexp '\n'	{ vreg[$1] = $3; }
      | error '\n'		{ yyerrok; }
      ;
dexp  : CONST
      | DREG			{ $$ = dreg[$1]; }
      | dexp '+' dexp		{ $$ = $1 + $3; }
      | dexp '-' dexp		{ $$ = $1 - $3; }
      | dexp '*' dexp		{ $$ = $1 * $3; }
      | dexp '/' dexp		{ $$ = $1 / $3; }
      | '-' dexp		{ $$ = -$2; }
      | '(' dexp ')'		{ $$ = $2; }
      ;
vexp  : dexp			{ $$.hi = $$.lo = $1; }
      | '(' dexp ',' dexp ')'
	{
	$$.lo = $2;
	$$.hi = $4;
	if ($$.lo > $$.hi) {
		(void) printf("interval out of order\n");
		YYERROR;
	}
	}
      | VREG			{ $$ = vreg[$1]; }
      | vexp '+' vexp		{ $$.hi = $1.hi + $3.hi; $$.lo = $1.lo + $3.lo; }
      | dexp '+' vexp		{ $$.hi = $1 + $3.hi; $$.lo = $1 + $3.lo; }
      | vexp '-' vexp		{ $$.hi = $1.hi - $3.lo; $$.lo = $1.lo - $3.hi; }
      | dexp '-' vexp		{ $$.hi = $1 - $3.lo; $$.lo = $1 - $3.hi; }
      | vexp '*' vexp		{ $$ = vmul($1.lo, $1.hi, $3); }
      | dexp '*' vexp		{ $$ = vmul($1, $1, $3); }
      | vexp '/' vexp		{ if (dcheck($3)) YYERROR; $$ = vdiv($1.lo, $1.hi, $3); }
      | dexp '/' vexp		{ if (dcheck($3)) YYERROR; $$ = vdiv($1, $1, $3); }
      | '-' vexp		{ $$.hi = -$2.lo; $$.lo = -$2.hi; }
      | '(' vexp ')'		{ $$ = $2; }
      ;
%% /* beginning of subroutines section */

#define BSZ 50	/* buffer size for floating point number */

/* lexical analysis */

int yylex()
{
	register int c;

	/* skip over blanks */
	while ((c = getchar()) == ' ')
		;
	if (isupper(c)) {
		yylval.ival = c - 'A';
		return (VREG);
	}
	if (islower(c)) {
		yylval.ival = c - 'a';
		return (DREG);
	}

	/* gobble up digits, points, exponents */
	if (isdigit(c) || c == '.') {
		char buf[BSZ + 1], *cp = buf;
		int dot = 0, exp = 0;

		for (; (cp - buf) < BSZ; ++cp, c = getchar()) {
			*cp = c;
			if (isdigit(c))
				continue;
			if (c == '.') {
				if (dot++ || exp)
					return ('.');	/* will cause syntax error */
				continue;
			}
			if (c == 'e') {
				if (exp++)
					return ('e');	/* will cause syntax error */
				continue;
			}
			/* end of number */
			break;
		}
		*cp = '\0';
		if (cp - buf >= BSZ)
			(void) printf("constant too long: truncated\n");
		else
			ungetc(c, stdin);	/* push back last char read */
		yylval.dval = atof(buf);
		return (CONST);
	}
	return (c);
}

INTERVAL hilo(a, b, c, d)
double a, b, c, d;
{
	/* returns the smallest interval containing a, b, c, and d;
	   used by the vmul, vdiv routines */
	INTERVAL v;

	if (a > b) { v.hi = a; v.lo = b; }
	else       { v.hi = b; v.lo = a; }
	if (c > d) {
		if (c > v.hi) v.hi = c;
		if (d < v.lo) v.lo = d;
	} else {
		if (d > v.hi) v.hi = d;
		if (c < v.lo) v.lo = c;
	}
	return (v);
}

INTERVAL vmul(a, b, v)
double a, b;
INTERVAL v;
{
	return (hilo(a * v.hi, a * v.lo, b * v.hi, b * v.lo));
}

dcheck(v)
INTERVAL v;
{
	if (v.hi >= 0. && v.lo <= 0.) {
		(void) printf("divisor interval contains 0.\n");
		return (1);
	}
	return (0);
}

INTERVAL vdiv(a, b, v)
double a, b;
INTERVAL v;
{
	return (hilo(a / v.hi, a / v.lo, b / v.hi, b / v.lo));
}


Managing file interactions with make


The trend toward increased modularity of programs means that a project may have to cope with a large
assortment of individual files. There may also be a wide range of generation procedures needed to turn the
assortment of individual files into the final executable product.
make provides a method for maintaining up-to-date versions of programs that consist of a number of files
that may be generated in a variety of ways.
An individual programmer can easily forget
file-to-file dependencies
files that were modified and the impact that has on other files
the exact sequence of operations needed to generate a new version of the program
make keeps track of the commands that create files and the relationship between files. Whenever a change is
made in any of the files that make up a program, the make command creates the finished program by
recompiling only those portions directly or indirectly affected by the change. The relationships between files
and the processes that generate files are specified by the user in a description file.
The basic operation of make is to
find the target in the description file
ensure that all the files on which the target depends, the files needed to generate the target, exist and
are up to date
(re)create the target file if any of the generators have been modified more recently than the target
The description file that holds the information on interfile dependencies and command sequences is
conventionally called makefile, Makefile, s.makefile, or s.Makefile. If this naming convention is followed,
the simple command make is usually sufficient to regenerate the target regardless of the number of files
edited since the last make. In most cases, the description file is not difficult to write and changes infrequently.
Even if only a single file has been edited, rather than entering all the commands to regenerate the target,
entering the make command ensures that the regeneration is done in the prescribed way.

Basic features
The basic operation of make is to update a target file by ensuring that all of the files on which the target file
depends exist and are up to date. The target file is regenerated if it has not been modified since the dependents
were modified. The make program builds and searches a graph of these dependencies. The operation of make
depends on its ability to find the date and time that a file was last modified.
The make program operates using three sources of information:
a user-supplied description file
file names and last-modified times from the file system
built-in rules that supply default dependency information and implied commands
To illustrate, consider a simple example in which a program named prog is made by compiling and loading
three C language files x.c, y.c, and z.c with the math library, libm. By convention, the output of the C


language compilations will be found in files named x.o, y.o, and z.o. Assume that the files x.c and y.c share
some declarations in a file named defs.h, but that z.c does not. That is, x.c and y.c have the line
#include "defs.h"

The following specification describes the relationships and operations:


	prog : x.o y.o z.o
		cc x.o y.o z.o -lm -o prog

	x.o y.o : defs.h

If this information were stored in a file named makefile, the command
$ make

would perform the operations needed to regenerate prog after any changes had been made to any of the four
source files x.c, y.c, z.c, or defs.h. In the example above, the first line states that prog depends on three .o
files. Once these object files are current, the second line describes how to combine them to create prog. The
third line states that x.o and y.o depend on the file defs.h. From the file system, make discovers that there are
three .c files corresponding to the needed .o files and uses built-in rules on how to generate an object from a
C source file (that is, issue a cc -c command).
If make did not have the ability to determine automatically what needs to be done, the following longer
description file would be necessary:
	prog : x.o y.o z.o
		cc x.o y.o z.o -lm -o prog
	x.o : x.c defs.h
		cc -c x.c
	y.o : y.c defs.h
		cc -c y.c
	z.o : z.c
		cc -c z.c

If none of the source or object files have changed since the last time prog was made, and all of the files are
current, the command make announces this fact and stops. If, however, the defs.h file has been edited, x.c and
y.c (but not z.c) are recompiled; and then prog is created from the new x.o and y.o files, and the existing z.o
file. If only the file y.c had changed, only it is recompiled; but it is still necessary to relink prog. If no target
name is given on the make command line, the first target mentioned in the description is created; otherwise,
the specified targets are made. The command
$ make x.o

would regenerate x.o if x.c or defs.h had changed.


A method often useful to programmers is to include rules with mnemonic names and commands that do not
actually produce a file with that name. These entries can take advantage of make's ability to generate files and
substitute macros (for information about macros, see ``Description files and substitutions''.) Thus, an entry
save might be included to copy a certain set of files, or an entry clean might be used to throw away unneeded
intermediate files.
If a file exists after such commands are executed, the file's time of last modification is used in further
decisions. If the file does not exist after the commands are executed, the current time is used in making further
decisions.


You can maintain a zero-length file purely to keep track of the time at which certain actions were performed.
This technique is useful for maintaining remote archives and listings.
A simple macro mechanism for substitution in dependency lines and command strings is used by make.
Macros can either be defined by commandline arguments or included in the description file. In either case, a
macro consists of a name followed by the symbol = followed by what the macro stands for. A macro is
invoked by preceding the name by the symbol $. Macro names longer than one character must be
parenthesized. The following are valid macro invocations:
$(CFLAGS)
$2
$(xy)
$Z
$(Z)

The last two are equivalent.


$*, $@, $?, and $< are four special macros that change values during the execution of the command. (These
four macros are described in ``Description files and substitutions''.) The following fragment shows assignment
and use of some macros:
	OBJECTS = x.o y.o z.o
	LIBES = -lm
	prog: $(OBJECTS)
		cc $(OBJECTS) $(LIBES) -o prog
	 . . .

The command

	$ make LIBES="-ll -lm"

loads the three objects with both the lex (-ll) and the math (-lm) libraries, because macro definitions on the
command line override definitions in the description file. (In SCO OpenServer system commands, arguments
with embedded blanks must somehow be quoted.)
As an example of the use of make, a description file that might be used to maintain the make command itself
is given. The code for make is spread over a number of C language source files and has a yacc grammar. The
description file contains the following:
# Description file for the make command

	FILES = Makefile defs.h main.c doname.c misc.c \
		files.c dosys.c gram.y
	OBJECTS = main.o doname.o misc.o files.o \
		dosys.o gram.o
	LIBES =
	LINT = lint -p
	CFLAGS = -O
	LP = lp

	make: $(OBJECTS)
		$(CC) $(CFLAGS) -o make $(OBJECTS) $(LIBES)
		@size make

	$(OBJECTS): defs.h

	cleanup:
		-rm *.o gram.c
		-du

	install: make
		@size make /usr/ccs/bin/make
		cp make /usr/ccs/bin/make && rm make

	lint: dosys.c doname.c files.c main.c misc.c gram.c
		$(LINT) dosys.c doname.c files.c main.c misc.c \
		gram.c

	# print files that are out-of-date
	# with respect to "print" file.
	print: $(FILES)
		pr $? | $(LP)
		touch print

The make program prints out each command before issuing it.
The following output results from entering the command make in a directory containing only the source and
description files:
	cc -O -c main.c
	cc -O -c doname.c
	cc -O -c misc.c
	cc -O -c files.c
	cc -O -c dosys.c
	yacc gram.y
	mv y.tab.c gram.c
	cc -O -c gram.c
	cc -o make main.o doname.o misc.o files.o dosys.o gram.o
13188 + 3348 + 3044 = 19580

The last line results from the size make command. The printing of the command line itself was suppressed by
the symbol ``@'' in the description file.

Parallel make
If make is invoked with the -P option, it tries to build more than one target at a time, in parallel. (This is done
by using the standard SCO OpenServer system process mechanism which enables multiple processes to run
simultaneously.)
	prog : x.o y.o z.o
		cc x.o y.o z.o -lm -o prog
	x.o : x.c defs.h
		cc -c x.c
	y.o : y.c defs.h
		cc -c y.c
	z.o : z.c
		cc -c z.c

For the makefile shown above, it would create processes to build x.o, y.o and z.o in parallel. After these
processes were complete, it would build prog.
The number of targets make will try to build in parallel is determined by the value of the environment
variable PARALLEL. If -P is invoked, but PARALLEL is not set, then make will try to build no more than
two targets in parallel.
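For example (a sketch assuming a Bourne-style shell; the value 4 is arbitrary):

	$ PARALLEL=4
	$ export PARALLEL
	$ make -P

This would allow make to build up to four targets in parallel.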
You can use the .MUTEX directive to serialize the updating of some specified targets. This is useful when
two or more targets modify a common output file, such as when inserting modules into an archive or when
creating an intermediate file with the same name, as is done by lex and yacc.
If the makefile above contained a .MUTEX directive of the form
.MUTEX: x.o y.o

it would prevent make from building x.o and y.o in parallel.



Description files and substitutions


The following section will explain the customary elements of the description file.

Comments
The comment convention is that the symbol ``#'' and all characters on the same line after it are ignored. Blank
lines and lines beginning with ``#'' are totally ignored.

Continuation lines
If a noncomment line is too long, the line can be continued by using the symbol ``\'', which must be the last
character on the line. If the last character of a line is ``\'', then it, the newline, and all following blanks and
tabs are replaced by a single blank. Comments can be continued on to the next line as well.
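For example, a long macro definition (the file names here are purely illustrative) can be continued:

	SRCS = main.c parse.c scan.c \
	       output.c misc.c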

Macro definitions
A macro definition is an identifier followed by the symbol ``=''. The identifier must not be preceded by a
colon (``:'') or a tab. The name (string of letters and digits) to the left of the = (trailing blanks and tabs are
stripped) is assigned the string of characters following the = (leading blanks and tabs are stripped). The
following are valid macro definitions:
	2 = xyz
	abc = -ll -ly -lm
	LIBES =

The last definition assigns LIBES the null string. A macro that is never explicitly defined has the null string
as its value. Remember, however, that some macros are explicitly defined in make's own rules.

General form
The general form of an entry in a description file is
target1 [target2 ...] :[:] [dependent1 ...] [; commands] [# ...]
[ \t commands] [# ...]
. . .

Items inside brackets may be omitted and targets and dependents are strings of letters, digits, periods, and
slashes. Shell metacharacters such as ``*'' and ``?'' are expanded when the commands are evaluated.
Commands may appear either after a semicolon on a dependency line or on lines beginning with a tab
(denoted above as ``\t'') immediately following a dependency line. A command is any string of characters not
including #, except when # is in quotes.
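As a small sketch (the target and file names are hypothetical), the two placements of commands look like this:

	clean : ; rm -f *.o
	prog : prog.o
		cc -o prog prog.o

Here the clean command follows a semicolon on its dependency line, while the prog command appears on a
tab-indented line of its own.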

Dependency information
A dependency line may have either a single or a double colon. A target name may appear on more than one
dependency line, but all of those lines must be of the same (single or double colon) type. For the more
common single colon case, a command sequence may be associated with at most one dependency line. If the
target is out of date with any of the dependents on any of the lines and a command sequence is specified (even
a null one following a semicolon or tab), it is executed; otherwise, a default rule may be invoked. In the


double colon case, a command sequence may be associated with more than one dependency line. If the target
is out of date with any of the files on a particular line, the associated commands are executed. A builtin rule
may also be executed. The double colon form is particularly useful in updating archivetype files, where the
target is the archive library itself. (An example is included in ``Archive libraries''.)

Executable commands
If a target must be created, the sequence of commands is executed. Normally, each command line is printed
and then passed to a separate invocation of the shell after substituting for macros. The printing is suppressed
in the silent mode (the -s option of the make command) or if the command line in the description file begins with
an ``@'' sign. make normally stops if any command signals an error by returning a nonzero error code. Errors
are ignored if the -i flag has been specified on the make command line, if the fake target name .IGNORE
appears in the description file, or if the command string in the description file begins with a hyphen (-). If a
program is known to return a meaningless status, a hyphen in front of the command that invokes it is
appropriate. Because each command line is passed to a separate invocation of the shell, care must be taken
with certain commands (cd and shell control commands, for instance) that have meaning only within a single
shell process. These results are forgotten before the next line is executed.
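For instance (a sketch with hypothetical names), the two prefixes can be combined in one entry:

	install :
		@echo installing...		# ``@'' suppresses printing of this line
		-rm /tmp/oldcopy		# ``-'' makes make ignore a nonzero exit status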
Before issuing any command, certain internally maintained macros are set. The $@ macro is set to the full
target name of the current target. The $@ macro is evaluated only for explicitly named dependencies. The $?
macro is set to the string of names that were found to be younger than the target. The $? macro is evaluated
when explicit rules from the makefile are evaluated. If the command was generated by an implicit rule, the $<
macro is the name of the related file that caused the action; and the $* macro is the prefix shared by the
current and the dependent file names. If a file must be made but there are no explicit commands or relevant
built-in rules, the commands associated with the name .DEFAULT are used. If there is no such name, make
prints a message and stops.
In addition, a description file may also use the following related macros: $(@D), $(@F), $(*D), $(*F), $(<D),
and $(<F) (see below).

Extensions of $*, $@, and $<


The internally generated macros $*, $@, and $< are useful generic terms for current targets and out-of-date
relatives. To this list is added the following related macros: $(@D), $(@F), $(*D), $(*F), $(<D), and $(<F).
The D refers to the directory part of the single character macro. The F refers to the file name part of the single
character macro. These additions are useful when building hierarchical makefiles. They allow access to
directory names for purposes of using the cd command of the shell. Thus, a command can be
cd $(<D); $(MAKE) $(<F)

Output translations
The values of macros are replaced when evaluated. The general form, where brackets indicate that the
enclosed sequence is optional, is as follows:
$(macro[:string1=[string2]])

The parentheses are optional if there is no substitution specification and the macro name is a single character.
If a substitution sequence is present, the value of the macro is considered to be a sequence of ``words''
separated by sequences of blanks, tabs, and newline characters. Then, for each such word that ends with


string1, string1 is replaced with string2 (or no characters if string2 is not present).
This particular substitution capability was chosen because make usually concerns itself with suffixes. The
usefulness of this type of translation occurs when maintaining archive libraries. Now, all that is necessary is to
accumulate the out-of-date members and write a shell script that can handle all the C language programs
(that is, files ending in .c). Thus, the following fragment optimizes the executions of make for maintaining an
archive library:
	$(LIB): $(LIB)(a.o) $(LIB)(b.o) $(LIB)(c.o)
		$(CC) -c $(CFLAGS) $(?:.o=.c)
		$(AR) $(ARFLAGS) $(LIB) $?
		rm $?

A dependency of the preceding form is necessary for each of the different types of source files (suffixes) that
define the archive library. These translations are added in an effort to make more general use of the wealth of
information that make generates.

Recursive makefiles
Another feature of make concerns the environment and recursive invocations. If the sequence $(MAKE)
appears anywhere in a shell command line, the line is executed even if the -n flag is set. Since the -n flag is
exported across invocations of make (through the MAKEFLAGS variable), the only thing that is executed is
the make command itself. This feature is useful when a hierarchy of makefiles describes a set of software
subsystems. For testing purposes, make -n can be executed and everything that would have been done will be
printed, including output from lower-level invocations of make.
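A typical arrangement is sketched below (the subdirectory names lib and cmd are hypothetical):

	SUBDIRS = lib cmd

	all :
		for d in $(SUBDIRS); do \
			( cd $$d; $(MAKE) all ); \
		done

Because the command line contains $(MAKE), it is executed even under make -n, so the lower-level
invocations can report what they would have done.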

Suffixes and transformation rules


make uses an internal table of rules to learn how to transform a file with one suffix into a file with another
suffix. If the -r flag is used on the make command line, the internal table is not used.
The list of suffixes is actually the dependency list for the name .SUFFIXES. make searches for a file with
any of the suffixes on the list. If it finds one, make transforms it into a file with another suffix.
Transformation rule names are the concatenation of the before and after suffixes. The name of the rule to
transform a .r file to a .o file is thus .r.o. If the rule is present and no explicit command sequence has been
given in the user's description files, the command sequence for the rule .r.o is used. If a command is generated
by using one of these suffixing rules, the macro $ is given the value of the stem (everything but the suffix) of
the name of the file to be made; and the macro $< is the full name of the dependent that caused the action.
The order of the suffix list is significant since the list is scanned from left to right. The first name formed that
has both a file and a rule associated with it is used. If new names are to be appended, the user can add an
entry for .SUFFIXES in the description file. The dependents are added to the usual list.
A .SUFFIXES line without any dependents deletes the current list. It is necessary to clear the current list if
the order of names is to be changed.
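For example (a sketch; the .p suffix and the pc command stand for a hypothetical Pascal compiler):

	.SUFFIXES :			# clear the current suffix list
	.SUFFIXES : .o .p .c		# rebuild it with .p scanned before .c
	.p.o :
		pc -c $<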

Implicit rules
make uses a table of suffixes and a set of transformation rules to supply default dependency information and
implied commands. The default suffix list (in order) is as follows:



	.o	Object file
	.c	C source file
	.c~	SCCS C source file
	.y	yacc C source grammar
	.y~	SCCS yacc C source grammar
	.l	lex C source grammar
	.l~	SCCS lex C source grammar
	.s	Assembler source file
	.s~	SCCS assembler source file
	.sh	Shell file
	.sh~	SCCS shell file
	.h	Header file
	.h~	SCCS header file
	.f	FORTRAN source file
	.f~	SCCS FORTRAN source file
	.C	C++ source file
	.C~	SCCS C++ source file
	.Y	yacc C++ source grammar
	.Y~	SCCS yacc C++ source grammar
	.L	lex C++ source grammar
	.L~	SCCS lex C++ source grammar
The figure ``Summary of default transformation path'' summarizes the default transformation paths. If there
are two paths connecting a pair of suffixes, the longer one is used only if the intermediate file exists or is
named in the description.

Summary of default transformation path


If the file x.o is needed and an x.c is found in the description or directory, the x.o file would be compiled. If
there is also an x.l, that source file would be run through lex before compiling the result. However, if there is
no x.c but there is an x.l, make would discard the intermediate C language file and use the direct link as
shown in the figure.
It is possible to change the names of some of the compilers used in the default or the flag arguments with
which they are invoked by knowing the macro names used. The compiler names are the macros AS, CC,
C++C, F77, YACC, and LEX. The command
$ make CC=newcc

will cause the newcc command to be used instead of the usual C language compiler. The macros CFLAGS,
YFLAGS, LFLAGS, ASFLAGS, FFLAGS, and C++FLAGS may be set to cause these commands to be
issued with optional flags. Thus
	$ make CFLAGS=-g

causes the cc command to include debugging information.


Archive libraries
The make program has an interface to archive libraries. A user may name a member of a library in the
following manner:
projlib(object.o)

or
projlib((entry_pt))

where the second method actually refers to an entry point of an object file within the library. (make looks
through the library, locates the entry point, and translates it to the correct object file name.)
To use this procedure to maintain an archive library, the following type of makefile is required:
	projlib::	projlib(pfile1.o)
		$(CC) -c $(CFLAGS) pfile1.c
		$(AR) $(ARFLAGS) projlib pfile1.o
		rm pfile1.o
	projlib::	projlib(pfile2.o)
		$(CC) -c $(CFLAGS) pfile2.c
		$(AR) $(ARFLAGS) projlib pfile2.o
		rm pfile2.o

and so on for each object. This is tedious and error-prone. Obviously, the command sequences for adding a C
language file to a library are the same for each invocation; the file name is the only difference each time.
(This is true in most cases.)
The make command also gives the user access to a rule for building libraries. The handle for the rule is the .a
suffix. Thus, a .c.a rule is the rule for compiling a C language source file, adding it to the library, and
removing the .o file. Similarly, the .y.a, the .s.a, and the .l.a rules rebuild yacc, assembler, and lex files,
respectively. The archive rules defined internally are .c.a, .c~.a, .f.a, .f~.a, and .s~.a. (The tilde (``~'') syntax
will be described shortly.) The user may define other needed rules in the description file.
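For instance, no .l.a rule is built in, so a user keeping lex sources in an archive might define one in the description file. The following is a sketch modeled on the internal .c.a and .l.o rules, not part of the default set:
.l.a:
        $(LEX) $(LFLAGS) $<
        $(CC) $(CFLAGS) -c lex.yy.c
        mv lex.yy.o $*.o
        $(AR) $(ARFLAGS) $@ $*.o
        rm -f lex.yy.c $*.o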
The above two-member library is then maintained with the following shorter makefile:
projlib:        projlib(pfile1.o) projlib(pfile2.o)
        @echo projlib up-to-date.

The internal rules are already defined to complete the preceding library maintenance. The actual .c.a rule is as
follows:
.c.a:
        $(CC) -c $(CFLAGS) $<
        $(AR) $(ARFLAGS) $@ $(<F:.c=.o)
        rm -f $(<F:.c=.o)

Thus, the $@ macro is the .a target (projlib); the $< and $* macros are set to the out-of-date C language file
and the file name minus the suffix, respectively (pfile1.c and pfile1). The $< macro (in the preceding rule)
could have been changed to $*.c.
It is useful to go into some detail about exactly what make does when it sees the construction
projlib:        projlib(pfile1.o)
        @echo projlib up-to-date

Assume the object in the library is out of date with respect to pfile1.c. Also, there is no pfile1.o file.
1. make projlib.
2. Before makeing projlib, check each dependent of projlib.
3. projlib(pfile1.o) is a dependent of projlib and needs to be generated.
4. Before generating projlib(pfile1.o), check each dependent of projlib(pfile1.o). (There are none.)
5. Use internal rules to try to create projlib(pfile1.o). (There is no explicit rule.) Note that
projlib(pfile1.o) has a parenthesis in the name to identify the target suffix as .a. This is the key. There
is no explicit .a at the end of the projlib library name. The parenthesis implies the .a suffix. In this
sense, the .a is hardwired into make.
6. Break the name projlib(pfile1.o) up into projlib and pfile1.o. Define two macros, $@ (projlib) and
$* (pfile1).
7. Look for a rule .X.a and a file $*.X. The first .X (in the
.SUFFIXES list) which fulfills these conditions is .c, so the rule is .c.a and the file is pfile1.c. Set $<
to be pfile1.c and execute the rule. In fact, make must then compile pfile1.c.
8. The library has been updated. Execute the command associated with the projlib: dependency, namely
@echo projlib up-to-date

It should be noted that to let pfile1.o have dependencies, the following syntax is required:
projlib(pfile1.o):      $(INCDIR)/stdio.h pfile1.c

There is also a macro for referencing the archive member name when this form is used. The $% macro is
evaluated each time $@ is evaluated. If there is no current archive member, $% is null. If an archive member
exists, then $% evaluates to the expression between the parentheses.
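As a brief sketch (reusing the projlib and pfile1 names from the earlier examples), $% can stand in for the member name in such a rule:
projlib(pfile1.o):      $(INCDIR)/stdio.h pfile1.c
        $(CC) -c $(CFLAGS) pfile1.c
        $(AR) $(ARFLAGS) $@ $%

Here $@ evaluates to projlib and $% to pfile1.o.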

Source code control system file names


The syntax of make does not directly permit referencing of prefixes. For most types of files on SCO
OpenServer operating system machines, this is acceptable since nearly everyone uses a suffix to distinguish
different types of files. SCCS files are the exception. Here, s. precedes the file name part of the complete path
name.
To allow make easy access to the prefix s., the symbol ``~'' is used as an identifier of SCCS files. Hence, .c~.o
refers to the rule which transforms an SCCS C language source file into an object file. Specifically, the
internal rule is
.c~.o:
$(GET) $(GFLAGS) $<
        $(CC) $(CFLAGS) -c $*.c
        rm -f $*.c

Thus, ~ appended to any suffix transforms the file search into an SCCS file name search with the actual suffix
named by the dot and all characters up to (but not including) ~.
The following SCCS suffixes are internally defined:

.c~ .sh~ .C~
.y~ .h~ .Y~
.l~ .f~ .L~
.s~
The following rules involving SCCS transformations are internally defined:

.c~:
.c~.c:
.c~.a:
.c~.o:
.y~.c:
.y~.o:
.y~.y:
.l~.c:
.l~.o:
.l~.l:
.s~:

.s~.s:
.s~.a:
.s~.o:
.sh~:
.sh~.sh:
.h~.h:
.f~:
.f~.f:
.f~.a:
.f~.o:

.C~:
.C~.C:
.C~.a:
.C~.o:
.Y~.C:
.Y~.o:
.Y~.Y:
.L~.C:
.L~.o:
.L~.L:

Obviously, the user can define other rules and suffixes that may prove useful. The ~ provides a handle on the
SCCS file name format so that this is possible.

The null suffix


There are many programs that consist of a single source file. make handles this case by the null suffix rule.
Thus, to maintain the SCO OpenServer system program cat, a rule in the makefile of the following form is
needed:
.c:
        $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)

In fact, this .c: rule is internally defined so no makefile is necessary at all. The user only needs to enter
$ make cat dd echo date

(these are all SCO OpenServer system single-file programs) and all four C language source files are passed
through the above shell command line associated with the .c: rule. The internally defined single suffix rules
are

.c:     .c~:    .s:     .s~:    .sh:    .sh~:   .f:     .f~:    .C:     .C~:
Others may be added in the makefile by the user.

Included files
The make program has a capability similar to the #include directive of the C preprocessor. If the string
include appears as the first seven letters of a line in a makefile and is followed by a blank or a tab, the rest of
the line is assumed to be a file name that the current invocation of make will read. Macros may be used in
file names. The file descriptors are stacked for reading include files, so no more than 16 levels of nested
includes are supported.
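For instance (the file name rules.mk and the macro PROJDIR are hypothetical), a set of rules shared by several makefiles might be pulled in with a line such as:
include $(PROJDIR)/rules.mk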

SCCS makefiles
Makefiles under SCCS control are accessible to make. That is, if make is typed and only a file named
s.makefile or s.Makefile exists, make will do a get on the file, then read and remove the file.

Dynamic dependency parameters


A dynamic dependency parameter has meaning only on the dependency line in a makefile. The $$@ refers to
the current ``thing'' to the left of the : symbol (which is $@). Also the form $$(@F) exists, which allows
access to the file part of $@. Thus, in the following:
cat:    $$@.c

the dependency is translated at execution time to the string cat.c. This is useful for building a large number of
executable files, each of which has only one source file. For instance, the SCO OpenServer system software
command directory could have a makefile like:
CMDS = cat dd echo date cmp comm chown

$(CMDS):        $$@.c
        $(CC) $(CFLAGS) $? -o $@

Obviously, this is a subset of all the single-file programs. For multiple-file programs, a directory is usually
allocated and a separate makefile is made. For any particular file that has a peculiar compilation procedure, a
specific entry must be made in the makefile.
The second useful form of the dependency parameter is $$(@F). It represents the file name part of $$@.
Again, it is evaluated at execution time. Its usefulness becomes evident when trying to maintain the
/usr/include directory from a makefile in the /usr/src/head directory. Thus, the /usr/src/head/makefile would look like
INCDIR = /usr/include

INCLUDES = \
        $(INCDIR)/stdio.h \
        $(INCDIR)/pwd.h \
        $(INCDIR)/dir.h \
        $(INCDIR)/a.out.h

$(INCLUDES):    $$(@F)
        cp $? $@
        chmod 0444 $@

This would completely maintain the /usr/include directory whenever one of the above files in /usr/src/head was updated.

The make command


The make(CP) command takes macro definitions, options, description file names, and target file names as
arguments in the form:
$ make [ options ] [ macro definitions and targets ]
The following summary of command operations explains how these arguments are interpreted.
First, all macro definition arguments (arguments with embedded = symbols) are analyzed and the assignments
made. Command line macros override corresponding definitions found in the description files. Next, the
option arguments are examined. The permissible options are as follows:

-i
Ignore error codes returned by invoked commands. This mode is entered if the fake target name
.IGNORE appears in the description file.
-s
Silent mode. Do not print command lines before executing. This mode is also entered if the fake
target name .SILENT appears in the description file.
-r
Do not use the built-in rules.
-n
No-execute mode. Print commands, but do not execute them. Even lines beginning with an ``@'' sign
are printed.
-t
Touch the target files (causing them to be up to date) rather than issue the usual commands.
-q
Question. The make command returns a zero or nonzero status code depending on whether the target
file is or is not up to date.
-p
Print out the complete set of macro definitions and target descriptions.
-k
Abandon work on the current entry if something goes wrong, but continue on other branches that do
not depend on the current entry.
-e
Environment variables override assignments within makefiles.
-f
Description file name. The next argument is assumed to be the name of a description file. A file name
of - denotes the standard input. If there are no -f arguments, the file named makefile, Makefile,
s.makefile, or s.Makefile in the current directory is read. The contents of the description files
override the built-in rules if they are present.
-P
Update, in parallel, more than one target at a time. The number of targets updated concurrently is
determined by the environment variable PARALLEL and the presence of .MUTEX directives in
makefiles.
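For example (the description file and target names are hypothetical), options are freely combined with targets on one command line:
$ make -s -k -f build.mk all

builds the target all as described in build.mk, silently, continuing past branches whose commands fail.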
The following two fake target names are evaluated in the same manner as flags:
.DEFAULT
If a file must be made but there are no explicit commands or relevant builtin rules, the commands
associated with the name .DEFAULT are used if it exists.
.PRECIOUS
Dependents on this target are not removed when quit or interrupt is pressed.
Finally, the remaining arguments are assumed to be the names of targets to be made, and these are done in
left-to-right order. If there are no such arguments, the first name in the description file that does not
begin with the symbol . is made.

Environment variables
Environment variables are read and added to the macro definitions each time make executes. Precedence is a
prime consideration in doing this properly. The following describes make's interaction with the environment.
A macro, MAKEFLAGS, is maintained by make. The macro is defined as the collection of all input flag
arguments into a string (without minus signs). The macro is exported and thus accessible to recursive
invocations of make. Command line flags and assignments in the makefile update MAKEFLAGS. Thus, to
describe how the environment interacts with make, the MAKEFLAGS macro (environment variable) must
be considered.
When executed, make assigns macro definitions in the following order:
1. Read the MAKEFLAGS environment variable. If it is not present or null, the internal make variable
MAKEFLAGS is set to the null string. Otherwise, each letter in MAKEFLAGS is assumed to be an
input flag argument and is processed as such. (The only exceptions are the -f, -p, and -r flags.)
2. Read the internal list of macro definitions.
3. Read the environment. The environment variables are treated as macro definitions and marked as
exported (in the shell sense).
4. Read the makefile(s). The assignments in the makefile(s) override the environment. This order is
chosen so that when a makefile is read and executed, you know what to expect. That is, you get what
is seen unless the -e flag is used. The -e is the input flag argument that tells make to have the
environment override the makefile assignments. Thus, if make -e is entered, the variables in the
environment override the definitions in the makefile. Also, MAKEFLAGS overrides the environment
if assigned. This is useful for further invocations of make from the current makefile.
It may be clearer to list the precedence of assignments. Thus, in order from least binding to most binding, the
precedence of assignments is as follows:
1. internal definitions
2. environment
3. makefile(s)
4. command line
The -e flag has the effect of rearranging the order to:
1. internal definitions
2. makefile(s)
3. environment
4. command line
This order is general enough to allow a programmer to define a makefile or set of makefiles whose
parameters are dynamically definable.
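A short illustration of this precedence (the target name prog is hypothetical). With a makefile that assigns CFLAGS=-O,
$ CFLAGS=-g make prog

still compiles with -O, because makefile assignments override the environment, while
$ CFLAGS=-g make -e prog
$ make CFLAGS=-g prog

both compile with -g: the first because -e promotes the environment above the makefile, the second because command line assignments are the most binding of all.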

Suggestions and warnings


The most common difficulties arise from make's specific meaning of dependency. If file x.c has a
#include "defs.h"

line, then the object file x.o depends on defs.h; the source file x.c does not. If defs.h is changed, nothing is
done to the file x.c while file x.o must be recreated.
To discover what make would do, the -n option is very useful. The command
$ make -n

orders make to print out the commands it would issue without actually taking the time to execute
them. If a change to a file is absolutely certain to be mild in character (adding a comment to an include file,
for example), the -t (touch) option can save a lot of time. Instead of issuing a large number of superfluous
recompilations, make updates the modification times on the affected files. Thus, the command
$ make -ts

(touch silently) causes the relevant files to appear up to date. Obvious care is necessary because this mode of
operation subverts the intention of make and destroys all memory of the previous relationships.

Internal rules
The standard set of internal rules used by make is reproduced below.
#
#       SUFFIXES RECOGNIZED BY MAKE
#
.SUFFIXES: .o .c .c~ .y .y~ .l .l~ .s .s~ .sh .sh~ .h .h~ .f .f~ .C .C~ \
        .Y .Y~ .L .L~
#
#       PREDEFINED MACROS
#
AR=ar
ARFLAGS=-rv
AS=as
ASFLAGS=
BUILD=build
CC=cc
CFLAGS=-O
C++C=CC
C++FLAGS=-O
F77=f77
FFLAGS=-O
GET=get
GFLAGS=
LEX=lex
LFLAGS=
LD=ld
LDFLAGS=
MAKE=make
YACC=yacc
YFLAGS=
#
#       SPECIAL RULES
#
markfile.o : markfile
        A=@; echo "static char _sccsid[]=\042`grep $$A'(#)' markfile`\042;" \
                > markfile.c
        $(CC) -c markfile.c
        rm -f markfile.c
#
#       SINGLE SUFFIX RULES
#
.c:
        $(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
.c~:
        $(GET) $(GFLAGS) $<
        $(CC) $(CFLAGS) -o $@ $*.c $(LDFLAGS)
        rm -f $*.c
.s:
        $(AS) $(ASFLAGS) -o $*.o $<
        $(CC) -o $@ $*.o $(LDFLAGS)
        rm -f $*.o
.s~:
        $(GET) $(GFLAGS) $<
        $(AS) $(ASFLAGS) -o $*.o $*.s
        $(CC) -o $@ $*.o $(LDFLAGS)
        rm -f $*.[so]
.sh:
        cp $< $@; chmod +x $@
.sh~:
        $(GET) $(GFLAGS) $<
        cp $*.sh $*; chmod +x $@
        rm -f $*.sh
.f:
        $(F77) $(FFLAGS) -o $@ $< $(LDFLAGS)
.f~:
        $(GET) $(GFLAGS) $<
        $(F77) $(FFLAGS) -o $@ $*.f $(LDFLAGS)
        rm -f $*.f
.C:
        $(C++C) $(C++FLAGS) -o $@ $< $(LDFLAGS)
.C~:
        $(GET) $(GFLAGS) $<
        $(C++C) $(C++FLAGS) -o $@ $*.C $(LDFLAGS)
        rm -f $*.C
#
#       DOUBLE SUFFIX RULES
#
.c~.c .y~.y .l~.l .s~.s .sh~.sh .h~.h .f~.f .C~.C .Y~.Y .L~.L:
        $(GET) $(GFLAGS) $<
.c.a:
        $(CC) $(CFLAGS) -c $<
        $(AR) $(ARFLAGS) $@ $(<F:.c=.o)
        rm -f $(<F:.c=.o)
.c~.a:
        $(GET) $(GFLAGS) $<
        $(CC) $(CFLAGS) -c $*.c
        $(AR) $(ARFLAGS) $@ $*.o
        rm -f $*.[co]
.c.o:
        $(CC) $(CFLAGS) -c $<
.c~.o:
        $(GET) $(GFLAGS) $<
        $(CC) $(CFLAGS) -c $*.c
        rm -f $*.c
.y.c:
        $(YACC) $(YFLAGS) $<
        mv y.tab.c $@
.y~.c:
        $(GET) $(GFLAGS) $<
        $(YACC) $(YFLAGS) $*.y
        mv y.tab.c $*.c
        rm -f $*.y
.y.o:
        $(YACC) $(YFLAGS) $<
        $(CC) $(CFLAGS) -c y.tab.c
        rm -f y.tab.c
        mv y.tab.o $@
.y~.o:
        $(GET) $(GFLAGS) $<
        $(YACC) $(YFLAGS) $*.y
        $(CC) $(CFLAGS) -c y.tab.c
        rm -f y.tab.c $*.y
        mv y.tab.o $*.o
.l.c:
        $(LEX) $(LFLAGS) $<
        mv lex.yy.c $@
.l~.c:
        $(GET) $(GFLAGS) $<
        $(LEX) $(LFLAGS) $*.l
        mv lex.yy.c $@
        rm -f $*.l
.l.o:
        $(LEX) $(LFLAGS) $<
        $(CC) $(CFLAGS) -c lex.yy.c
        rm -f lex.yy.c
        mv lex.yy.o $@
.l~.o:
        $(GET) $(GFLAGS) $<
        $(LEX) $(LFLAGS) $*.l
        $(CC) $(CFLAGS) -c lex.yy.c
        rm -f lex.yy.c $*.l
        mv lex.yy.o $@
.s.a:
        $(AS) $(ASFLAGS) -o $*.o $*.s
        $(AR) $(ARFLAGS) $@ $*.o
.s~.a:
        $(GET) $(GFLAGS) $<
        $(AS) $(ASFLAGS) -o $*.o $*.s
        $(AR) $(ARFLAGS) $@ $*.o
        rm -f $*.[so]
.s.o:
        $(AS) $(ASFLAGS) -o $@ $<
.s~.o:
        $(GET) $(GFLAGS) $<
        $(AS) $(ASFLAGS) -o $*.o $*.s
        rm -f $*.s
.f.a:
        $(F77) $(FFLAGS) -c $*.f
        $(AR) $(ARFLAGS) $@ $(<F:.f=.o)
        rm -f $(<F:.f=.o)
.f~.a:
        $(GET) $(GFLAGS) $<
        $(F77) $(FFLAGS) -c $*.f
        $(AR) $(ARFLAGS) $@ $*.o
        rm -f $*.[fo]
.f.o:
        $(F77) $(FFLAGS) -c $*.f
.f~.o:
        $(GET) $(GFLAGS) $<
        $(F77) $(FFLAGS) -c $*.f
        rm -f $*.f
.C.a:
        $(C++C) $(C++FLAGS) -c $<
        $(AR) $(ARFLAGS) $@ $(<F:.C=.o)
        rm -f $(<F:.C=.o)
.C~.a:
        $(GET) $(GFLAGS) $<
        $(C++C) $(C++FLAGS) -c $*.C
        $(AR) $(ARFLAGS) $@ $*.o
        rm -f $*.[Co]
.C.o:
        $(C++C) $(C++FLAGS) -c $<
.C~.o:
        $(GET) $(GFLAGS) $<
        $(C++C) $(C++FLAGS) -c $*.C
        rm -f $*.C
.Y.C:
        $(YACC) $(YFLAGS) $<
        mv y.tab.c $@
.Y~.C:
        $(GET) $(GFLAGS) $<
        $(YACC) $(YFLAGS) $*.Y
        mv y.tab.c $*.C
        rm -f $*.Y
.Y.o:
        $(YACC) $(YFLAGS) $<
        $(C++C) $(C++FLAGS) -c y.tab.c
        rm -f y.tab.c
        mv y.tab.o $@
.Y~.o:
        $(GET) $(GFLAGS) $<
        $(YACC) $(YFLAGS) $*.Y
        $(C++C) $(C++FLAGS) -c y.tab.c
        rm -f y.tab.c $*.Y
        mv y.tab.o $*.o
.L.C:
        $(LEX) $(LFLAGS) $<
        mv lex.yy.c $@
.L~.C:
        $(GET) $(GFLAGS) $<
        $(LEX) $(LFLAGS) $*.L
        mv lex.yy.c $@
        rm -f $*.L
.L.o:
        $(LEX) $(LFLAGS) $<
        $(C++C) $(C++FLAGS) -c lex.yy.c
        rm -f lex.yy.c
        mv lex.yy.o $@
.L~.o:
        $(GET) $(GFLAGS) $<
        $(LEX) $(LFLAGS) $*.L
        $(C++C) $(C++FLAGS) -c lex.yy.c
        rm -f lex.yy.c $*.L
        mv lex.yy.o $@
Tracking versions with SCCS


The Source Code Control System, SCCS, is a set of programs that you can use to track evolving versions of
files, ordinary text files as well as source files. SCCS takes custody of a file and, when changes are made,
identifies and stores them in the file with the original source code and/or documentation. As other changes are
made, they too are identified and retained in the file.
Retrieval of the original or any set of changes is possible. Any version of the file as it develops can be
reconstructed for inspection or additional modification. History information can be stored with each version:
why the changes were made, who made them, and when they were made.
This topic covers the following:
basics: creating, retrieving, and updating an SCCS file
delta numbering: how versions of an SCCS file are named
SCCS command conventions: what rules apply to SCCS commands
SCCS commands: the 14 SCCS commands and their more useful arguments
SCCS files: protection, format, and auditing of SCCS files

Basic usage
Several terminal session fragments are presented in this section. Try them all. The best way to learn SCCS is
to use it.

Terminology
A delta is a set of changes made to a file under SCCS custody. To identify and keep track of a delta, it is
assigned an SID (SCCS IDentification) number. The SID for any original file turned over to SCCS is
composed of release number 1 and level number 1, stated as 1.1. The SID for the first set of changes made to
that file, that is, its first delta, is release 1 version 2, or 1.2. The next delta would be 1.3, the next 1.4, and so
on. (For more on delta numbering, see ``Delta numbering''.) At this point, it is enough to know that by default
SCCS assigns SIDs automatically.

Creating an SCCS file with admin


Suppose you have a file called lang that is simply a list of five programming language names:
C
PL/I
FORTRAN
COBOL
ALGOL

Custody of your lang file can be given to SCCS using the admin (for administer) command. The following
creates an SCCS file from the lang file:
$ admin -ilang s.lang

All SCCS files must have names that begin with s., hence s.lang. The -i keyletter, together with its value
lang, means admin is to create an SCCS file and initialize it with the contents of the file lang.
The admin command replies
No id keywords (cm7)

This is a warning message that may also be issued by other SCCS commands. Ignore it for now. Its
significance is described in ``get''. In the following examples, this warning message is not shown although it
may be issued.
Remove the lang file. It is no longer needed because it exists now under SCCS as s.lang.
$ rm lang

Retrieving a file with get


The command
$ get s.lang

retrieves the latest version of s.lang and prints


1.1
5 lines

This tells you that get retrieved version 1.1 of the file, which is made up of five lines of text.
The retrieved text is placed in a new file called lang. That is, if you list the contents of your directory, you will
see both lang and s.lang.
The get s.lang command creates lang, a file meant for viewing (read-only), not for making changes to. If you
want to make changes to it, the -e (edit) option must be used. This is done as follows:
$ get -e s.lang

get -e causes SCCS to create lang for both reading and writing (editing). It also places certain information
about lang in another new file, called p.lang, which is needed later by the delta command. Now if you list the
contents of your directory, you will see s.lang, lang, and p.lang.
get -e prints the same messages as get, except that the SID for the first delta you will create also is issued:
1.1
new delta 1.2
5 lines

Change lang by adding two more programming languages:


SNOBOL
ADA

Recording changes with delta


Next, use the delta command as follows:
$ delta s.lang

delta then prompts with


comments?

Your response should be an explanation of why the changes were made. For example,
added more languages

delta now reads the file p.lang and determines what changes you made to lang. It does this by doing its own
get to retrieve the original version and applying the diff(C) command to the original version and the edited
version. Next, delta stores the changes in s.lang and destroys the no longer needed p.lang and lang files.
When this process is complete, delta outputs
1.2
2 inserted
0 deleted
5 unchanged
The number 1.2 is the SID of the delta you just created, and the next three lines summarize what was done to
s.lang.

More on get
The command
$ get s.lang

retrieves the latest version of the file s.lang, now 1.2. SCCS does this by starting with the original version of
the file and applying the delta you made. If you use the get command now, any of the following will retrieve
version 1.2:
$ get s.lang
$ get -r1 s.lang
$ get -r1.2 s.lang

The numbers following -r are SIDs. When you omit the level number of the SID (as in get -r1 s.lang), the
default is the highest level number that exists within the specified release. Thus, the second command requests
the retrieval of the latest version in release 1, namely 1.2. The third command requests the retrieval of a
particular version, in this case also 1.2.
Whenever a major change is made to a file, you may want to signify it by changing the release number, the
first number of the SID. This, too, is done with the get command:
$ get -e -r2 s.lang

Because release 2 does not exist, get retrieves the latest version before release 2. get also interprets this as a
request to change the release number of the new delta to 2, thereby naming it 2.1 rather than 1.3. The output is
1.2
new delta 2.1
7 lines

which means version 1.2 has been retrieved, and 2.1 is the version the delta command will create. If the file is
now edited (for example, by deleting COBOL from the list of languages) and delta is executed,
$ delta s.lang
comments? deleted cobol from list of languages

you will see by delta's output that version 2.1 is indeed created:
2.1
0 inserted
1 deleted
6 unchanged

Deltas can now be created in release 2 (deltas 2.2, 2.3, etc.), or another new release can be created in a similar
manner. A delta can still be made to the ``old'' release 1. This is explained in ``Delta numbering''.

The help command


If the command
$ get lang

is now executed, the following message will be output:


ERROR [lang]: not an SCCS file (co1)

The code co1 can be used with help to print a fuller explanation of the message:
$ help co1

This gives the following explanation of why get lang produced an error message:
co1:
"not an SCCS file"
A file that you think is an SCCS file
does not begin with the characters "s.".

help is useful whenever there is doubt about the meaning of almost any SCCS message.

Delta numbering
Think of deltas as the nodes of a tree in which the root node is the original version of the file. The root node
is normally named 1.1 and deltas (nodes) are named 1.2, 1.3, etc. The components of these SIDs are called
release and level numbers, respectively. Thus, normal naming of new deltas proceeds by incrementing the
level number. This is done automatically by SCCS whenever a delta is made.

The user may change the release number to indicate a major change; the release number then applies
to all new deltas unless specifically changed again. Thus, the evolution of a particular file could be
represented by the following figure.

Evolution of an SCCS file


This is the normal sequential development of an SCCS file, with each delta dependent on the preceding deltas.
Such a structure is called the trunk of an SCCS tree.
There are situations that require branching an SCCS tree. That is, changes are planned to a given delta that
will not be dependent on all previous deltas. For example, consider a program in production use at version 1.3
and for which development work on release 2 is already in progress. Release 2 may already have a delta in
progress as shown in the previous figure. Assume that a production user reports a problem in version 1.3 that
cannot wait to be repaired in release 2. The changes necessary to repair the trouble will be applied as a delta to
version 1.3 (the version in production use). This creates a new version that will then be released to the user but
will not affect the changes being applied for release 2 (that is, deltas 1.4, 2.1, 2.2, etc.). This new delta is the
first node of a new branch of the tree.
Branch delta names always have four SID components: the same release number and level number as the
trunk delta, plus a branch number and sequence number. The format is as follows:
release.level.branch.sequence

The branch number of the first delta branching off any trunk delta is always 1, and its sequence number is also
1. For example, the full SID for a delta branching off trunk delta 1.3 will be 1.3.1.1. As other deltas on that
same branch are created, only the sequence number changes: 1.3.1.2, 1.3.1.3, etc. This is shown in ``Tree
structure with branch deltas''.

Tree structure with branch deltas


The branch number is incremented only when a delta is created that starts a new branch off an existing branch,
as shown in ``Extended branching concept''. As this secondary branch develops, the sequence numbers of its
deltas are incremented (1.3.2.1, 1.3.2.2, etc.), but the secondary branch number remains the same.

Extended branching concept


The concept of branching may be extended to any delta in the tree, and the numbering of the resulting deltas
proceeds as shown above. SCCS allows the generation of complex tree structures. Although this capability
has been provided for certain specialized uses, the SCCS tree should be kept as simple as possible.
Comprehension of its structure becomes difficult as the tree becomes complex.

SCCS command conventions


SCCS commands accept two types of arguments, keyletters and file names. Keyletters are options that begin
with a hyphen (-) followed by a lowercase letter and, in some cases, a value.
File and/or directory names specify the file(s) the command is to process. Naming a directory is equivalent to
naming all the SCCS files within the directory. Non-SCCS files and unreadable files in the named directories
are silently ignored.
In general, file name arguments may not begin with a hyphen. If a lone hyphen is specified, the command will
read the standard input (usually your terminal) for lines and take each line as the name of an SCCS file to be
processed. The standard input is read until end-of-file. This feature is often used in pipelines.
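For example (a sketch that assumes the current directory holds SCCS files matching s.*), a pipeline can supply the file names through the lone hyphen:
$ ls s.* | get -

Each line written by ls is taken by get as the name of an SCCS file to retrieve.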
Keyletters are processed before file names, so the placement of keyletters is arbitrary; they may be
interspersed with file names. File names, however, are processed left to right. Somewhat different conventions
apply to help, what, sccsdiff, and val, detailed under ``SCCS commands''.
Certain actions of various SCCS commands are controlled by flags appearing in SCCS files. Some of these
flags will be discussed, but for a complete description see admin(CP).
The distinction between real user (see passwd(C)) and effective user will be of concern in discussing various
actions of SCCS commands. For now, assume that the real and effective users are the same: the person
logged into the UNIX system.

x.files and z.files


All SCCS commands that modify an SCCS file do so by first writing and modifying a copy called x.file. This
is done to ensure that the SCCS file is not damaged if processing terminates abnormally. x.file is created in the
same directory as the SCCS file, given the same mode (see chmod(C)) and is owned by the effective user. It
exists only for the duration of the execution of the command that creates it. When processing is complete, the
contents of s.file are replaced by the contents of x.file, whereupon x.file is destroyed.
To prevent simultaneous updates to an SCCS file, the same modifying commands also create a lockfile
called z.file. z.file contains the process number of the command that creates it, and its existence prevents other
commands from processing the SCCS file. z.file is created with access permission mode 444 (read-only for
owner, group, and other) in the same directory as the SCCS file and is owned by the effective user. It exists
only for the duration of the execution of the command that creates it.
In general, you can ignore these files. They are useful only in the event of system crashes or similar situations.

Error messages
SCCS commands produce error messages on the diagnostic output in this format:
ERROR [file]: message text (code)

The code in parentheses can be used as an argument to the help command to obtain a further explanation of
the message. Detection of a fatal error during the processing of a file causes the SCCS command to stop
processing that file and proceed with the next file specified.

SCCS commands
This section describes the major features of the fourteen SCCS commands and their most common arguments.
Here is a quickreference overview of the commands:

get(1)
retrieves versions of SCCS files.
unget(1)
undoes the effect of a get -e prior to the file being deltaed.
delta(1)
applies deltas (changes) to SCCS files and creates new versions.
admin(1)
initializes SCCS files, manipulates their descriptive text, and controls delta creation rights.
prs(1)
prints portions of an SCCS file in userspecified format.
sact(1)
prints information about files that are currently out for editing.
help(1)
gives explanations of error messages.
rmdel(1)
removes a delta from an SCCS file; allows removal of deltas created by mistake.
cdc(1)
changes the commentary associated with a delta.
what(1)
searches any SCO OpenServer system file(s) for all occurrences of a special pattern and prints out
what follows it; useful in finding identifying information inserted by the get command.
sccsdiff(1)
shows differences between any two versions of an SCCS file.
comb(1)
combines consecutive deltas into one to reduce the size of an SCCS file.
val(1)
validates an SCCS file.

get
The get command creates a file that contains a specified version of an SCCS file. The version is retrieved by
beginning with the initial version and then applying deltas, in order, until the desired version is obtained. The
resulting file, called a gfile (for gotten), is created in the current directory and is owned by the real user. The
mode assigned to the gfile depends on how the get command is used.
The most common use of get is
$ get s.abc

which normally retrieves the latest version of s.abc from the SCCS file tree trunk and produces (for example)
on the standard output
1.3
67 lines
No id keywords (cm7)

meaning version 1.3 of s.abc was retrieved (assuming 1.3 is the latest trunk delta), it has 67 lines of text, and
no ID keywords were substituted in the file.
The gfile, namely, file abc, is given access permission mode 444 (read-only for owner, group, and other).
This particular way of using get is intended to produce gfiles only for inspection, compilation, or copying,
for example. It is not intended for editing (making deltas).
When several files are specified, the same information is output for each one. For example,
$ get s.abc s.xyz

produces
s.abc:
1.3
67 lines
No id keywords (cm7)

s.xyz:
1.7
85 lines
No id keywords (cm7)


ID keywords
In generating a gfile for compilation, it is useful to record the date and time of creation, the version
retrieved, the module's name, and so on in the gfile itself. This information appears in a load module when
one is eventually created. SCCS provides a convenient mechanism for doing this automatically. Identification
(ID) keywords appearing anywhere in the gfile are replaced by appropriate values according to the
definitions of those ID keywords. The format of an ID keyword is an uppercase letter enclosed by percent
signs (%). For example,
%I%

is the ID keyword replaced by the SID of the retrieved version of a file. Similarly, %H% and %M% are the
date and name of the gfile, respectively. Thus, executing get on an SCCS file that contains the PL/I
declaration
DCL ID CHAR(100) VAR INIT('%M% %I% %H%');

gives (for example) the following:


DCL ID CHAR(100) VAR INIT('MODNAME 2.3 07/18/85');

When no ID keywords are substituted by get, the following message is issued:


No id keywords (cm7)

This message is normally treated as a warning by get although the presence of the i flag in the SCCS file
causes it to be treated as an error. For a complete list of the keywords provided, see get(CP).
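A common way to exploit this is to embed the keywords in a C language string so that the substituted values survive into the compiled object (a sketch using only the keywords described above):
static char sccsid[] = "%M% %I% %H%";

After a get without -e, the string might read, for example, "abc 1.3 07/18/85".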
Retrieval of different versions
The version of an SCCS file that get retrieves by default is the most recently created delta of the highest
numbered trunk release. However, any other version can be retrieved with get -r by specifying the version's
SID. Thus,
$ get -r1.3 s.abc

retrieves version 1.3 of s.abc and produces (for example) on the standard output
1.3
64 lines

A branch delta may be retrieved similarly,


$ get -r1.5.2.3 s.abc

which produces (for example) on the standard output


1.5.2.3
234 lines

When a SID is specified and the particular version does not exist in the SCCS file, an error message results.
Omitting the level number, as in
$ get -r3 s.abc

causes retrieval of the trunk delta with the highest level number within the given release. Thus, the above
command might output
3.7
213 lines

If the given release does not exist, get retrieves the trunk delta with the highest level number within the
highestnumbered existing release that is lower than the given release. For example, assume release 9 does
not exist in file s.abc and release 7 is the highestnumbered release below 9. Executing
$ get -r9 s.abc

would produce
7.6
420 lines

which indicates that trunk delta 7.6 is the latest version of file s.abc below release 9. Similarly, omitting the
sequence number, as in
$ get -r4.3.2 s.abc

results in the retrieval of the branch delta with the highest sequence number on the given branch. This might
result in the following output:
4.3.2.8
89 lines

(If the given branch does not exist, an error message results.)
get -t will retrieve the latest (top) version of a particular release when no -r is used or when its value is
simply a release number. The latest version is the delta produced most recently, independent of its location on
the SCCS file tree. Thus, if the most recent delta in release 3 is 3.5,
$ get -r3 -t s.abc

would produce
3.5
59 lines

However, if branch delta 3.2.1.5 were the latest delta (created after delta 3.5), the same command might
produce
3.2.1.5
46 lines

Updating source
get -e indicates an intent to make a delta. First, get checks the following:

The user list to determine if the login name or group ID of the person executing get is present. The
login name or group ID must be present for the user to be allowed to make deltas. (See ``admin'' for a
discussion of making user lists.)
The release number (R) of the version being retrieved to determine if the release being accessed is a
protected release. That is, the release number must satisfy the relation
floor <= R <= ceiling
Floor and ceiling are flags in the SCCS file representing the start and end of the range of valid releases.
R is not locked against editing. The lock is a flag in the SCCS file.
Whether multiple concurrent edits are allowed for the SCCS file by the j flag in the SCCS file.
A failure of any of the first three conditions causes the processing of the corresponding SCCS file to
terminate.
If the above checks succeed, get -e causes the creation of a gfile in the current directory with mode 644
(readable by everyone, writable only by the owner) that is owned by the real user. If a writable gfile already
exists, get terminates with an error.
Any ID keywords appearing in the gfile are not replaced by get -e because the generated gfile is
subsequently used to create another delta.
In addition, get -e causes the creation (or updating) of the p.file that is used to pass information to the delta
command.
The following
$ get -e s.abc

produces (for example) on the standard output


1.3
new delta 1.4
67 lines

Undoing a get -e
There may be times when a file is retrieved accidentally for editing; there is really no editing that needs to be
done at this time. In such cases, the unget command can be used to cancel the delta reservation that was set
up.
Additional get options
If get -r and/or -t are used together with -e, the version retrieved for editing is the one specified with -r
and/or -t.
get -i and -x are used to specify a list of deltas to be included and excluded, respectively (see get(CP) for the
syntax of such a list). Including a delta means forcing its changes to be included in the retrieved version. This
is useful in applying the same changes to more than one version of the SCCS file. Excluding a delta means
forcing it not to be applied. This may be used to undo the effects of a previous delta in the version to be
created.
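As a hypothetical illustration (the delta numbers are invented), a version can be retrieved for editing with one delta forced in and another forced out:
$ get -e -i1.3 -x1.2 s.abc

The resulting gfile contains the changes of delta 1.3 but not those of delta 1.2.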
Whenever deltas are included or excluded, get checks for possible interference with other deltas. Two deltas
can interfere, for example, when each one changes the same line of the retrieved gfile. A warning shows the
range of lines within the retrieved gfile where the problem may exist. The user should examine the gfile to
determine what the problem is and take appropriate corrective steps (edit the file if necessary).
CAUTION: get -i and get -x should be used with extreme care.

get -k is used either to regenerate a gfile that may have been accidentally removed or ruined after get -e, or
simply to generate a gfile in which the replacement of ID keywords has been suppressed. A gfile generated
by get -k is identical to one produced by get -e, but no processing related to p.file takes place.
Concurrent edits of different SID
The ability to retrieve different versions of an SCCS file allows several deltas to be in progress at any given
time. This means that several get -e commands may be executed on the same file as long as no two
executions retrieve the same version (unless multiple concurrent edits are allowed).
The p.file created by get -e is created in the same directory as the SCCS file, given mode 644 (readable by
everyone, writable only by the owner), and owned by the effective user. It contains the following information
for each delta that is still in progress:
the SID of the retrieved version
the SID given to the new delta when it is created
the login name of the real user executing get
The first execution of get -e causes the creation of p.file for the corresponding SCCS file. Subsequent
executions only update p.file with a line containing the above information. Before updating, however, get
checks to assure that no entry already in p.file specifies the SID of the version to be retrieved as already
retrieved (unless multiple concurrent edits are allowed). If the check succeeds, the user is informed that other
deltas are in progress and processing continues. If the check fails, an error message results.
It should be noted that concurrent executions of get must be carried out from different directories. Subsequent
executions from the same directory will attempt to overwrite the gfile, which is an SCCS error condition. In
practice, this problem does not arise because each user normally has a different working directory. See
``Protection'' for a discussion of how different users are permitted to use SCCS commands on the same files.
``Determination of new SID'' shows the possible SID components a user can specify with get (leftmost
column), the version that will then be retrieved by get, and the resulting SID of the delta that delta will
create (rightmost column). In the table:
R, L, B, and S mean release, level, branch, and sequence numbers in the SID, and m means
maximum. Thus, for example, R.mL means the maximum level number within release R.
R.L.(mB+1).1 means the first sequence number on the new branch (maximum branch number plus 1)
of level L within release R. Note that if the SID specified is R.L, R.L.B, or R.L.B.S, each of these
specified SID numbers must exist.
The -b keyletter is effective only if the b flag (see admin(CP)) is present in the file. An entry of -
means irrelevant.
get

173

Tracking versions with SCCS


The first two entries in the leftmost column apply only if the d (default SID) flag is not present. If
the d flag is present in the file, the SID is interpreted as if specified on the command line. Thus, one
of the other cases in this figure applies.
R.1 (the third entry in the rightmost column) is used to force the creation of the first delta in a new
release.
hR (the seventh entry in the fourth column) is the highest existing release that is lower than the
specified, nonexistent release R.

Determination of new SID


SID specified   -b keyletter   Other                            SID retrieved   SID of delta to be
in get          used           conditions                       by get          created by delta

none            no             R defaults to mR                 mR.mL           mR.(mL+1)
none            yes            R defaults to mR                 mR.mL           mR.mL.(mB+1).1
R               no             R > mR                           mR.mL           R.1
R               no             R = mR                           mR.mL           mR.(mL+1)
R               yes            R > mR                           mR.mL           mR.mL.(mB+1).1
R               yes            R = mR                           mR.mL           mR.mL.(mB+1).1
R               -              R < mR and R does not exist      hR.mL           hR.mL.(mB+1).1
R               -              Trunk successor in release > R   R.mL            R.mL.(mB+1).1
                               and R exists
R.L             no             No trunk successor               R.L             R.(L+1)
R.L             yes            No trunk successor               R.L             R.L.(mB+1).1
R.L             -              Trunk successor in release R     R.L             R.L.(mB+1).1
R.L.B           no             No branch successor              R.L.B.mS        R.L.B.(mS+1)
R.L.B           yes            No branch successor              R.L.B.mS        R.L.(mB+1).1
R.L.B.S         no             No branch successor              R.L.B.S         R.L.B.(S+1)
R.L.B.S         yes            No branch successor              R.L.B.S         R.L.(mB+1).1
R.L.B.S         -              Branch successor                 R.L.B.S         R.L.(mB+1).1

Concurrent edits of same SID


Under normal conditions, more than one get -e for the same SID is not permitted. That is, delta must be
executed before a subsequent get -e is executed on the same SID.
Multiple concurrent edits are allowed if the j flag is set in the SCCS file. Thus:
$ get -e s.abc
1.1
new delta 1.2
5 lines

may be immediately followed by


$ get -e s.abc
1.1

new delta 1.1.1.1
5 lines

without an intervening delta. In this case, a delta after the first get will produce delta 1.2 (assuming 1.1 is the
most recent trunk delta), and a delta after the second get will produce delta 1.1.1.1.
Keyletters that affect output
get -p causes the retrieved text to be written to the standard output rather than to a gfile. In addition, all
output normally directed to the standard output (such as the SID of the version retrieved and the number of
lines retrieved) is directed instead to the standard error. get -p is used, for example, to create a gfile with an
arbitrary name, as in
$ get -p s.abc > arbitrary-file-name

get -s suppresses output normally directed to the standard output, such as the SID of the retrieved version and
the number of lines retrieved, but it does not affect messages normally directed to the standard error. get -s is
used to prevent nondiagnostic messages from appearing on the user's terminal and is often used with -p to
pipe the output, as in
$ get -p -s s.abc | pg

get -g prints the SID on standard output and there is no retrieval of the SCCS file. This is useful in several
ways. For example, to verify a particular SID in an SCCS file,
$ get -g -r4.3 s.abc

outputs the SID 4.3 if it exists in the SCCS file s.abc or an error message if it does not. Another use of get -g
is in regenerating a p.file that may have been accidentally destroyed, as in
$ get -e -g s.abc

get -l causes SCCS to create l.file in the current directory with mode 444 (read-only for owner, group, and
other) and owned by the real user. The l.file contains a table (whose format is described in get(CP)) showing
the deltas used in constructing a particular version of the SCCS file. For example,
$ get -r2.3 -l s.abc

generates an l.file showing the deltas applied to retrieve version 2.3 of s.abc. Specifying -p with -l, as in
$ get -lp -r2.3 s.abc

causes the output to be written to the standard output rather than to l.file. get -g can be used with -l to
suppress the retrieval of the text.
get -m identifies the changes applied to an SCCS file. Each line of the gfile is preceded by the SID of the
delta that caused the line to be inserted. The SID is separated from the text of the line by a tab character.
get -n causes each line of a gfile to be preceded by the value of the %M% ID keyword and a tab character.
This is most often used in a pipeline with grep(C). For example, to find all lines that match a given pattern in
the latest version of each SCCS file in a directory, the following may be executed:

$ get -p -n -s directory | grep pattern

If both -m and -n are specified, each line of the gfile is preceded by the value of the %M% ID keyword
and a tab (this is the effect of -n) and is followed by the line in the format produced by -m.
Because use of -m and/or -n causes the contents of the gfile to be modified, such a gfile must not be used
for creating a delta. Therefore, neither -m nor -n may be specified together with get -e. See the get(CP)
page.

delta
The delta command is used to incorporate changes made to a gfile into the corresponding SCCS file; that
is, to create a delta and, therefore, a new version of the file.
The delta command requires the existence of p.file (created by get -e). It examines p.file to verify the
presence of an entry containing the user's login name. If none is found, an error message results.
The delta command performs the same permission checks that get -e performs. If all checks are successful,
delta determines what has been changed in the gfile by comparing it with its own temporary copy of the
gfile as it was before editing. This temporary copy is called d.file and is obtained by performing an internal
get on the SID specified in the p.file entry.
The required p.file entry is the one containing the login name of the user executing delta, because the user
who retrieved the gfile must be the one who creates the delta. However, if the login name of the user appears
in more than one entry, the same user has executed get -e more than once on the same SCCS file. Then, delta
-r must be used to specify the SID that uniquely identifies the p.file entry. This entry is then the one used to
obtain the SID of the delta to be created.
In practice, the most common use of delta is
$ delta s.abc

which prompts
comments?

to which the user replies with a description of why the delta is being made, ending the reply with a newline
character. The user's response may be up to 512 characters long with newlines (not intended to terminate the
response) escaped by backslashes (\).
If the SCCS file has a v flag, delta first prompts with
MRs?

(Modification Requests) on the standard output. The standard input is then read for MR numbers, separated by
blanks and/or tabs, ended with a newline character. A Modification Request is a formal way of asking for a
correction or enhancement to the file. In some controlled environments where changes to source files are
tracked, deltas are permitted only when initiated by a trouble report, change request, trouble ticket, and so on,
collectively called MRs. Recording MR numbers within deltas is a way of enforcing the rules of the change
management process.

delta -y and/or -m can be used to enter comments and MR numbers on the command line rather than through
the standard input, as in
$ delta -y"descriptive comment" -m"mrnum1 mrnum2" s.abc

In this case, the prompts for comments and MRs are not printed, and the standard input is not read. These two
keyletters are useful when delta is executed from within a shell procedure. Note that delta -m is allowed only
if the SCCS file has a v flag.
No matter how comments and MR numbers are entered with delta, they are recorded as part of the entry for
the delta being created. Also, they apply to all SCCS files specified with the delta.
If delta is used with more than one file argument and the first file named has a v flag, all files named must
have this flag. Similarly, if the first file named does not have the flag, none of the files named may have it.
When delta processing is complete, the standard output displays the SID of the new delta (from p.file) and the
number of lines inserted, deleted, and left unchanged. For example:
1.4
14 inserted
7 deleted
345 unchanged
If line counts do not agree with the user's perception of the changes made to a gfile, it may be because there
are various ways to describe a set of changes, especially if lines are moved around in the gfile. However, the
total number of lines of the new delta (the number inserted plus the number left unchanged) should always
agree with the number of lines in the edited gfile.
If you are in the process of making a delta and the delta command finds no ID keywords in the edited gfile,
the message
No id keywords (cm7)

is issued after the prompts for commentary but before any other output. This means that any ID keywords that
may have existed in the SCCS file have been replaced by their values or deleted during the editing process.
This could be caused by making a delta from a gfile that was created by a get without -e (ID keywords are
replaced by get in such a case). It could also be caused by accidentally deleting or changing ID keywords
while editing the gfile. Or, it is possible that the file had no ID keywords. In any case, the delta will be
created unless there is an i flag in the SCCS file (meaning the error should be treated as fatal), in which case
the delta will not be created.
After the processing of an SCCS file is complete, the corresponding p.file entry is removed from p.file. All
updates to p.file are made to a temporary copy, q.file, whose use is similar to that of x.file described under
``SCCS command conventions''. If there is only one entry in p.file, then p.file itself is removed.
In addition, delta removes the edited gfile unless -n is specified. For example
$ delta -n s.abc

will keep the gfile after processing.

delta -s suppresses all output normally directed to the standard output, other than comments? and MRs?.
Thus, use of -s with -y (and/or -m) causes delta neither to read from the standard input nor to write to the
standard output.
The differences between the gfile and the d.file constitute the delta and may be printed on the standard
output by using delta -p. The format of this output is similar to that produced by diff.

admin
The admin command is used to administer SCCS files; that is, to create new SCCS files and change the
parameters of existing ones. When an SCCS file is created, its parameters are initialized by use of keyletters
with admin or are assigned default values if no keyletters are supplied. The same keyletters are used to
change the parameters of existing SCCS files.
Two keyletters are used in detecting and correcting corrupted SCCS files (see ``Auditing'').
Newly created SCCS files are given access permission mode 444 (read-only for owner, group, and other) and
are owned by the effective user. Only a user with write permission in the directory containing the SCCS file
may use the admin(CP) command on that file.
Creation of SCCS files
An SCCS file can be created by executing the command
$ admin -ifirst s.abc

in which the value first with -i is the name of a file from which the text of the initial delta of the SCCS file
s.abc is to be taken. Omission of a value with -i means admin is to read the standard input for the text of the
initial delta.
The command
$ admin -i s.abc < first

is equivalent to the previous example.


If the text of the initial delta does not contain ID keywords, the message
No id keywords (cm7)

is issued by admin as a warning. However, if the command also sets the i flag (not to be confused with the -i
keyletter), the message is treated as an error and the SCCS file is not created. Only one SCCS file may be
created at a time using admin -i.
admin -r is used to specify a release number for the first delta. Thus:
$ admin -ifirst -r3 s.abc

means the first delta should be named 3.1 rather than the normal 1.1. Because -r has meaning only when
creating the first delta, its use is permitted only with -i.

Inserting commentary for the initial delta
When an SCCS file is created, the user may want to record why this was done. Comments (admin -y) and/or
MR numbers (-m) can be entered in exactly the same way as with delta.
If -y is omitted, a comment line of the form
date and time created YY/MM/DD HH:MM:SS by logname

is automatically generated.
If it is desired to supply MR numbers (admin -m), the v flag must be set with -f. The v flag simply
determines whether MR numbers must be supplied when using any SCCS command that modifies a delta
commentary in the SCCS file (see sccsfile(4)). An example would be
$ admin -ifirst -mmrnum1 -fv s.abc

Note that -y and -m are effective only if a new SCCS file is being created.
Initialization and modification of SCCS file parameters
Part of an SCCS file is reserved for descriptive text, usually a summary of the file's contents and purpose. It
can be initialized or changed by using admin -t.
When an SCCS file is first being created and -t is used, it must be followed by the name of a file from which
the descriptive text is to be taken. For example, the command
$ admin -ifirst -tdesc s.abc

specifies that the descriptive text is to be taken from file desc.

When processing an existing SCCS file, -t specifies that the descriptive text (if any) currently in the file is to
be replaced with the text in the named file. Thus:
$ admin -tdesc s.abc

specifies that the descriptive text of the SCCS file is to be replaced by the contents of desc. Omission of the
file name after the -t keyletter, as in
$ admin -t s.abc

causes the removal of the descriptive text from the SCCS file.
The flags of an SCCS file may be initialized or changed by admin -f, or deleted by admin -d.
SCCS file flags are used to direct certain actions of the various commands. (See the admin(CP) page for a
description of all the flags.) For example, the i flag specifies that a warning message (stating that there are no
ID keywords contained in the SCCS file) should be treated as an error. The d (default SID) flag specifies the
default version of the SCCS file to be retrieved by the get command.
admin -f is used to set flags and, if desired, their values. For example

$ admin -ifirst -fi -fmmodname s.abc

sets the i and m (module name) flags. The value modname specified for the m flag is the value that the get
command will use to replace the %M% ID keyword. (In the absence of the m flag, the name of the g-file is
used as the replacement for the %M% ID keyword.) Several -f keyletters may be supplied on a single
admin, and they may be used whether the command is creating a new SCCS file or processing an existing
one.
admin -d is used to delete a flag from an existing SCCS file. As an example, the command
$ admin -dm s.abc

removes the m flag from the SCCS file. Several -d keyletters may be used with one admin and may be
intermixed with -f.
SCCS files contain a list of login names and/or group IDs of users who are allowed to create deltas. This list is
empty by default, allowing anyone to create deltas. To create a user list (or add to an existing one), admin -a
is used. For example,
$ admin -axyz -awql -a1234 s.abc

adds the login names xyz and wql and the group ID 1234 to the list. admin -a may be used whether creating a
new SCCS file or processing an existing one.
admin -e (erase) is used to remove login names or group IDs from the list.

prs
The prs command is used to print all or part of an SCCS file on the standard output. If prs -d is used, the
output will be in a format called data specification. Data specification is a string of SCCS file data keywords
(not to be confused with get ID keywords) interspersed with optional user text.
Data keywords are replaced by appropriate values according to their definitions. For example,
:I:

is defined as the data keyword replaced by the SID of a specified delta. Similarly, :F: is the data keyword for
the SCCS filename currently being processed, and :C: is the comment line associated with a specified delta.
All parts of an SCCS file have an associated data keyword. For a complete list, see the prs(CP) page.
There is no limit to the number of times a data keyword may appear in a data specification. Thus, for example,
$ prs d":I: this is the top delta for :F: :I:" s.abc

may produce on the standard output


2.1 this is the top delta for s.abc 2.1

Information may be obtained from a single delta by specifying its SID using prs -r. For example,
$ prs -d":F:: :I: comment line is: :C:" -r1.4 s.abc



may produce the following output:
s.abc: 1.4 comment line is: THIS IS A COMMENT

If -r is not specified, the value of the SID defaults to the most recently created delta.
In addition, information from a range of deltas may be obtained with -l or -e. The use of prs -e substitutes
data keywords for the SID designated with -r and all deltas created earlier, while prs -l substitutes data
keywords for the SID designated with -r and all deltas created later. Thus, the command
$ prs -d:I: -r1.4 -e s.abc

may output
1.4
1.3
1.2.1.1
1.2
1.1

and the command


$ prs -d:I: -r1.4 -l s.abc

may produce
3.3
3.2
3.1
2.2.1.1
2.2
2.1
1.4

Substitution of data keywords for all deltas of the SCCS file may be obtained by specifying both -e and -l.

sact
sact is a special form of the prs command that produces a report about files that are out for edit. The
command takes only one type of argument: a list of file or directory names. The report shows the SID of any
file in the list that is out for edit, the SID of the impending delta, the login of the user who executed the get -e
command, and the date and time the get -e was executed. It is a useful command for an administrator.
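For example, if user xyz had executed get -e on version 1.4 of s.abc, a report of the following general form
might be produced (the exact layout may vary):
$ sact s.abc
1.4 1.5 xyz 04/06/14 15:32:40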

help
The help command prints information about messages that may appear on the user's terminal. Arguments to
help are the code numbers that appear in parentheses at the end of SCCS messages. (If no argument is given,
help prompts for one.) Explanatory information is printed on the standard output. If no information is found,
an error message is printed. When more than one argument is used, each is processed independently, and an
error resulting from one will not stop the processing of the others. For more information, see the help(CP)
page.
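For example, the warning code shown earlier in this topic could be looked up directly (the explanatory text
printed depends on the release):
$ help cm7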


rmdel
The rmdel command allows removal of a delta from an SCCS file. Its use should be reserved for deltas in
which incorrect global changes were made. The delta to be removed must be a leaf delta. That is, it must be
the most recently created delta on its branch or on the trunk of the SCCS file tree. In ``Extended branching
concept'', only deltas 1.3.1.2, 1.3.2.2, and 2.2 can be removed. Only after they are removed can deltas 1.3.2.1
and 2.1 be removed.
To be allowed to remove a delta, the effective user must have write permission in the directory containing the
SCCS file. In addition, the real user must be either the one who created the delta being removed or the owner
of the SCCS file and its directory.
The -r keyletter is mandatory with rmdel. It is used to specify the complete SID of the delta to be removed.
Thus
$ rmdel -r2.3 s.abc

specifies the removal of trunk delta 2.3.


Before removing the delta, rmdel checks that the release number (R) of the given SID satisfies the relation
floor <= R <= ceiling

Floor and ceiling are flags in the SCCS file representing the start and end of the range of valid releases.
The rmdel command also checks the SID to make sure it is not for a version on which a get for editing has
been executed and whose associated delta has not yet been made. In addition, the login name or group ID of
the user must appear in the file's user list (or the user list must be empty). Also, the release specified cannot be
locked against editing. That is, if the l flag is set (see admin(CP)), the release must not be contained in the
list. If these conditions are not satisfied, processing is terminated, and the delta is not removed.
Once a specified delta has been removed, its type indicator in the delta table of the SCCS file is changed from
D (delta) to R (removed).

cdc
The cdc command is used to change the commentary made when the delta was created. It is similar to the
rmdel command (for example, -r and the full SID are necessary), although the delta need not be a leaf delta. For
example,
$ cdc -r3.4 s.abc

specifies that the commentary of delta 3.4 is to be changed. New commentary is then prompted for as with
delta.
The old commentary is kept, but it is preceded by a comment line indicating that it has been superseded, and
the new commentary is entered ahead of the comment line. The inserted comment line records the login name
of the user executing cdc and the time of its execution.



The cdc command also allows for the insertion of new and deletion of old MR numbers with the ! symbol.
Thus
$ cdc -r1.4 s.abc
MRs? mrnum3 !mrnum1
comments? deleted wrong MR no. and inserted correct MR no.

inserts mrnum3 and deletes mrnum1 for delta 1.4. (The MRs? prompt appears only if the v flag has been
set.)

what
The what command is used to find identifying information in any UNIX system file whose name is given as
an argument. No keyletters are accepted. The what command searches the given file(s) for all occurrences of
the string @(#), which is the replacement for the %Z% ID keyword (see the get(CP) page). It prints on the
standard output whatever follows the string until the first double quote (``"''), greater than symbol (>),
backslash (\), newline, null, or nonprinting character.
For example, if an SCCS file called s.prog.c (a C language source file) contains the following line
char id[]= "%W%";

and the command


$ get -r3.4 s.prog.c

is used, the resulting g-file is compiled to produce prog.o and a.out. Then, the command
$ what prog.c prog.o a.out

produces
prog.c:
        prog.c  3.4
prog.o:
        prog.c  3.4
a.out:
        prog.c  3.4

The string searched for by what need not be inserted with an ID keyword of get; it may be inserted in any
convenient manner.
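For example, the string could simply be written into an ordinary file with echo and then located with what;
the filename and text are illustrative, and the output is shown in the general format what produces:
$ echo '@(#)release 5.2 of the xyz module' > version.txt
$ what version.txt
version.txt:
        release 5.2 of the xyz module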

sccsdiff
The sccsdiff command determines (and prints on the standard output) the differences between any two
versions of an SCCS file. The versions to be compared are specified with sccsdiff -r in the same way as with
get -r. SID numbers must be specified as the first two arguments. The SCCS file or files to be processed are
named last. Directory names and a lone hyphen are not acceptable to sccsdiff.
The following is an example of the format of sccsdiff:
$ sccsdiff -r3.4 -r5.6 s.abc



The differences are printed the same way as by diff.

comb
The comb command lets the user reduce the size of an SCCS file. It generates a shell procedure on the
standard output, which reconstructs the file by discarding unwanted deltas and combining other specified
deltas. (It is not recommended that comb be used as a matter of routine.)
In the absence of any keyletters, comb preserves only leaf deltas and the minimum number of ancestor deltas
necessary to preserve the shape of an SCCS tree. The effect of this is to eliminate middle deltas on the trunk
and on all branches of the tree. Thus, in ``Extended branching concept'', deltas 1.2, 1.3.2.1, 1.4, and 2.1 would
be eliminated.
Some of the keyletters used with this command are:

comb -s
This option generates a shell procedure that produces a report of the percentage space (if any) the user
will save. This is often useful as a preliminary check.
comb -p
This option is used to specify the oldest delta the user wants preserved.
comb -c
This option is used to specify a list (see the get(CP) page for its syntax) of deltas the user wants
preserved. All other deltas will be discarded.
The shell procedure generated by comb is not guaranteed to save space. A reconstructed file may even be
larger than the original. Note, too, that the shape of an SCCS file tree may be altered by the reconstruction
process.
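For example, the space report could be generated and then executed with sh as a preliminary check; the
script name used here is illustrative:
$ comb -s s.abc > space.rpt
$ sh space.rpt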

val
The val command is used to determine whether a file is an SCCS file meeting the characteristics specified by
certain keyletters. It checks for the existence of a particular delta when the SID for that delta is specified with
-r.
The string following -y or -m is used to check the value set by the t or m flag, respectively. See admin(CP)
for descriptions of these flags.
The val command treats the special argument hyphen differently from other SCCS commands. It allows val to
read the argument list from the standard input instead of from the command line, and the standard input is
read until an end-of-file (<Ctrl>d) is entered. This permits one val command with different values for
keyletters and file arguments. For example,
$ val -
-yc -mabc s.abc
-mxyz -ypl1 s.xyz
<Ctrl>d



first checks if file s.abc has a value c for its type flag and value abc for the module name flag. Once this is
done, val processes the remaining file, in this case s.xyz.
The val command returns an 8-bit code. Each bit set shows a specific error (see val(CP) for a description of
errors and codes). In addition, an appropriate diagnostic is printed unless suppressed by -s. A return code of 0
means all files met the characteristics specified.
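For example, the return code can be examined from the shell; a return of 0 here assumes s.abc is a valid
SCCS file meeting the specified criteria:
$ val s.abc
$ echo $?
0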

SCCS files
This section covers protection mechanisms used by SCCS, the format of SCCS files, and the recommended
procedures for auditing SCCS files.

Protection
SCCS relies on the capabilities of the UNIX system for most of the protection mechanisms required to
prevent unauthorized changes to SCCS files, that is, changes by non-SCCS commands. Protection features
provided directly by SCCS are the release lock flag, the release floor and ceiling flags, and the user list.
Files created by the admin command are given access permission mode 444 (read-only for owner, group, and
other). This mode should remain unchanged because it (generally) prevents modification of SCCS files by
non-SCCS commands. Directories containing SCCS files should be given mode 755, which allows only the
owner of the directory to modify it.
SCCS files should be kept in directories that contain only SCCS files and any temporary files created by
SCCS commands. This simplifies their protection and auditing. The contents of directories should be logical
groupings (subsystems of the same large project, for example).
SCCS files should have only one link (name) because commands that modify them do so by creating and
modifying a copy of the file. When processing is done, the contents of the old file are automatically replaced
by the contents of the copy, whereupon the copy is destroyed. If the old file had additional links, this would
break them. Rather than process such files, SCCS commands produce an error message.
When only one person uses SCCS, the real and effective user IDs are the same; and the user ID owns the
directories containing SCCS files. Therefore, SCCS may be used directly without any preliminary
preparation.
When several users with unique user IDs are assigned SCCS responsibilities (on large development projects,
for example), one user (that is, one user ID) must be chosen as the owner of the SCCS files. This person
will administer the files (use the admin command) and will be SCCS administrator for the project. Because
other users do not have the same privileges and permissions as the SCCS administrator, they are not able to
execute directly those commands that require write permission in the directory containing the SCCS files.
Therefore, a project-dependent program is required to provide an interface to the get, delta, and, if desired,
rmdel and cdc commands.
The interface program must be owned by the SCCS administrator and must have the
set-user-ID-on-execution bit on (see chmod(C)). This assures that the effective user ID is the user ID of the
SCCS administrator. With the privileges of the interface program during command execution, the owner of an
SCCS file can modify it at will. Other users whose login names or group IDs are in the user list for that file
(but are not the owner) are given the necessary permissions only for the duration of the execution of the
interface program. Thus, they may modify SCCS files only with delta and, possibly, rmdel and cdc.

Formatting
SCCS files are composed of lines of ASCII text arranged in six parts as follows:

Checksum
a line containing the logical sum of all the characters of the file (not including the checksum line
itself)
Delta table
information about each delta, such as type, SID, date and time of creation, and commentary
User names
list of login names and/or group IDs of users who are allowed to modify the file by adding or
removing deltas
Flags
indicators that control certain actions of SCCS commands
Descriptive Text
usually a summary of the contents and purpose of the file
Body
the text administered by SCCS, intermixed with internal SCCS control lines
Details on these file sections may be found in sccsfile(4). The checksum line is discussed in ``Auditing''.
Because SCCS files are ASCII files, they can be processed by non-SCCS commands like ed, grep, and cat.
This is convenient when an SCCS file must be modified manually (a delta's time and date were recorded
incorrectly, for example, because the system clock was set incorrectly), or when a user wants simply to look at
the file.
CAUTION:
Extreme care should be exercised when modifying SCCS files with non-SCCS commands.

Auditing
When a system or hardware malfunction destroys an SCCS file, any command will issue an error message.
Commands also use the checksum stored in an SCCS file to determine whether the file has been corrupted
since it was last accessed (possibly by having lost one or more blocks or by having been modified with ed).
No SCCS command will process a corrupted SCCS file except admin -h or admin -z, as described below.
SCCS files should be audited for possible corruptions on a regular basis. The simplest and fastest way to do
an audit is to use admin -h and specify all SCCS files:
admin -h s.file1 s.file2 ...

or


admin -h directory1 directory2 ...

If the new checksum of any file is not equal to the checksum in the first line of that file, the message
corrupted file (co6)

is produced for that file. The process continues until all specified files have been examined. When examining
directories (as in the second example above), the checksum process will not detect missing files. A simple
way to learn whether files are missing from a directory is to execute the ls command periodically, and
compare the outputs. Any file whose name appeared in a previous output but not in the current one no longer
exists.
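For example, a daily listing could be compared against the previous day's; the directory and snapshot
filenames here are illustrative:
$ ls /usr/src/sccs > /tmp/sccs.today
$ diff /tmp/sccs.yesterday /tmp/sccs.today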
When a file has been corrupted, the way to restore it depends on the extent of the corruption. If damage is
extensive, the best solution is to contact the local UNIX system operations group and request that the file be
restored from a backup copy. If the damage is minor, repair through editing may be possible. After such a
repair, the admin command must be executed:
$ admin -z s.file

The purpose of this is to recompute the checksum and bring it into agreement with the contents of the file.
After this command is executed, any corruption that existed in the file will no longer be detectable.

Packaging your software applications


NOTE: This topic describes the pkgadd family of package installation software that is the default package
format on SCO UnixWare. While this format is also supported on SCO OpenServer, the primary installation
format for SCO OpenServer is the Custom Distribution Mastering Toolkit (CDMT) format. For more
information on CDMT, see the custom(ADM) and Intro(CDMT) manual pages.

This topic describes how to package software that will be installed on computers running SCO OpenServer or
UnixWare. A packaging tool, the pkgmk(C) command, is provided to help automate package creation. It
gathers the components of a package on the development machine, copies them onto the installation medium,
and places them into a structure that the installation tool, pkgadd(ADM), recognizes.
This topic also describes the pkgadd(ADM) command, which copies the package from the installation
medium onto a system and performs system housekeeping routines that concern the package. This tool is
primarily for the installer but is described here to provide you with a background on the environment into
which your packages will be placed and to help you test-install packages.
The first two sections describe what a package consists of and give an overview of the structural life cycle of
a package (how its structure on your development machine relates to its structure on the installation medium
and on the installation machine).
The remaining sections familiarize you with the tools, files, and scripts involved in creating a package,
provide suggestions for how to approach software packaging, and describe some specific procedures.
The section on set packaging describes how you can collect an arbitrary number of packages into a single
installable image (the ``set'') for installation on the target machine.
After reading this topic, you should study ``Case studies of package installation'', which provides case studies
using the tools and techniques described in this topic.

Contents of a package
A software package is a group of components that together create the software. These components naturally
include the executables that comprise the software, but they also include at least two information files and can
optionally include other information files and scripts.
As shown in ``The contents of a package'' (below), a package's contents fall into three categories:
required components (the pkginfo(F) file, the prototype(F) file, package objects)
optional package information files
optional packaging scripts


[Figure: The contents of a package]

Required components
A package must contain at least the following components:

Package objects
These are the objects that make up the software. They can be files (executable or data), directories, or
named pipes. Objects can be manipulated in groups during installation by placing them into classes.
You will learn more about classes in ``3. Placing objects into classes''.
The pkginfo(F) file
This required package information file defines parameter values that describe a package. For example,
this file defines values for the package abbreviation, the full package name, and the package
architecture.
The prototype(F) file
This required package information file lists the contents of the package. There is one entry for each
deliverable object, consisting of several fields of information that describe the object. All package
components, including the pkginfo(F) file, must be listed in the prototype(F) file.
Both required package information files are described further in ``The package information files'' and on their
respective manual pages.

Optional package information files


The four optional package information files that you can add to your package are:

compver(F)
defines previous versions of the package that are compatible with this version
depend(F)
defines any software dependencies associated with this package
space(F)
defines disk space requirements for the target environment beyond that used by objects defined in the
prototype(F) file (for example, files that will be dynamically created at installation time)
copyright(F)
defines the text for a copyright message that will be printed on the terminal at the time of package
installation or removal
Every package information file used must have an entry in the prototype file. All of these files are described
further in ``The package information files'' and on their respective manual pages.

Optional installation scripts


Your package can use three types of installation scripts, although no scripts are required. Many of the tasks
executed in a pre-UNIX System V Release 4 installation script are now accomplished automatically by
pkgadd(ADM). However, you can use scripts with an SCO OpenServer package to perform customized
actions. An installation script must be executable by sh (for example, a shell script or executable program).
The three script types are the request script (solicits installer input), class action script (defines a set of
actions to perform on a group of objects), and the procedure script (defines actions that will occur at particular
points during installation).
Packaging scripts are described in detail in ``The installation scripts''. Example scripts can be found in the
case studies.

Quick steps to packaging


This section shows you how to create a minimal package image for installation on an SCO OpenServer
system.
The procedure takes some short cuts and omits optional package components in a tradeoff between ease of
prototyping and functionality. It also provides pointers to further information where such short cuts and
omissions are made.
The outcome of the procedure is a package image that can be added to an SCO OpenServer system using the
pkgadd(ADM) or pkginstall(ADM) commands, or the scoadmin(ADM) application installer.
1. Create a package staging area on a development machine.
This is where you will collect all the files you want to install on the target system, the package control
scripts required by pkgadd(ADM), and the information files used as input to the pkgmk(C)
command. It can be located anywhere you have permission to create a directory, including your home
directory.
Let's assume for this procedure that we have the following files that we want to package for
installation on a target system under the /usr/local directory:
two user-level binaries, cmd1 and cmd2
manual pages for each of the binaries
For this procedure, assume we'll use the directory /home/user1/pkgarea as our packaging area, and
that it looks like this:
/home/user1/pkgarea/
pkginfo
prototype
other information files
control scripts
/src
files to be packaged

This is only one possible, and very simple, arrangement. Files can, in fact, be gathered from anywhere
on your development system, but in many cases using a packaging area such as the one suggested
above simplifies the process of creating a package image and maintaining a backup of what was
packaged for future reference.
2. Copy your source files to /home/user1/pkgarea/src, recreating the target directory structure;
make sure the permissions, owners, and groups are as you want them to be on the target system.
For the purpose of this procedure, we'll assume this structure under /home/user1/pkgarea/src:
/home/user1/pkgarea/src
/bin
cmd1
cmd2
/man
/man1
cmd1.1
cmd2.1

This directory structure mimics the directory structure that we want under the target install directory
(/usr/local).
This step is not required, and might be impractical for a large number of files or even a small set of
very large files. However, if all your files will be installed under a common target directory, copying
them to the staging area can simplify the creation of the prototype file later in the procedure.
It is possible to gather source files during the build process from multiple locations on the system. To
do this, you need to provide a custom prototype file and use appropriate arguments to the pkgproto
and pkgmk commands, as we will discuss later in this procedure.
3. Create package control scripts and information files in (or copy them to) the packaging area.
The files required to create a package are the prototype(F) and pkginfo(F) information files.
We'll create the prototype file in the next step.
The pkginfo file tells the system about your package, and must contain values for the five required
installation parameters, as in this example:
PKG=testpkg
NAME='Test Package'
ARCH='i386'
VERSION='Release 1.1'
CATEGORY='application'

The most important of these is PKG, which must be a unique identifier for your package (the PKG
variable's value is displayed by the pkginfo(C) command as the package name). It is recommended
that you define a PKG value between 3 and 9 characters, using alphanumeric characters (a-z, A-Z,
0-9) and the special characters dot (.), hyphen (-), and underscore (_) only. See pkginfo(F) for a
description of these and other parameters that you can define in the pkginfo file; application-specific
parameters may also be included.
No other files are required by pkgmk(C), but the particular needs of your installation may require
them. [See ``Script processing'' for the sequence of script processing during software installation and
removal.]
For example, if your application requires an interactive install, you'll probably need a request script to
interact with the user, and a preinstall script to do any work necessary before the installation of the
application's files. If any configuration needs to be done once your source files have been copied to
the target system, then you'll need to provide a postinstall script that performs the configuration.
The best way to determine which files you need is to:
Read the descriptions of the various information files later in this chapter under ``The
installation scripts''.
Look in the /var/sadm/pkg directory, which contains one directory for each installed package
on your system. The directory names correspond to the PKG variable of the package, as
reported by pkginfo(C). Each of these directories contains copies of the package information
files and the control scripts used during package installation (the control scripts are located in
a subdirectory named install). Use the pkginfo command to locate a similar application on
your system, and use that package's files as examples.
4. In the packaging area, execute the pkgproto(C) command to create an initial prototype file.
If we assume:
we've populated the src directory in our packaging area with all the directories and files we
need to install on the target
all the files will be installed under /usr/local
We can use a rather simple pkgproto command:
pkgproto src=/usr/local > prototype

If you need to package files from outside the packaging area (rather than having copied them into the
src directory), or if you want to install files to various places on the target machine, you'll need a more
complex pkgproto command.
For example, let's assume there's a very large binary that we want to put in the package and it's
located on the development system at /build/build39/cmd2bld/cmd2. To include this file in the
prototype file without copying it to the packaging area, we could use a pkgproto command like this:
pkgproto /build/build39/cmd2bld=/usr/local/bin src=/usr/local > prototype


Note that you can create the prototype file with any editor; you do not have to use pkgproto. See
prototype(F).
5. Edit the prototype file and add a line for each of the package information and control scripts
that your package needs.
For example, if our installation required postinstall and postremove scripts, and we placed these files
at the top of our packaging area (where we'll execute the pkgmk command), then the following
entries in the prototype file would be required to include them in the package:
i pkginfo=pkginfo
i postinstall=postinstall
i postremove=postremove

For testpkg, we're using only the required pkginfo file.


6. Check the permissions on each line of the prototype file and change them, if necessary, to reflect
the permissions, owner, and group that you want on the files once they are installed on the
target system.
At this point, you may also want to add installation classes (to selectively install groups of files based
on system conditions or user input from a request script) or make certain files relocatable on
installation. These subjects are covered in detail in this chapter, in these sections:
``3. Placing objects into classes''
``The class action script''
``The special system classes''
``1. Selective installation''
``4. Making package objects relocatable''
7. At the top level of the packaging area, execute pkgmk:
cd /home/user1/pkgarea
pkgmk [-o] -d path testpkg

This places a file system format package image in the directory named by path. Use the -o option to
overwrite a previously created package image, if necessary.
8. Convert the package to datastream format, if desired.
If you plan to make the package image available for download or transfer to another system, you may
want to use the pkgtrans command to convert the file system image you just created to a datastream
image. A datastream image is an identical, single-file, ASCII-format version of the file system
image, and so is typically easier to transfer than a file system image (which might contain many files
and directories).
The example command below places a datastream image of the package in /var/spool/pkg/testpkg.ds:
pkgtrans -s path /var/spool/pkg/testpkg.ds

If you wanted to perform the reverse operation, creating a file system image from a datastream image,
use a command like this example:
pkgtrans /var/spool/pkg/testpkg.ds existing_dir testpkg

The above command places a file system format image of the package under existing_dir/testpkg.
9. Install the package on a target system for installation testing.
To install the ``file system image'' we created above, enter:
pkgadd -d path testpkg

To install the ``datastream image'' we created above, enter:


pkgadd -d /var/spool/pkg/testpkg.ds testpkg

[Note that when using datastreams, the name of the file containing the datastream image (testpkg.ds)
is considered part of the device name. This is necessary because a datastream image can contain any
number of packages or sets. A file system format image can contain only one package or set.]
10. Verify that the package was successfully installed. Use the pkginfo command on the target system:
pkginfo -l testpkg

This should return the contents of the package's pkginfo file, as well as the date and time the
package was installed on the target, and other information.
11. Check the list of files and their attributes as actually installed on the target system.
To list the pathnames of all files installed by a package, one per line, use the pkgchk(ADM)
command:
pkgchk -v testpkg

To get a detailed listing of installed files and directories, including their contents and attributes, use:
pkgchk -l testpkg | pg

The output can reveal problems with the package information files and control scripts, such as
incorrectly specified entries in the prototype file.
12. Remove the package to test package removal. Use the pkgrm(ADM) command:
pkgrm testpkg

The pkginfo command should now return no information for the package:
# pkginfo testpkg
UX:pkginfo: ERROR: information for "testpkg" was not found

See ``Quick steps to network installation'' for how to set up a network install server to offer your package for
installation on remote SCO OpenServer systems.

Quick steps to network installation


Below are the basic steps to follow to set up a network installation server from the command line.



A package image can also be made available for network installation through the Install Server graphical
installation tool.
1. Put something in the network installation distribution area (/var/spool/dist) to offer for network
install.
For example, you can copy a package image from a development area on your local hard disk, a
network drive, or from a mounted CD. The following example mounts a CD and copies a file system
format package image from it to /var/spool/dist.
mount -F cdfs -r /dev/cdrom/cdrom1 /mnt
cp -r /mnt/pkgname /var/spool/dist

If the image on CD is a datastream image, you can omit the -r option to the cp command, and specify
the name of the file containing the datastream as pkgname.
If you need to change the format of the installable image, you can copy it and change its format in one
command using the pkgtrans(C) command. To go from datastream to file system format:
pkgtrans /mnt/datastream /var/spool/dist pkgname

To go from file system to datastream format:


pkgtrans -s /mnt /var/spool/dist/pkgname.ds pkgname

NOTE: You will most often want to use file system format for the packages you place under
/var/spool/dist, so that remote users can find out what packages are available for installation from the
server without having to know any file names in advance. Users need to know the name of a
datastream file in order to install from it. If you place a datastream under /var/spool/dist, its name will
not be listed automatically by remote installers using either the pkglist or pkginstall commands, or by
the Application Installer interface. See ``Network installation from the command line'' and
``Network installation from the graphical interface'' for more explanation.
2. Enable the installation server, specifying the appropriate network protocol (tcp or spx).
For example, the following command enables the network installation of all the packages under
/var/spool/dist using the tcp protocol:
installsrv -e -n tcp

Note that the appropriate protocol must be enabled on your system, and the system must be connected to a
network of the appropriate type before it can process non-local installation requests.
3. To ensure your server is configured properly, use the pkglist(ADM) command to list the file system
format packages it is offering for network install from /var/spool/dist:
pkglist -s 0.0.0.0: all

For datastream format images, use a command like the following, substituting the name of a
datastream format file under /var/spool/dist for datastream:
pkglist -s 0.0.0.0:datastream all


See ``Network installation from the command line'' and ``Network installation from the graphical interface''
for how to perform a network install from a target machine on the network.

Network installation from the command line


The pkginstall(ADM) command is used from the command line on a target system to install software from an
SCO OpenServer network install server (the other installation command, pkgadd(ADM), is used to install
software from local media such as a CD or disk drive).
The simplest way for users to use the pkginstall command is to:
1. List the available packages on the server.
For example:
pkglist -s server: all

Note the required trailing colon (:) on the server name.


For example, if you set up an install server on your local machine following the instructions under
``Quick steps to network installation'', you can list all the packages available for installation under
/var/spool/dist that are in file system format using this command:
pkglist -s 0.0.0.0: all

If the package is in a datastream format file on the server, you'll need to know the name of the file to
list the packages contained in it (more than one package can be in a single datastream file). For
example, to list all the packages on a server in a datastream format file named package.ds, enter:
pkglist -s server:/var/spool/dist/package.ds all

2. Enter the appropriate pkginstall command. For example, to install all the packages from the server
that are in file system format under /var/spool/dist, enter:
pkginstall -s server all

To install a single file system format package, enter:


pkginstall -s server pkgname

To install all the packages in a single datastream format file, enter:


pkginstall -s server:/var/spool/dist/filename all

To install one package from a single datastream format file, enter:


pkginstall -s server:/var/spool/dist/filename pkgname

Network installation from the graphical interface


The methods used to install software from a network install server using the graphical Application Installer
are described in Installing and removing software.

The important point when setting up an install server regarding the graphical interface is the same as for the
command line: file system format packages can be listed automatically by remote users querying the install
server, while datastream format packages require that the user know the datastream file's name.

The structural life cycle of a package


The material covered in this topic talks about package object pathnames. While reading, keep in mind that a
package object resides in three places while being packaged and installed. To help you avoid confusion,
consider which of the three possible locations is being discussed:

On a development machine
Packages originate on a development machine. They can be in the same directory structure on your
machine as they will be placed on the installation machine. pkgmk(C) can also locate components on
the development machine and give them different pathnames on the installation machine.
On the installation media
When pkgmk copies the package components from the development machine to the installation
medium, it places them into the structure you defined in your prototype(F) file and a format that
pkgadd(ADM) recognizes.
On the installation machine
pkgadd copies a package from the installation medium and places it in the structure defined in your
prototype file. Package objects can be defined as relocatable, meaning the installer can define the
actual location of these package objects on the installation machine during installation. Objects with
fixed locations are copied to their predefined path.

The package creation tools


The packaging tools are provided to automate package creation and to remove the burden of packaging from
the developer. The three packaging tools are:
pkgmk(C) creates a package image in directory structure format from the components of a package
on the development machine.
pkgtrans(C) translates an installable package from one package format to another. The two format
types are directory structure and datastream. For example, after having used pkgmk to create a
package in directory structure format, you might use pkgtrans to translate it into datastream format.
pkgproto(C) generates a prototype(F) file based on the directory structure of your development area.

pkgmk
The pkgmk(C) command takes all of the package objects residing on the development machine, optionally
compresses them, copies them onto the installation medium, and places them into a fixed directory structure.
You are not required to know the details of the fixed directory structure because pkgmk takes care of the
formatting.
Files can be unstructured on the development machine and pkgmk will structure them correctly on the
medium based on information supplied in the prototype(F) file. The installation medium onto which a
package is formatted can be removable (a disk, for example) or it can be a directory on a machine.

pkgmk requires the presence of two information files on the development machine, the prototype(F) and the
pkginfo(F) files (other package information files might be present). The pkginfo file defines the values for a
number of package parameters, such as the package abbreviation and the package name. The prototype file
provides a complete list of the package contents. pkgmk creates the pkgmap(F) file, the package contents file
on the installation medium, by processing the prototype file and then adding three fields to each entry.
pkgmk follows these steps when processing a package:
1. Processes all of the command lines in the input prototype file. prototype command lines can tell
pkgmk where to look for package objects, merge other prototype files into this one, define default
mode owner group for package objects, and place parameter values in the packaging environment.
2. Copies the objects of a package onto the installation medium, using the prototype file as a listing of
contents. If desired, the objects placed on the installation medium can be compressed.
3. Puts the package objects into the proper format.
4. Divides a package into pieces and distributes those pieces on multiple volumes, if necessary.
5. Creates the pkgmap file (the content listing file that is placed on the installation medium). It
resembles the prototype file except that all command lines are processed, and the volno, size, cksum,
and modtime fields are added to each entry.

pkgtrans
The pkgtrans(C) command translates a package already created with pkgmk(C) from one package format to
another. It can make the following translations:
a fixed directory structure to a datastream
a datastream to a fixed directory structure
a fixed directory structure to a fixed directory structure
A package in a fixed directory structure can be in a directory on disk (for example, in a spooling directory) or
on a removable device such as a diskette. A datastream can be on any device; for example, on a disk or a tape.

pkgproto
The pkgproto(C) command generates a prototype(F) file. It scans the paths specified on the command line
and creates description line entries for these paths. If the pathname is a directory, an entry for each object in
the directory is generated. You can use the -c option of the pkgproto command to place objects into a
particular class.
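For example, a set of manual pages could be assigned to a class of their own as their entries are generated
and appended to an existing prototype file; the class name and paths here are illustrative:
pkgproto -c manpages src/man=/usr/local/man >> prototype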
When you create a prototype(F) file with an editor, it does not matter how package components are organized
on your development machine. You use the path1=path2 pathname format to define where the files reside on
your development machine and where they should be placed on the installation machine. However, when you
use pkgproto to create your file, your development area must be structured exactly as you want your package
to be structured.

The installation tools


The installation tools provide capabilities to install and remove packages, create responses to prompts during
installation of packages, check the accuracy of installed packages, and display information about software
packages. These tools are introduced to you here so that you can understand the environment into which your
package will be placed. The installation tools are:
pkgadd(ADM) installs a package.
pkgrm(ADM) removes a package.
pkgask(ADM) stores answers to an interactive package (one with a request script) in a response file.
Later, when installing the package, this file may be specified on the pkgadd(ADM) command line so
that the package can be installed in non-interactive mode (see the example following this list).
pkgchk(ADM) checks the content and attribute information for an installed package to ensure that it
was not corrupted during installation.
pkginfo(C) and pkgparam(C) display information about packages.
The system administrator can set parameters that control various aspects of installation in the administration
file called the admin(F) file. Refer to the manual pages for more information on these commands and on the
admin file.
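As an illustration, an interactive package's prompts might be answered once with pkgask and replayed later
for an unattended install; the response filename and spool directory here are assumptions, not fixed paths:
pkgask -r /tmp/testpkg.resp -d /var/spool/pkg testpkg
pkgadd -n -r /tmp/testpkg.resp -d /var/spool/pkg testpkg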

The package information files


These six package information files can be created with any editor, using the file formats described below.
The six package information files are:
pkginfo(F)
prototype(F)
compver(F)
copyright(F)
depend(F)
space(F)
This section also describes the systemgenerated pkgmap(F) file, which pkgmk(C) creates and places on the
installation medium. It is similar to the prototype file.

pkginfo
This required package information file defines parameter values that describe characteristics of the package,
such as the package abbreviation, full package name, package version, and package architecture. The
definitions in this file can set values for all of the installation parameters defined in the pkginfo(F) manual
page.
Each entry in the file uses the following format to establish the value of a parameter:
PARAM="value"

Here is an example of a pkginfo file:


PKG="pkgA"
NAME="My Package A"
ARCH="i386"
RELEASE="4.0"
VERSION="2"
VENDOR="MYCOMPANY"
HOTLINE="1800677BUGS"
VSTOCK="0122c3f5566"

The package information files

199

Packaging your software applications


CATEGORY="application"
ISTATES="S 2"
RSTATES="S 2"

The pkginfo(C) and pkgparam(C) commands can be used to access information in a pkginfo file.
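For example, individual parameter values from the pkginfo file above could be retrieved on a system where
the package instance pkgA is installed (output shown as pkgparam prints it, one value per line):
$ pkgparam pkgA VENDOR HOTLINE
MYCOMPANY
1-800-677-BUGS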
NOTE: Before defining the PKG, ARCH, and VERSION parameters, you need to know how
pkgadd(ADM) defines a package instance and the rules associated with naming a package. Refer to ``2.
Defining a package instance'' before assigning values to these parameters.

prototype
This required package information file, prototype(F), contains a list of the package contents. The pkgmk(C)
command uses this file to identify the contents of a package and its location on the development machine
when building the package.
You can create this file in two ways. As with all the package information files, you can use an editor to create
a file named prototype. It should contain entries following the description given below. You can also use the
pkgproto(C) command to generate the file automatically. To use the second method, you must have a copy of
your package on your development machine that is structured exactly as you want it structured on the
installation machine and all modes and permissions must be correct. If you are not going to use pkgproto,
you do not need a structured copy of your package.
The two types of entries in the prototype file are description lines and command lines.
The description lines
For each deliverable object, you must create one description line that consists of several fields describing the
object. This entry describes such information as mode, owner, and group for the object. You can also use this
entry to accomplish the following tasks:
You can override the pkgmk(C) command's placement of an object on a multiple-part package. See
``10. Distributing packages over multiple volumes'' for more details.
You can place objects into classes. See ``3. Placing objects into classes'' for details.
You can tell pkgmk(C) where to find an object in your development directory structure and map that
name to the correct placement on the installation machine. See ``Mapping development pathnames to
installation pathnames'' for details.
You can define an object as relocatable. See ``Defining collectively relocatable objects'' and
``Defining individually relocatable objects'' for details.
You can define links. See ``9. Creating the prototype file'' for details.
The generic format of the descriptive line is:
[ part ] ftype class pathname [ major minor ] [ mode owner group ]

Definitions for each field are as follows:

part
Designates the part in which an object should be placed. A package can be divided into a number of
parts. A part is a collection of files and is the atomic unit by which a package is processed. A
developer can choose the criteria for grouping files into a part (for example, by class). If not defined,
pkgmk(C) decides in which part the object will be placed.
ftype
Designates the file type of an object. Example file types are f (a standard executable or data file), d (a
directory), l (a linked file), and i (a package information file). (Refer to the prototype(F) manual page
for a complete list of file types.)
class
Defines the class to which an object belongs. All objects must belong to a class. If the object belongs
to no special class, this field should be defined as none.
pathname
Defines the pathname which an object should have on the installation machine. If you do not begin
this name with a slash, the object is considered to be relocatable. You can use the form path1=path2
to map the location of an object on your development machine to the pathname it should have when
installed on an installation machine.
When a package is stored on an installation medium with an s5 filesystem, each member of the
pathname is truncated to 14 characters. When the package is installed on the installation machine with
a filesystem that supports longer file names, such as sfs or ufs filesystems, the files are restored with
their original pathname length. However, on the s5 filesystem, the pathnames remain truncated.
major/minor
Defines the major and minor numbers for a block or character special device.
mode/owner/group
Defines the mode, owner, and group for an object. If these are not defined explicitly, the defaults set with
the default command are assigned; if no defaults have been set either, packaging will fail.
Here is an example of this file with only description lines:
i pkginfo
i request
d bin /ncmpbin 0755 root other
f bin /ncmpbin/dired=/usr/ncmp/bin/dired 0755 root other
f bin /ncmpbin/less=/usr/ncmp/bin/less 0755 root other
f bin /ncmpbin/ttype=/usr/ncmp/bin/ttype 0755 root other

The command lines


The four types of commands that can be embedded in the prototype(F) file are:

search pathnames
Specifies a list of directories (separated by white space) in which pkgmk(C) should search when
looking for package objects. pathnames is prepended to the basename of each object in the prototype
file until the object is located.


NOTE: The search command will not work when invoking pkgmk with the -c option specified to
compress all non-information package files.

include filename
Specifies the pathname of another prototype file that should be merged into this one during
processing. (Note that search requests do not span include files. Each prototype file should have its
own search command defined, if one is needed.)
default mode owner group
Defines the default mode owner group that should be used if this information is not supplied in a
prototype entry that requires the information. (The defaults do not apply to entries in any include
files. Each prototype file should have its own default command defined, if one is needed.)
param=value
Places the indicated parameter in the packaging environment. This allows you to expand a variable
pathname so that pkgmk can locate the object without changing the actual object pathname. (This
assignment will not be available in the installation environment.)
A command line must always begin with an exclamation point (``!''). Commands can have variable
substitutions embedded within them.
Here is an example prototype file with both description and command lines:
!PROJDIR=/usr/myname
!search /usr/myname/bin /usr/myname/src /usr/myname/hdrs
!include $PROJDIR/src/prototype
i pkginfo
i request
d bin ncmpbin 0755 root other
f bin ncmpbin/dired=/usr/ncmp/bin/dired 0755 root other
f bin ncmpbin/less=/usr/ncmp/bin/less 0755 root other
f bin ncmpbin/ttype=/usr/ncmp/bin/ttype 0755 root other
!default 755 root bin

compver
The compver(F) package information file defines previous (or future) versions of the package that are
compatible with this version. Each line in the file consists of a string defining a version of the package with
which the current version is compatible. Because some packages might require installation of a particular
version of another software package, compatibility information is extremely crucial. If a package ``A''
requires version ``1.0'' of application ``B'' as a prerequisite, but the customer installing ``A'' has a new and
improved version ``1.3'' of ``B'', the compver(F) file for ``B'' must indicate that the new version is
compatible with version ``1.0'' in order for the customer to install package ``A''. The string must match the
definition of the VERSION parameter in the pkginfo(F) file of the package considered to be compatible. Here
is an example of this file:
Version 1.3
Version 1.0


copyright
The copyright(F) package information file contains the text of a copyright message that will be printed on the
terminal at the time of package installation or removal. The display is exactly as shown in the file. Here is an
example of this file.
Copyright (c) 2004 The SCO Group, Inc.
All Rights Reserved.

THIS PACKAGE CONTAINS UNPUBLISHED PROPRIETARY SOURCE CODE OF SCO.


The copyright notice above does not evidence any actual or intended publication of such source code.

depend
The depend(F) package information file defines software dependencies associated with the package. You can
define three types of package dependencies with this file:
a prerequisite package (this package depends on the existence of another package)
a reverse dependency (another package depends on the existence of this package)
an incompatible package (your package is incompatible with this one)
The generic format of a line in this file is:
type pkg name

Definitions for each field are as follows:

type
Defines the dependency type.
P indicates the named package is a prerequisite for installation.
I indicates the named package is incompatible.
R indicates a reverse dependency (the named package requires that this package be on the system).
This last type should only be used when a pre-UNIX System V Release 4 package (that cannot
deliver a depend file) relies on the newer package.
pkg
Indicates the package abbreviation for the package.
name
Specifies the full package name (used for display purposes only).
Here is an example of this file:
P acu Advanced C Utilities
       Issue 4 Version 1
P cc C Programming Language
       Issue 4 Version 1 (386)
R vpkg Another Vendor Package

space
The space(F) package information file defines disk space requirements for the target environment beyond
that which is used by objects defined in the prototype(F) file (for example, files that will be dynamically
created at installation time). It should define the maximum amount of additional space that a package will
require.
The generic format of a line in this file is:
pathname blocks inodes
Definitions for each field are as follows:

pathname
Names a directory in which there are objects that will require additional space. The pathname can be
the mount point for a filesystem. Pathnames that do not begin with a slash (/) indicate relocatable
directories.
blocks
Defines the number of 512byte disk blocks required for installation of the files and directory entries
contained in the pathname. (Do not include filesystem dependent disk usage.)
inodes
Defines the number of inodes required for installation of the files and directory entries contained in
pathname.
Numbers of blocks or inodes can be negative to indicate that the package will ultimately (after processing by
scripts, and so on) take up less space than the installation tool would calculate.
Here is an example of this file:
# extra space required by config data which is
# dynamically loaded onto the system
data 500 1

pkgmap
The pkgmk(C) command creates the pkgmap(F) file when it processes the prototype file. This new file
contains all of the information in the prototype file plus three new fields for each entry. These fields are
``size'' (file size in bytes), ``cksum'' (checksum of file), and ``modtime'' (last time of modification). All
command lines defined in the prototype file are executed as pkgmk(C) creates the pkgmap(F) file. The
pkgmap file is placed on the installation medium. The prototype file is not. Refer to the pkgmap(F) manual
page for more details about this file.
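As an illustration, an f-type entry in a pkgmap file might look like the following; all of the values shown
(including the size, cksum, and modtime fields at the end) are invented for the example:
1 f none bin/cmd1 0755 root other 14675 6234 1032094302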

The installation scripts


The pkgadd(ADM) command automatically performs the actions necessary to install a package, using the
package information files as input. As a result, you do not have to supply any packaging scripts. However, if
you want to customize the installation procedures for your package needs, the following three types of scripts
can be used:

request script
Solicits administrator interaction during package installation for the purpose of assigning or
redefining environment parameter assignments.
class action scripts
Defines an action or set of actions that should be applied to a class of files during installation or
removal. You can define your own classes or use one of three standard classes (sed, awk, and
build). See ``3. Placing objects into classes'' for details on how to define a class.
procedure scripts
Specifies a procedure to be invoked before or after the installation or removal of a package. The four
procedure scripts are preinstall, postinstall, preremove, and postremove.
You decide which type of script to use based on when you want the script to execute.
NOTE: All installation scripts must be executable by sh (for example, a shell script or an executable
program).

Script processing
You can customize the actions taken during installation by delivering installation scripts with your package.
The decision on which type of script to use depends upon when the action is needed during the installation
process. As a package is installed, pkgadd(ADM) performs the following steps:
1. Executes the request script.
This is the only point at which your package can solicit input from the installer.
2. Executes the preinstall script.
3. Installs the package objects.
Installation occurs class-by-class and class action scripts are executed accordingly. The list of classes
operated upon and the order in which they should be installed is initially defined with the CLASSES
parameter in your pkginfo(F) file. However, your request script can change the value of CLASSES.
NOTE: Be absolutely sure that the CLASSES environment variable in the pkginfo file lists all the
package's class names from the prototype file, or that they are conditionally added to the CLASSES
variable (as appropriate) by the request script. Only those class names appearing in the CLASSES
environment variable at the time installation begins will be installed. Any files belonging to a class
name not found in CLASSES at the time the installation begins will not be installed.



4. Executes the postinstall script.
When a package is being removed, pkgrm(ADM) performs these steps:
1. Executes the preremove script.
2. Executes the removal class action scripts.
Removal also occurs class-by-class. As with the installation class action scripts, if more than one
removal script exists, they are processed in the reverse order in which the classes were listed in the
CLASSES parameter at the time of installation.
3. Executes the postremove script.
The request script is not processed at the time of package removal. However, its output (a list of parameter
values) is saved and so is available to removal scripts.

Installation parameters
The following four groups of parameters are available to all installation scripts. Some of the parameters can
be modified by a request script, others cannot be modified at all.
The four system parameters that are generated by the installation software (see below for a description
of these). None of these parameters can be modified by a package.
The 21 standard installation parameters defined in the pkginfo(F) file. Of these, a package can only
modify the CLASSES parameter. The standard installation parameters are described in detail in the
pkginfo(F) manual page.
You can define your own installation parameters by assigning a value to them in the pkginfo(F) file.
Such a parameter must be alphanumeric with an initial capital letter. Any of these parameters can be
changed by a request script.
Your request script can define new parameters by assigning values to them and placing them into the
installation environment.
The four installation parameters that are generated by installation software are described below:

PATH
Specifies the search list used by sh to find commands; PATH is set to
/sbin:/usr/sbin:/usr/bin:/usr/sadm/install/bin upon script invocation.
UPDATE
Indicates that the current installation is intended to update the system. Automatically set to true if the
package being installed is overwriting a version of itself.
PKGINST
Specifies the instance identifier of the package being installed. The value is equal to the package
abbreviation (i.e., the same value as the PKG variable in the pkginfo file). See ``2. Defining a package
instance'' for more details.
PKGSAV
Specifies the directory where files can be saved for use by removal scripts or where previously saved
files may be found.

Getting package information for a script


The two commands that can be used from your scripts to solicit information about a package are:

pkginfo(C)
This command returns information about software packages, such as the instance identifier and
package name.
pkgparam(C)
This command returns values for all parameters or only for the parameters specified.
The pkginfo(C) and pkgparam(C) manual pages give details for these tools.
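For example, a hypothetical script fragment such as the following (the parameter names are illustrative only) could retrieve package information:
# retrieve the full package name of the package being processed
NAME=`pkgparam $PKGINST NAME`
# list every parameter and its value for the same package
pkgparam -v $PKGINST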

Exit codes for scripts


Each script must exit with one of the following exit codes:

0
Successful completion of script.
1
Fatal error. Installation process is terminated at this point.
2
Warning or possible error condition. Installation will continue. A warning message will be displayed
at the time of completion.
3
Script was interrupted and possibly left unfinished. Installation terminates at this point.
10
System should be rebooted when installation of all selected packages is completed. (This value should
be added to one of the single-digit exit codes described above.)
20
The system should be rebooted immediately upon completing installation of the current package.
(This value should be added to one of the single-digit exit codes described above.)
See the case studies for examples of exit codes in installation scripts.
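As a simple illustration (a hypothetical fragment, not taken from the case studies), a postinstall script that completes successfully but requires a reboot once all selected packages are installed could end with:
# 0 (success) plus 10 (reboot after all packages are installed)
exit 10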

The request script


The request script solicits interaction during installation and is the only place where your package can
interact directly with the installer. It can be used, for example, to ask the installer if optional pieces of a
package should be installed.
The output of a request script must be a list of parameters and their values. This list can include any of the
parameters you created in the pkginfo(F) file (not including the 21 standard parameters) and the CLASSES
parameter. The list can also introduce parameters that have not been defined elsewhere.
When your request script assigns values to a parameter, it must then make those values available to the
installation environment for use by pkgadd(ADM) and also by other packaging scripts. The following
example shows a request script segment that performs this task for the four parameters CLASSES,
NCMPBIN, EMACS, and NCMPMAN:
# make parameters available to installation service
# and any other packaging script we might have
cat >$1 <<!
CLASSES='$CLASSES'
NCMPBIN='$NCMPBIN'
EMACS='$EMACS'
NCMPMAN='$NCMPMAN'
!

Request script naming conventions


There can only be one request script per package and it must be named request.
Request script usage rules
The request script is executed as uid=root and gid=other.
The request script should not modify any files, with the exception of the ``response'' file (described
below) which is the output of the request script. It is intended only to interact with users and to create
a list of parameter assignments based upon that interaction.
pkgadd(ADM) calls the request script with one argument that names the file to which the output of
this script will be written. This file is referred to as the response file.
The parameter assignments should be added to the installation environment for use by pkgadd(ADM)
and other packaging scripts (as shown in the previous example).
System parameters and standard installation parameters, except for the CLASSES parameter, cannot
be modified by a request script. Any of the other parameters available can be changed.
The format of the output list should be PARAMETER="value". For example:
CLASSES="none class1"
The list should be written to the file named as the argument to the request script.
The user's terminal is defined as standard input to the request script.
The request script is not executed during package removal. However, the parameter values assigned
in the script are saved and are available during removal.

Soliciting user input in request scripts


A tool is provided in the base package for generating full-screen menus for handling user input. This tool is
called menu, and should be used when user input is solicited in a request script. Using a form description file
(see menu(F)), menu generates a full-screen form that can be used for displaying information, entering a
selection from a numbered list, or filling out a more complex, multiple-field form. The menu tool can also
contain help text so that a user can get more information about the current step in the installation.
The menu tool output is a file that contains Bourne/Korn shell statements of the form:



VARIABLE="value"
After the user completes the menu, this output file can be read in and executed in the request script using the
shell '.' command. The values obtained can be used to generate the response file, the output of the request
script.
The menu tool should be used when soliciting user input from the request script. The menu tool can also be
used in the postinstall script for a package to inform the user about the status of the installation at completion.
(See the manual pages for menu(C) and menu(F) for more information.)
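The following sketch of a request script fragment shows the general flow; the output filename /tmp/menu.out and the EMACS parameter are assumptions made for this example only:
# menu has written shell assignments such as EMACS="yes"
# to /tmp/menu.out; read them into this shell
. /tmp/menu.out
# pass the value on to the response file named in $1
cat >$1 <<!
EMACS='$EMACS'
!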

The class action script


The class action script defines a set of actions to be executed during installation or removal of a package. The
actions are performed on a group of pathnames based on their class definition. (See the case studies for
examples of class action scripts.)
Class action script naming conventions
The name of a class action script is based on the class on which it operates and on whether those actions
occur during package installation or removal. The two name formats are:

i.class
operates on pathnames in the indicated class during package installation
r.class
operates on pathnames in the indicated class during package removal
For example, the name of the installation script for a class named class1 would be i.class1 and the removal
script would be named r.class1.
Class action script usage rules
1. Class action scripts are executed as uid=root and gid=other.
2. If a package spans more than one volume, the class action script will be executed once for each
volume that contains at least one file belonging to the class. Consequently, each script must be
``multiply executable.'' This means that executing a script any number of times with the same input
must produce the same results as executing the script only once.
NOTE: The installation service relies upon this condition being met.
3. The script is executed only if there are files in the given class existing on the current volume.
4. pkgadd(ADM) (and pkgrm(ADM)) creates a list of all objects listed in the pkgmap(F) file that
belong to the class. As a result, a class action script can only act upon pathnames defined in the
pkgmap(F) and belonging to a particular class.
5. A class action script should never add, remove, or modify a pathname or system attribute that does
not appear in the list generated by pkgadd(ADM) unless by use of the installf(ADM) or
removef(ADM) commands.



(See the manual pages for details on these two commands and the case studies for examples of them
in use.)
6. When the class action script executes for the last time (the input pathname is the last path on the last
volume containing a file of this class), it is executed with the keyword argument ENDOFCLASS.
This flag allows you to include postprocessing actions into your script.
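The following minimal sketch of an installation class action script observes these rules; the class name class1 is illustrative only. Because repeating a copy produces the same result as performing it once, the script is multiply executable:
# i.class1 -- copy each object of the class into place;
# standard input is a list of "source destination" pairs
while read src dest
do
    # directories, devices, and so on are created by pkgadd itself
    [ "$src" = /dev/null ] && continue
    cp $src $dest || exit 2
done
# postprocessing could be added here when $1 is ENDOFCLASS
exit 0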

Installation of classes
The following steps outline the system actions that occur when a class is installed. The actions are repeated
once for each volume of a package as that volume is being installed.
1. pkgadd(ADM) creates a list of pathnames upon which the action script will operate. Each line of this
list consists of source and destination pathnames, separated by white space. The source pathname
indicates where the object to be installed resides on the installation volume and the destination
pathname indicates the location on the installation machine where the object should be installed. The
contents of the list are restricted by the following criteria:
The list contains only pathnames belonging to the associated class.
Directories, named pipes, character/block devices, and symbolic links are included in the list
with the source pathname set to /dev/null. They are automatically created by pkgadd(ADM)
(if not already in existence) and given proper attributes (mode, owner, and group) as defined
in the pkgmap(F) file.
Linked files are not included in the list, that is, files whose ftype is l. (ftype is the file
type defined in the prototype file.) Links in the given class are created in Step 4.
If a pathname already exists on the target machine and its contents are no different from the
one being installed, the pathname will not be included in the list.
To determine this, pkgadd(ADM) compares the cksum, modtime, and size fields in the
installation software database with the values for those fields in your pkgmap(F) file. If they
are the same, it then checks the actual file on the installation machine to be certain it really
has those values. If the field values are the same and are correct, the pathname for this object
will not be included in the list.
2. If there is no class action script, the files associated with the pathnames are copied to the target
machine.
If no class action script is provided for installation of a particular class, the files in the generated
pathname list will simply be copied from the volume to the appropriate target location.
3. If there is a class action script, the script is executed.
The class action script is invoked with standard input containing the list generated in Step 1. If this is
the last volume of the package and there are no more objects in this class, the script is executed with
the single argument of ENDOFCLASS.
4. pkgadd(ADM) performs a content and attribute audit and creates links.
After successfully executing Step 2 or 3, an audit of both content and attribute information is
performed on the list of pathnames. pkgadd(ADM) creates the links associated with the class
automatically. Detected attribute inconsistencies are corrected for all pathnames in the generated list.



Removal of classes
Objects are removed class-by-class. Classes that exist for a package but are not listed in the CLASSES
parameter are removed last (for example, an object installed with the installf(ADM) command). Classes that
are listed in the CLASSES parameter are removed in reverse order. The following steps outline the system
actions that occur when a class is removed:

1. pkgrm(ADM) creates a list of installed pathnames that belong to the indicated class. Pathnames
referenced by another package are excluded from the list unless their ftype is e (the file can be edited
upon installation or removal).
If a pathname is referenced by another package, it will not be removed from the system. However, if
it is of ftype e, it can be modified to remove information placed in it by the package being removed.
The modification should be performed by the removal class action script.
2. If there is no class action script, the pathnames are removed.
If your package has no removal class action script for the class, all of the pathnames in the list
generated by pkgrm(ADM) will be removed.
NOTE: Always assign a class for files with an ftype of e (editable) and have an associated class
action script for that class. Otherwise, they will be removed at this point, even if the pathname is
shared with other packages.
3. If there is a class action script, the script is executed.
pkgrm(ADM) invokes the class action script with standard input containing the list generated in Step
1.
4. pkgrm(ADM) performs an audit.
Upon successful execution of the class action script, knowledge of the pathnames is removed from the
system unless a pathname is referenced by another package.

The special system classes


The system provides three special classes:
sed class provides a method for using sed(C) instructions to edit files upon installation and removal
awk class provides a method for using awk(C) instructions to edit files upon installation and removal
build class provides a method to construct a file dynamically during installation

The sed class script


The sed installation class provides a method of installing and removing objects that require modification to
an existing object on the target machine. (The file must have already been installed by another package.) A
sed class action script delivers sed instructions in the format shown in the next example. You can give
instructions that will be executed during either installation or removal. Two commands indicate when
instructions should be executed. sed instructions that follow the !install command are executed during
package installation and those that follow the !remove command are executed during package removal. It
does not matter in which order the commands are used in the file.
The sed class action script executes automatically at installation time if a file belonging to class sed exists.
The name of the sed class file should be the same as the name of the file upon which the instructions will be
executed.
# comment, which may appear on any line in the file
!install
# sed(C) instructions which are to be invoked during
# installation of the object
[address [,address]] function [arguments]
. . .

!remove
# sed(C) instructions which are to be invoked during
# the removal process
[address [,address]] function [arguments]
. . .
address, function, and arguments are as defined in the sed(C) manual page. See case study 5a and case study
5b for examples of sed class action scripts.
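As an additional illustration, a hypothetical sed class file (named after the target file it edits) might contain:
!install
# append a marker line during installation
$a\
mypkg: added at installation time
!remove
# delete the marker line during removal
/^mypkg:/d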
The awk class script
The awk installation class provides a method of installing and removing objects that require modification to
an existing object on the target machine (the object must have been previously installed by another
package). Modifications are delivered as awk instructions in an awk class action script.
The awk class action script executes automatically at the time of installation if a file belonging to class awk
exists. Such a file contains instructions for the awk class script in the format shown in the following example.
Two commands indicate when instructions should be executed. awk instructions that follow the !install
command are executed during package installation and those that follow the !remove command are executed
during package removal. It does not matter in which order the commands are used in the file.
The name of the awk class file should be the same as the name of the file upon which the instructions will be
executed.
# comment, which may appear on any line in the file
!install
# awk(C) program to install changes
. . . (awk program)

!remove
# awk(C) program to remove changes
. . . (awk program)


The file to be modified is used as input to awk and the output of the script ultimately replaces the original
object. Parameters cannot be passed to awk using this syntax.
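As a concrete illustration (a hypothetical sketch, not one of the case studies), an awk class file might copy the target file through unchanged and append one line at installation, then filter that line out at removal:
!install
{ print }
END { print "mypkg: installed marker" }
!remove
$0 !~ /^mypkg:/ { print }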
See case study 5a for examples of awk class action scripts.
The build class script
The build class installs or removes objects by executing instructions that create or modify the object file.
These instructions are delivered as a build class action script.



The name of the instruction file should be the same as the name of the file upon which the instructions will be
executed.
The build class action script executes automatically at installation time if a file belonging to class build
exists.
A build script must be executable by sh(C). The script's output becomes the new version of the file as it is
built.
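A minimal sketch of a build class file follows; the contents written are illustrative only. Whatever the script writes to standard output becomes the installed file:
# the output of this script becomes the new file contents
echo "# file generated at installation time"
echo "INSTALL_DATE=`date`"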
See case study 5c for an example build class action script.

The procedure script


The procedure script gives a set of instructions that are performed at particular points in installation or
removal. Four possible procedure scripts are described below. The case studies show examples of procedure
scripts.
Naming conventions for procedure scripts
The four procedure scripts must use one of the names listed below, depending on when these instructions are
to be executed.

preinstall
executes before class installation begins
postinstall
executes after all volumes have been installed
preremove
executes before class removal begins
postremove
executes after all classes have been removed

Procedure script usage rules


1. Procedure scripts are executed as uid=root and gid=other.
2. Each installation procedure script must use the installf(ADM) command to notify pkgadd(ADM)
that it will add or modify a pathname. After all additions or modifications are complete, this command
should be invoked with the -f option to indicate all additions and modifications are complete. (See the
installf(ADM) manual page and the case studies for details and examples.)
3. Each removal procedure script must use the removef(ADM) command to notify pkgrm(ADM) that
it will remove a pathname. After removal is complete, this command should be invoked with the -f
option to indicate all removals have been completed. (See the removef(ADM) manual page and the
case studies for details and examples.)
NOTE: The installf(ADM) and removef(ADM) commands must be used for the following reasons. If a
procedure script physically removes an object from the system, the system's contents database still contains an
entry for that object until the removef(ADM) command is used to remove the entry. Similarly, if a procedure
script places an object on the system, it will not be registered in the contents database until the installf(ADM)
command is used to register the object.
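The following hypothetical postinstall fragment shows the convention; the pathname and attributes are assumptions made for this example:
# notify pkgadd of a file this script is about to create
installf $PKGINST /mypkg/dynamic.cfg f 644 root sys
echo "configured=yes" > /mypkg/dynamic.cfg
# signal that all additions and modifications are complete
installf -f $PKGINST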

Basic steps of packaging


What steps you take to create a package depend on how customized your package will be; therefore, it is
difficult to give a step-by-step guide on how to proceed. Your first step should be to plan your
packaging. For example, you must decide on which package information files and scripts your package needs.
The following list outlines some of the steps you might use in a packaging scenario. Not all of these steps are
required and there exists no mandated order for their execution (although you must have all of your package
objects together before executing pkgmk(C)). The remainder of this topic gives procedural information for
each step.
NOTE: This list, and the following procedures, are intended only as guidelines. These guidelines cannot
substitute for reading the rest of this topic to learn what options are available to your package, and for doing
your own individualized planning.

1. Assign a package abbreviation.


Every package installed in the SCO OpenServer environment must have a package abbreviation.
2. Define a package instance.
You must decide on values for the three package parameters that will make each package instance
unique.
3. Place your objects into classes.
You must decide on what installation classes you are going to use before you can create the
prototype(F) file and also before you can write your class action scripts.
4. Set up a package and its objects as relocatable.
Package objects can be delivered with either fixed locations, meaning that their location is defined by
the package and cannot be changed, or with relocatable locations, meaning that they have no absolute
location requirements. All of a package or parts of a package can be defined as relocatable. Decide if
package objects will have fixed locations or be relocatable before you write any installation scripts
and before you create the prototype file.
5. Decide which installation scripts your package needs.
You must assess the needs of your package beyond the actions provided by pkgadd(ADM) and
decide on which type of installation scripts allow you to deliver your customized actions.
6. Define package dependencies.
You must decide if your package has dependencies on other packages and if any other packages
depend on yours.
7. Write a copyright message.

You must decide if your package requires a copyright message to appear as it is being installed (and
removed) and, if so, you must write that message.
8. Create the pkginfo file.
You must create a pkginfo(F) file before executing pkgmk(C). It defines basic information
concerning the package and can be created with any editor as long as it follows the format described
earlier. The format is also described on the pkginfo(F) manual page.
9. Create the prototype file.
This file is required and must be created before you execute pkgmk(C). It lists all of the objects that
belong to a package and information about each object (such as its file type and to which class it
belongs). You can create it using any editor and you must follow the format described earlier. The
format is also described on the prototype(F) manual page. You can also use the pkgproto(C)
command to generate a prototype file.
10. Distribute packages over multiple volumes.
pkgmk(C) automatically distributes packages over multiple volumes. You must decide if you want to
leave those calculations up to pkgmk(C) or customize package placement on multiple volumes.
11. Create the package with pkgmk.
Create the package using the pkgmk(C) command, which copies objects from the development
machine to the installation medium, puts them into the proper structure, and automatically spans them
across multiple volumes, if necessary.
12. Create the package with pkgtrans.
If you want to create a datastream structure for your package, you must execute pkgtrans(C) after
creating a package with pkgmk(C). This step is optional.

1. Assigning a package abbreviation


Each package installed on SCO OpenServer must have a package abbreviation assigned to it. This
abbreviation is defined with the PKG parameter in the pkginfo(F) file.
A valid package abbreviation must meet these criteria:
It must start with an alphabetic character.
Additional characters can be alphanumeric and can include the two special characters ``+'' and ``-''.
It cannot be longer than nine characters.
Reserved names are install, new, and all.

2. Defining a package instance


The same software package can differ by version, architecture, or both. Each variation is known to
pkgadd(ADM) as a ``package instance''.
The only way to install another instance of the same package is to install the new version on top of the old
version. The new instance overwrites any files installed by the old instance, installs new files, and may (if the
packaging scripts are written to do so) remove files from the system. A pkginfo -l of the package name
displays the information from the new package instance's pkginfo file. Removing the package removes all files
that had been installed by any instance of the package added to the system.
Identifying a package instance
Three parameters defined in the pkginfo(F) file combine to identify each instance uniquely. You should not
assign identical values for all three parameters for two instances of the same package installed in the same
target environment. These parameters are:

PKG
defines the software package abbreviation and remains constant for every instance of a package
VERSION
defines the software package version
ARCH
defines the software package architecture
For example, you might identify two identical versions of a package that run on different hardware as:
Instance 1
PKG="abbr"
VERSION="release 1"
ARCH="MX300I"

Instance 2
PKG="abbr"
VERSION="release 1"
ARCH="i386"

Two different versions of a package that run on the same hardware might be identified as:
Instance 1
PKG="abbr"
VERSION="release 1"
ARCH="i386"

Instance 2
PKG="abbr"
VERSION="release 2"
ARCH="i386"

All instances of a package installed on a system use the package abbreviation as the instance identifier
(PKGINST).
NOTE: pkgmk(C) also assigns an instance identifier to a package as it places it on the installation medium if
one or more instances of the package already exist. That identifier bears no relationship to the identifier
assigned to the same package on the installation machine.

Accessing the instance identifier in your scripts


Use the PKGINST system parameter to reference your package in your installation scripts.


3. Placing objects into classes


Installation classes allow a series of actions to be performed on a group of package objects at the time of their
installation or removal. You place objects into a class in the prototype(F) file. All package objects must be
given a class, although the class of none can be used for objects that require no special action.
The installation parameter CLASSES, defined in the pkginfo(F) file, is a list of classes to be installed
(including the none class). Objects defined in the prototype file that belong to a class not listed in this
parameter will not be installed. The actions to be performed on a class (other than simply copying the
components to the installation machine) are defined in a class action script. These scripts are named after the
class itself.
For example, to define and install a group of objects belonging to a class named class1, follow these steps:
1. Define the objects belonging to class1 as such in their prototype(F) file entry. For example:
f class1 /usr/src/myfile
f class1 /usr/src/myfile2

2. Ensure that the CLASSES parameter in the pkginfo(F) file has an entry for class1. For example:
CLASSES="class1 class2 none"

NOTE: Package objects cannot be removed by class.


3. Ensure that a class action script exists for this class. An installation script for a class named class1
would be named i.class1 and a removal script would be named r.class1.
If you define a class but do not deliver a class action script, the only action taken for that class will be
to copy components from the installation medium to the installation machine.
In addition to the classes that you can define, the system provides three standard classes for your use. The sed
class provides a method for using sed instructions to edit files upon package installation and removal. The
awk class provides a method for using awk instructions to edit files upon package installation and removal.
The build class provides a method to construct a file dynamically during package installation.

4. Making package objects relocatable


Package objects can be delivered either with fixed locations (their location on the installation machine is
defined by the package and cannot be changed) or as relocatable (they have no absolute location requirements
on the installation machine). The location for relocatable package objects is determined during the installation
process.
You can define two types of relocatable objects: collectively relocatable and individually relocatable. All
collectively relocatable objects are placed relative to the same directory once the relocatable root directory is
established. Individually relocatable objects are not restricted to the same directory location as collectively
relocatable objects.



Defining collectively relocatable objects
Follow these steps to define package objects as collectively relocatable:
1. Define a value for the BASEDIR parameter.
Put a definition for the BASEDIR parameter in your pkginfo(F) file. This parameter names a
directory where relocatable objects will be placed by default. If you supply no value for BASEDIR,
no package objects are considered as collectively relocatable.
2. Define objects as collectively relocatable in the prototype(F) file.
An object is defined as collectively relocatable by using a relative pathname in its entry in the
prototype file. A relative pathname does not begin with a slash. For example, src/myfile is a relative
pathname, while /src/myfile is a fixed pathname.
NOTE: A package can deliver some objects with relocatable locations and others with fixed
locations.

All objects defined as collectively relocatable are put under the same root directory on the installation
machine. The root directory value will be one of the following (and in this order):
if the admin file contains basedir=ask and pkgadd(ADM) was not invoked in non-interactive mode,
then the installer's response to pkgadd(ADM) when asked where relocatable objects should be
installed (this overrides the value for BASEDIR in the package's pkginfo(F) file, if any)
the value of BASEDIR as it is defined in the admin file used during the pkgadd(ADM) process (the
BASEDIR value assigned in the admin file overrides the value of the pkginfo(F) file)
the value of BASEDIR as it is defined in your pkginfo(F) file (this value is used only as a default in
case the other two possibilities have not supplied a value)
if the admin file contains basedir=default and no BASEDIR is set in the package's pkginfo(F) file,
then BASEDIR defaults to /

Defining individually relocatable objects


A package object is defined as individually relocatable by using a variable in its pathname definition in the
prototype(F) file. Your request script must query the installer on where such an object should be placed and
assign the response value to the variable. pkgadd(ADM) expands the pathname based on the output of your
request script at the time of installation. Case study 1 shows an example of the use of variable pathnames and
the request script needed to solicit a value for the base directory.

5. Writing your installation scripts


Read ``The installation scripts'' to learn what types of scripts you can write and how to write them. You can
also look at the case studies to see how the various scripts can be used and to see examples.
Reserving additional space on the installation machine
pkgadd(ADM) assures that there is enough disk space to install your package, based on the object definitions
in the pkgmap(F) file. However, sometimes your package will require additional disk space beyond that
needed by the objects defined in the pkgmap file. For example, your package might create a file during
installation. pkgadd(ADM) checks for additional space when you deliver a space(F) file with your package.
Refer to ``space'' or the space(F) manual page for details on the format of this file.
NOTE: Be certain that your space file has an entry in the prototype(F) file. Its file type should be i (for
package information file).

6. Defining package dependencies


Package dependencies and incompatibilities can be defined with two of the optional package information
files. Delivering a compver(F) file lets you name versions of your package that are compatible with the one
being installed. Delivering a depend(F) file lets you define three types of dependencies associated with your
package. These dependency types are:

a prerequisite package
your package depends on the existence of another package
a reverse dependency
another package depends on the existence of your package
NOTE: This type should only be used when a pre-UNIX System V Release 4 package (that cannot
deliver a depend(F) file) relies on the newer package.

an incompatible package
your package is incompatible with this one
Refer to ``depend'' and ``compver'', or the manual pages depend(F) and compver(F) for details on the formats
of these files.
NOTE: Be certain that your depend and compver files have entries in the prototype(F) file. The file type
should be i (for package information file).

7. Writing a copyright message


To deliver a copyright message, you must create a copyright file named copyright. The message will be
displayed exactly as it appears in the file (no formatting) as the package is being installed and as it is being
removed. Refer to ``copyright'' or the copyright(F) manual page for more detail.
NOTE: Be certain that your copyright file has an entry in the prototype file. Its file type should be i (for
package information file).


8. Creating the pkginfo file


The pkginfo(F) file establishes values for parameters that describe the package and is a required package
component. The format for an entry in this file is:
PARAM="value"

PARAM can be any of the 21 standard parameters described in the pkginfo(F) manual page. You can also
create your own package parameters simply by assigning a value to them in this file. Your parameter names
must begin with an uppercase letter followed by either uppercase or lowercase letters.
The following five parameters are required:
PKG (package abbreviation)
NAME (full package name)
ARCH (package architecture)
VERSION (package version)
CATEGORY (package category)
The CLASSES parameter dictates which classes are installed and the order of installation. Although the
parameter is not required, no classes will be installed without it. Even if you have no class action scripts, the
none class must be defined in the CLASSES parameter before objects belonging to that class will be
installed.
NOTE: You can choose to define the value of CLASSES with a request script and not to deliver a value in
the pkginfo(F) file.
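A minimal pkginfo file might therefore look like the following sketch (all values are illustrative only):
PKG="mypkg"
NAME="My Software Package"
ARCH="i386"
VERSION="1.0"
CATEGORY="application"
CLASSES="none"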

9. Creating the prototype file


The prototype file is a list of package contents and is a required package component.
You can create the prototype file by using any editor and following the format described in the section
``prototype'' and in the prototype(F) manual page. You can also use the pkgproto(C) command to create one
automatically.
Creating the file manually
While creating the prototype file, you must at the very least supply the following three pieces of information
about an object:
The object's type
All of the possible object types are defined in the prototype(F) manual page. f (for a data file), l (for a
linked file), and d (for a directory) are examples of object types.
The object's class
All objects must be assigned a class. If no special handling is required, you can assign the class none.
The object's pathname



The pathname can be a fixed pathname such as /mypkg/src/filename, a collectively relocatable
pathname such as src/filename, or an individually relocatable pathname such as $BIN/filename or
/opt/$PKGINST/filename.

Creating links

To define links, you must do the following in the prototype entry for the linked object:
1. Define its ftype as l (a link) or s (a symbolic link).
2. Define its pathname with the format path1=path2 where path1 is the destination and path2 is the
source file.
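For example (the pathnames are hypothetical), the following entries define a linked file and a symbolically linked file:
l none /usr/bin/newcmd=/usr/bin/oldcmd
s none /usr/lib/libmine.so=/usr/lib/libmine.so.1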

Mapping development pathnames to installation pathnames

If your development area is in a different structure than you want the package to be in on the installation
machine, use the prototype entry to map one pathname to the other. You use the same path1=path2 pathname
format as is used to define links. However, because the ftype is not defined as l or s, path1 is interpreted as
the pathname you want the object to have on the installation machine, and path2 is interpreted as the
pathname the object has on your development machine.
For example, your project might require a development structure that includes a project root directory and
numerous src directories. However, on the installation machine you might want all files to go under a package
root directory and for all src files to be in one directory. So, a file on your machine might be named
/projdir/srcA/filename. If you want that file to be named /pkgroot/src/filename on the installation machine,
your prototype entry for this file might look like this:
f class1 /pkgroot/src/filename=/projdir/srcA/filename

Defining objects for pkgadd to create

You can use the prototype(F) file to define objects that are not actually delivered on the installation medium.
pkgadd(ADM) creates objects with the following ftypes if they do not already exist at the time of installation:
d (directories)
x (exclusive directories)
l (linked files)
s (symbolically linked files)
p (named pipes)
c (character special device)
b (block special device)
To request that one of these objects be created on the installation machine, add an entry for it in the prototype
file using the appropriate ftype.
For example, if you want a directory created on the installation machine, but do not want to deliver it on the
installation medium, an entry for the directory in the prototype file is sufficient. An entry such as the one
shown below causes the directory to be created on the installation machine, even if it does not exist on the
installation medium.


d none /directoryA 644 root other

Using the command lines

The four types of commands that you can put into your prototype(F) file allow you to:
Nest prototype files (the include command).
Define directories for pkgmk(C) to look in when attempting to locate objects as it creates the package
(the search command).
NOTE: This will not work if pkgmk is instructed to compress the package.
Set a default value for mode owner group (the default command). If all or most of your objects have
the same values, using the default command keeps you from having to define these values for every
entry in the prototype file.
Assign a temporary value for variable pathnames to tell pkgmk where to locate these relocatable
objects on your machine (with param="value").
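A prototype file fragment using all four command types described above might begin as follows (the pathnames and values are illustrative only):
!search /projdir/bin /projdir/lib
!include /projdir/proto.extra
!default 644 root other
!BIN=/projdir/relocatable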

Creating the file using pkgproto


The pkgproto(C) command scans your directories and generates a prototype(F) file. pkgproto cannot assign
ftypes of v (volatile files), e (editable files), or x (exclusive directories). You can edit the prototype file and
add these ftypes, as well as perform any other finetuning you require (for example, adding command lines or
classes).
pkgproto(C) writes its output to the standard output. To create a file, redirect the output to a file. The
examples shown in this section do not perform redirection, to show you what the contents of the file would
look like.
Creating a basic prototype

The standard format of pkgproto(C) is:


pkgproto path [ . . . ]

where path is the name of one or more paths to be included in the prototype(F) file. If path is a directory,
entries are created for the contents of that directory as well (everything below that directory).
With this form of the command, all objects are placed into the none class and are assigned the same mode
owner group as exists on your machine. The following example shows pkgproto being executed to create a
file for all objects in the directory /home/pkg:
$ pkgproto /home/pkg
d none /home/pkg 755 bin bin
f none /home/pkg/file1 755 bin bin
f none /home/pkg/file2 755 bin bin
f none /home/pkg/file3 755 bin bin
f none /home/pkg/file4 755 bin bin
f none /home/pkg/file5 755 bin bin
$


To create a prototype file that contains the output of the example above, you would execute pkgproto
/home/pkg > prototype.
NOTE: If no pathnames are supplied when executing pkgproto, standard input (stdin) is assumed to be a list
of paths. Refer to the pkgproto(C) manual page for details on this usage.

Assigning objects to a class

You can use the -c class option of pkgproto(C) to assign objects to a class other than none. When using this
option, you can only name one class. To define multiple classes in a prototype(F) file created by pkgproto,
you must edit the file after its creation.
The following example is the same as above except the objects have been assigned to class1:
$ pkgproto -c class1 /home/pkg
d class1 /home/pkg 755 bin bin
f class1 /home/pkg/file1 755 bin bin
f class1 /home/pkg/file2 755 bin bin
f class1 /home/pkg/file3 755 bin bin
f class1 /home/pkg/file4 755 bin bin
f class1 /home/pkg/file5 755 bin bin
$

Renaming pathnames with pkgproto

Use a path1=path2 format on the pkgproto(C) command line to give an object a different pathname in the
prototype(F) file than it has on your machine. You can, for example, use this format to define relocatable
objects in a prototype file created by pkgproto.
The following example is like the others shown in this section, except that the objects are now defined as bin
(instead of /home/pkg) and are thus relocatable:
$ pkgproto -c class1 /home/pkg=bin
d class1 bin 755 bin bin
f class1 bin/file1 755 bin bin
f class1 bin/file2 755 bin bin
f class1 bin/file3 755 bin bin
f class1 bin/file4 755 bin bin
f class1 bin/file5 755 bin bin
$

pkgproto and links

pkgproto(C) detects linked files and creates entries for them in the prototype(F) file. If multiple files are
linked together, it considers the first path encountered the source of the link.
If you have symbolic links established on your machine, but want to generate an entry for that file with an
ftype of f (file), use the -i option of pkgproto(C). This option creates a file entry for all symbolic links.


10. Distributing packages over multiple volumes


As a packager, you need not worry about placing package components on multiple volumes. pkgmk(C)
performs the calculations and actions necessary to organize a multiple volume package.
However, you can use the optional part field in the prototype(F) file to define the part in which you want an
object to be placed. A number in this field overrides pkgmk and forces the placement of the component into
the part given in the field. There is a one-to-one correspondence between parts and volumes for removable
media.

11. Creating a package with pkgmk


pkgmk(C) takes all of the objects on your machine (as defined in the prototype(F) file), puts them in the
fixed directory format and copies everything to the installation medium.
To package your software, execute:
pkgmk [-d device] [-f filename]

You must use the -d option to name the device onto which the package should be placed. device can be a
directory pathname or the identifier for a disk. The default device is the installation spool directory.
pkgmk looks for a file named prototype. You can use the -f option to specify a package contents file named
something other than prototype. This file must be in the prototype format.
For example, executing pkgmk -d diskette1 creates a package based on a file named prototype in your
current directory. The package is formatted and copied to the diskette in the device diskette1.
Package file compression
In Release 4.2, the pkgmk(C) command has been enhanced to optionally compress package files. If the -c
option is specified, pkgmk(C) will compress all non-information files. The following exceptions apply:
If, as a result of compression, the size of the file is not reduced, pkgmk(C) will not compress the file.
If the pathname for the file in the package's prototype(F) file is a relative pathname, (for example,
../mypkg/foo), the file will not be compressed.

Creating a package instance


pkgmk(C) creates a new instance of a package if one already exists on the device to which it is writing. It
assigns the package an instance identifier. Use the -o option of pkgmk to overwrite an existing instance of a
package rather than to create a new one.
NOTE: This use of ``instance'' should not be confused with the instance identifier assigned on installation and
referenced in packaging scripts with the PKGINST variable. The pkgmk command simply writes the new
package to another directory name, adding characters to the end of the package abbreviation to distinguish the
new package from any instances of the same package it finds already on the target media.



Helping pkgmk locate package contents
The following list describes situations that might require supplying pkgmk(C) with extra information and an
explanation of how to do so:
Your development area is not structured in the same way that you want your package structured.
Use the path1=path2 pathname format in your prototype(F) file.
You have relocatable objects in your package.
Use the path1=path2 pathname format in your prototype(F) file, with path1 as a relocatable name and
path2 a full pathname to that object on your machine.
You can use the search command in your prototype(F) file to tell pkgmk(C) where to look for
objects. (You cannot use the -c option with search, however.)
You can use the -b basedir option to define a pathname that informs pkgmk(C) where to find
relocatable object names while creating the package. It does this by prepending basedir to relocatable
object names while creating the package. For example, executing the following command looks in the
directory /usr2/myhome/reloc for any relocatable object in your package:
pkgmk -d /dev/diskette -b /usr2/myhome/reloc

You have variable object names.


Use the search command in your prototype file to tell pkgmk where to look for objects. (You cannot
use the -c option with search, however.)
You can use the param="value" command in your prototype file to give pkgmk a value to use for the
object name variables as it creates your package.
Use the variable=value option on the pkgmk command line to define a temporary value for variable
names.
The root directory on your machine differs from the root directory described in the prototype file (and
that will be used on the installation machine).
Use the -r rootpath option to tell pkgmk to ignore the destination pathnames in the prototype file.
Instead, pkgmk prepends rootpath to the source pathnames in order to find objects on your machine.

12. Creating a package with pkgtrans


pkgtrans(C) performs the following package translations:
a fixed directory structure to a datastream
a datastream to a fixed directory structure
To perform one of these translations, execute:
pkgtrans -s device1 device2 [pkg1[,pkg2[ . . . ]]]



where -s is the option to translate to datastream, device1 is the name of the device or directory where the
package currently resides, device2 is the name of the device onto which the translated package will be placed,
and [pkg1[,pkg2 . . . ]] is one or more package names. If no package names are given, a menu of all packages
residing on device1 is displayed and the user is asked for a selection.
Creating a datastream package
Creating a datastream package requires two steps:
1. Create a package using pkgmk(C).
Use the default device (the installation spool directory) or name a directory into which the package
should be placed. pkgmk(C) creates a package in a fixed directory format. Specify the capacity of the
device where the datastream will be placed as an argument to the -l option.
2. After the software is formatted in fixed directory format and is residing in a spool directory, execute
pkgtrans(C).
This command translates the fixed directory format to the datastream format and places the
datastream on the specified medium.
For example, the two steps shown below create a datastream package:
1. pkgmk -d spooldir -l 1400
This formats a package into a fixed directory structure and places it in a directory named spooldir.
Each part of the package requires no more than 1400 blocks.
2. pkgtrans -s spooldir 9track package1
This translates the fixed directory format of package1 residing in the directory spooldir into a
datastream format, and places the datastream package on the medium in a device named 9track.
OR
3. pkgtrans -s spooldir diskette package1
This is similar to number 2 above, except that it places the datastream package on the medium in a
device named diskette. pkgtrans(C) prompts for additional volumes if the package requires more
than one diskette.

Translating a package instance


When an instance of the package being translated to fixed directory format already exists on device2,
pkgtrans(C) does not perform the translation. You can use the -o option to tell pkgtrans to overwrite any
existing instances on the destination device and the -n option to tell it to create a new instance if one already
exists. Note that the above does not apply when device2 contains a datastream format.

Set packaging
Sets provide a method of grouping packages together as one installable entity. Usually this is used to group
packages that provide a particular feature or set of features. To enable the set capability in SCO OpenServer,
these packaging commands have been enhanced:
pkgadd(ADM)
pkginfo(C)
pkgrm(ADM)

Set installation
For sets, a special-purpose package referred to as a Set Installation Package (SIP) is used. The SIP is used to
control the installation of a set's member packages. The SIP's name and package instance name are always the
same as those used to identify the set itself. For instance, the SIP controlling the installation of the Foundation
Set (fnd) is also named Foundation Set (fnd). A SIP is distinguished from other packages by the
CATEGORY parameter "set" in its pkginfo(F) file and by the presence of a special type of package
information file named setinfo(F). This file is used to convey information about a set's member packages to
the software installation tools.
When pkgadd(ADM) recognizes that a SIP is being processed, it sets up special environment variables and
makes them available to the SIP's procedure scripts. This allows for a well-defined interface between the
scripts and pkgadd(ADM) that enables the SIP scripts to do most of the work when processing set member
package selection and interaction. The SIP's request and preinstall scripts are especially designed to use this
environment.
Among other things, the SIP's request script uses these environment variables to access the setinfo(F) file and
access the set member packages' request and default response files (if any). After the request script has
finished processing, the SIP's preinstall script is then used to pass back to pkgadd(ADM) a list of set member
packages selected for installation as part of the set (see case study 7 for examples of these scripts).
The following is a list of the environment variables made available to a SIP's procedure scripts.

$SETINFO
Used to access the setinfo(F) file.
$REQDIR
Provides the directory where the set member packages' request and default response files, if any,
reside.
$RESPDIR
Contains the name of the directory where processed response files are to be placed. This response file
could be the result of having run a set member package's request script (in the case of custom
installation) or simply a copy of the default response file provided with the SIP (in the case of
automatic installation).
$SETLIST
Used to pass back to pkgadd(ADM) the list of packages selected for installation as part of the set.
After it has processed a SIP, pkgadd adds the set member packages selected (it gets this from the file
represented by $SETLIST in the installation environment) to the list of packages to be installed and proceeds
to install them.
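For example, a SIP preinstall script could finish with a fragment like this sketch (the package names are illustrative only):
# pass the list of selected member packages back to pkgadd
cat > $SETLIST <<!
pkgA
pkgB
!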


Set removal
When the package instance specified to the pkgrm(ADM) command is a SIP, pkgrm(ADM) will remove all
of the SIP's set member packages in reverse dependency order (opposite the order in which they were
installed). After all of its member packages have been removed, the SIP itself is removed from the system.

Set information display


For SCO OpenServer, the pkginfo(C) command has been enhanced to display information about sets.
Because the name of the set is the same as its SIP, pkginfo must distinguish between a request to provide
information on the SIP and a request asking for information on the set's member packages (not including the
SIP).
If -c set is specified, pkginfo displays information about the SIP named on the command line, or about all
SIPs if none was named. If the category set is not specified, pkginfo displays information about all
packages except those whose category is set. If the name of a set is specified on the command line, but -c set
is not, pkginfo displays information on all set member packages belonging to that set except for the SIP itself.
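Continuing the Foundation Set (fnd) example, these hypothetical invocations illustrate the three cases:

pkginfo -c set        # information on all SIPs on the system
pkginfo -c set fnd    # information on the fnd SIP itself
pkginfo fnd           # information on fnd's member packages, excluding the SIP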

The setsize file


The setsize(F) file is a set information file that defines disk space requirements for the target environment. It
contains information about all of the packages in the set. This file describes the disk space taken up by
installed files as well as extra space needed for dynamically created files, as described in each package's
space(F) file.
The generic format of a line in this file is:
pkg:pathname blocks inodes
Definitions for each field are as follows:

pkg
The short, or abbreviated, name of a package in the set. This name describes which package of the set
requires the amount of space described by the rest of the data on this line in the setsize(F) file.
pathname
Names a directory in which there are objects that will be installed or that will require additional space.
The name can be the mountpoint for a file system. Names that do not begin with a slash (/) indicate
relocatable directories.
blocks
Defines the number of 512-byte disk blocks required for installation of the files and directory entries
contained in the pathname. (Do not include filesystem-dependent disk usage.)
inodes
Defines the number of inodes required for the installation of the files and directory entries contained
in the pathname.
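For example, a hypothetical member package pkgw needing 500 blocks and 1 inode under the relocatable directory data, plus 2000 blocks and 10 inodes under the fixed directory /opt, would contribute these lines:

pkgw:data 500 1
pkgw:/opt 2000 10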



At installation time, the set installation calls setsizecvt(C), which reduces the setsize file for a set to a space
file containing entries for only the packages that are selected. It is this resulting space file against which space
checking for the set is performed.

The setsizecvt command


The setsizecvt(C) command generates files in the space(F) format for sets. Before sets were included as
packaging objects, the installation tools used space files to specify any additional space a package required
beyond that listed in the entries in that package's pkgmap(F) file.
The setsizecvt command was designed to work as simply as possible; the packaging tools process the sets in
much the same way they process packages.
Executing in a set's installation directory, setsizecvt collects the space file (if it exists) and the setsize(F) file
from each of the packages included in that set. The setsize file is a file whose entries are formatted as follows:
pkg:/path/name #blks #inodes
where pkg is the short form of the package name, and the rest is the directory and number of blocks and
inodes used in that directory. This setsize file is created when the sets are created. Typically, the setsize file for
a given set would be created from the pkgmap files for all of the packages in that set.
setsizecvt(C) selects those entries in the setsize file for packages (in the current set) that the user wants to
install. Those entries are then collected in a new file called space.
pkgadd(ADM) uses the space file to see if there is enough space on the disk to install the set. The space file
for a set is treated the same way as it is in a package.
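As a sketch, suppose a set's setsize file carries entries for two hypothetical members and only pkgw is selected; setsizecvt keeps the pkgw entries and drops the package prefix, leaving a space(F)-format file:

setsize file delivered with the set:
pkgw:data 500 1
pkgx:/opt 2000 10

space file produced when only pkgw is selected:
data 500 1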

Quick reference to packaging procedures


Before beginning any packaging procedure, you must first have planned your packaging needs based on the
information presented in ``Basic steps of packaging''. This section covers only the required steps.
1. Create a prototype(F) file.
Create one manually using any editor. There must be one entry for every package component.
The format for a prototype(F) file entry is:
[volno] ftype class pathname [major minor] [mode owner group] [mac fixed inherited]
volno designates the medium volume number on which the object should be placed. If no
volno is given, pkgmk(C) distributes package components across volumes automatically.
ftype must be one of these object file types:
f (standard executable or data file)
e (file to be edited upon installation or removal)
v (volatile file, contents will change)
d (directory)
x (exclusive directory)
l (linked file)


p (named pipe)
c (character special device)
b (block special device)
i (installation script or package information file)
s (symbolic link)
class defines the class to which the object belongs. Place an object into the class of none if no
special handling is required.
pathname defines the pathname of an object. It can be in one of these formats:
fixed pathname: /src/myfile
collectively relocatable pathname: src/myfile (no beginning slash)
individually relocatable pathname: $BIN/myfile
This pathname defines where the component should reside on the installation medium and
also tells pkgmk(C) where to find it on your machine. If these names differ, use the
path1``=''path2 format for pathname, where path1 is the name it should have on the
installation machine and path2 is the name it has on your machine.
major minor defines the major and minor numbers for a block or character special device.
mode owner group defines the mode, owner and group for the object. If not defined, the
value of the default command is used. If no default value is defined, 644 root other is
assigned.
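For instance, a complete entry for a hypothetical executable in class none, with an explicit mode, owner and group, might read:

f none bin/myprog=/home/build/bin/myprog 0755 root other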
You can use four types of command lines in a prototype(F) file:
search pathnames (defines a search path for pkgmk(C) to use when creating your package)
include filename (nests prototype(F) files)
default mode owner group (defines a default mode owner group for objects defined in this
prototype(F) file)
param=value (defines parameter values for pkgmk(C))
All command lines must begin with an exclamation point (``!'').
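A few hypothetical command lines, one of each type:

!search /home/build/pkg
!include /home/build/pkg/proto.extra
!default 644 root other
!BIN=/usr/mypkg/bin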
Create one using pkgproto(C).
pkgproto [-i] [-c class] [path1[=path2] ...] > filename

where -i tells pkgproto(C) to record symbolic links with an ftype of f (not s), -c defines the
class of all objects as class, and path1 defines the object pathname (or names) to be included
in the prototype file. If path1 is a directory, entries for all objects in that directory will be
generated.
Use the path1=path2 format to give an object a different pathname in the prototype(F) file
than it has on your machine. path1 is the pathname where objects can be located on your
machine and path2 is the pathname that should be substituted for those objects.



pkgproto(C) writes to the standard output; to create a prototype file, redirect the
output to a file. That file can be named prototype (although it is not required).
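For example, this hypothetical invocation places everything found under /home/build/myapp/bin into class none, recording it under the relocatable path bin:

pkgproto -c none /home/build/myapp/bin=bin > prototype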
2. Create a pkginfo file.
Use any editor. Define each parameter on its own line, in this format:
PARAM="value"

where PARAM is the name of one of the standard installation parameters defined in the pkginfo(F)
manual page and value is the value you assign to it.
You can also define values for your own installation parameters using the same format. Names for
parameters that you create must begin with an uppercase letter and be followed by only lowercase
letters.
The following five parameters are required in every pkginfo file: PKG, NAME, ARCH, VERSION,
and CATEGORY. No other restrictions apply concerning which parameters or how many parameters
you define.
The CLASSES parameter dictates which classes are installed and the order of installation. Although
the parameter is not required, no classes will be installed without it. Even if you have no class action
scripts, the none class must be defined in the CLASSES parameter before objects belonging to that
class will be installed.
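A minimal pkginfo file containing the five required parameters plus CLASSES might look like this (all values are illustrative only):

PKG='myapp'
NAME='My Application'
ARCH='3b2'
VERSION='Release 1.0'
CATEGORY='application'
CLASSES='none'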
3. Execute pkgmk(C).
pkgmk [-d device] [-r rootpath] [-b basedir] [-f filename]

where -d specifies that the package should be copied onto device, -r requests that the root directory
rootpath be used to locate objects on your machine, -b requests that basedir be prepended to
relocatable paths when searching for them on your machine, and -f names a file, filename, to be used
as your prototype(F) file. (Other options are described in the pkgmk(C) manual page.)
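For example, assuming the hypothetical myapp build area lives under /home/build/myapp and its prototype file is in the current directory, the invocation might be:

pkgmk -r /home/build/myapp -f prototype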

Case studies of package installation


This section presents packaging case studies that show techniques such as installing objects
conditionally, determining at run time how many files to create, and modifying an existing data file
during package installation and removal.
Each case begins with a description of the study, followed by a list of the packaging techniques it uses and a
narrative description of the approach taken when using those techniques. After this material, sample files and
scripts associated with the case study are shown.

1. Selective installation
This package has three types of objects. The installer chooses which of the three types to install and where to
locate the objects on the installation machine.



Techniques
This case study shows examples of the following techniques:
using variables in object pathnames
using the request script to solicit input from the installer
setting conditional values for an installation parameter

Approach
To set up selective installation, you must:
1. Define a class for each type of object which can be installed.
In this case study, the three object types are the package executables, the manual pages, and the
emacs executables. Each type has its own class: bin, man, and emacs, respectively. Notice in the
example prototype file that all of the object files belong to one of these three classes.
2. Initialize the CLASSES parameter in the pkginfo(F) file as null.
Normally when you define a class, you want the CLASSES parameter to list all classes that will be
installed. Otherwise, no objects in that class will be installed. For this example, the parameter is
initially set to null. CLASSES will be given values by the request script based on the package pieces
chosen by the installer. This way, CLASSES is set to only those object types that the installer wants
installed. For an example, see the sample pkginfo file associated with this package. Notice that the
CLASSES parameter is set to null.
3. Define object pathnames in the prototype(F) file with variables.
These variables will be set by the request script to the value which the installer provides.
pkgadd(ADM) resolves these variables at installation time and so knows where to install the package.
The three variables used in this example are:
$NCMPBIN (defines location for object executables)
$NCMPMAN (defines location for manual pages)
$EMACS (defines location for emacs executables)
Look at the example prototype file to see how to define the object pathnames with variables.
4. Create a request script to ask the installer which parts of the package should be installed and where
they should be placed.
The request script for this package asks two questions:
Should this part of the package be installed?
When the answer is yes, then the appropriate class name is added to the CLASSES
parameter. For example, when the question ``Should the manual pages associated with this
package be installed'' is answered yes, the class man is added to the CLASSES parameter.
If so, where should that part of the package be placed?



The appropriate variable is given the value of the response to this question. In the manual
page example, the variable $NCMPMAN is set to this value.
These two questions are repeated for each of the three object types.
At the end of the request script, the parameters are made available to the installation environment for
pkgadd(ADM) and any other packaging scripts. In the case of this example, no other scripts are
provided.
When looking at the request script for this example, notice that the questions are generated by the
data validation tools ckyorn and ckpath.

pkginfo file
PKG='ncmp'
NAME='NCMP Utilities'
CATEGORY='applications,tools'
ARCH='3b2'
VERSION='Release 1.0, Issue 1.0'
CLASSES=''

prototype file
i pkginfo
i request
x bin $NCMPBIN 0755 root other
f bin $NCMPBIN/dired=/usr/ncmp/bin/dired 0755 root other
f bin $NCMPBIN/less=/usr/ncmp/bin/less 0755 root other
f bin $NCMPBIN/ttype=/usr/ncmp/bin/ttype 0755 root other
f emacs $NCMPBIN/emacs=/usr/ncmp/bin/emacs 0755 root other
x emacs $EMACS 0755 root other
f emacs $EMACS/ansii=/usr/ncmp/lib/emacs/macros/ansii 0644 root other
f emacs $EMACS/box=/usr/ncmp/lib/emacs/macros/box 0644 root other
f emacs $EMACS/crypt=/usr/ncmp/lib/emacs/macros/crypt 0644 root other
f emacs $EMACS/draw=/usr/ncmp/lib/emacs/macros/draw 0644 root other
f emacs $EMACS/mail=/usr/ncmp/lib/emacs/macros/mail 0644 root other
f emacs $NCMPMAN/man1/emacs.1=/usr/ncmp/man/man1/emacs.1 0644 root other
d man $NCMPMAN 0755 root other
d man $NCMPMAN/man1 0755 root other
f man $NCMPMAN/man1/dired.1=/usr/ncmp/man/man1/dired.1 0644 root other
f man $NCMPMAN/man1/ttype.1=/usr/ncmp/man/man1/ttype.1 0644 root other
f man $NCMPMAN/man1/less.1=/usr/ncmp/man/man1/less.1 0644 root other

request script
trap 'exit 3' 15

# determine if and where general executables should be placed
ans=`ckyorn -d y \
    -p "Should executables included in this package be installed" ` || exit $?
if [ "$ans" = y ]
then
    CLASSES="$CLASSES bin"
    NCMPBIN=`ckpath -d /usr/ncmp/bin -aoy \
        -p "Where should executables be installed" ` || exit $?
fi

# determine if emacs editor should be installed, and if it should
# where should the associated macros be placed
ans=`ckyorn -d y \
    -p "Should emacs editor included in this package be installed" ` || exit $?
if [ "$ans" = y ]
then
    CLASSES="$CLASSES emacs"
    EMACS=`ckpath -d /usr/ncmp/lib/emacs -aoy \
        -p "Where should emacs macros be installed" ` || exit $?
fi

# determine if and where manual pages should be installed
ans=`ckyorn -d y \
    -p "Should manual pages associated with this package be installed" ` || exit $?
if [ "$ans" = y ]
then
    CLASSES="$CLASSES man"
    NCMPMAN=`ckpath -d /usr/ncmp/man -aoy \
        -p "Where should manual pages be installed" ` || exit $?
fi

# make parameters available to installation service,
# and so to any other packaging scripts
cat >$1 <<!
CLASSES='$CLASSES'
NCMPBIN='$NCMPBIN'
EMACS='$EMACS'
NCMPMAN='$NCMPMAN'
!
exit 0

2. Device driver installation


For a discussion of device driver installation, see the SCO OpenServer device driver documentation.

3. Create an installation database


This study creates a database file at the time of installation and saves a copy of the database when the package
is removed.
Techniques
This case study shows examples of the following techniques:
using classes and class action scripts to perform special actions on different sets of objects
using the space(F) file to inform pkgadd(ADM) that extra space will be required to install this
package properly
using the installf(ADM) command

Approach
To create a database file at the time of installation and save a copy on removal, you must:
1. Create three classes.
This package requires three classes:
the standard class of none (contains a set of processes belonging in the subdirectory bin)
the admin class (contains an executable file config and a directory containing data files)
the cfgdata class (contains a directory)
2. Make the package collectively relocatable.
Notice in the sample prototype(F) file that none of the pathnames begin with a slash or a variable.
This indicates that they are collectively relocatable.
3. Calculate the amount of space the database file will require and create a space(F) file to deliver with
the package. This file notifies pkgadd(ADM) that this package requires extra space and how much


extra space. For an example, see the sample space file for this package.
4. Create an installation class action script for the admin class.
The sample script initializes a database using the data files belonging to the admin class. To perform
this task, it:
copies the source data file to its proper destination
creates an empty file named config.data and assigns it to a class of cfgdata
executes the bin/config command (delivered with the package and already installed) to
populate the database file config.data using the data files belonging to the admin class
executes installf -f to finalize installation
No special action is required for the admin class at removal time so no removal class action script is
created. This means that all files and directories in the admin class will simply be removed from the
system.
5. Create a removal class action script for the cfgdata class.
The sample removal script makes a copy of the database file before it is deleted during package
removal. No special action is required for this class at installation time, so no installation class action
script is needed.
Remember that the input to a removal script is a list of pathnames to remove. Pathnames always
appear in lexical order with the directories appearing first. This script captures directory names so that
they can be acted upon later and copies any files to a directory named /tmp. When all of the
pathnames have been processed, the script then goes back and removes all directories and files
associated with the cfgdata class.
The outcome of this removal script is to copy config.data to /tmp and then remove the config.data file
and the data directory.

pkginfo file
PKG='krazy'
NAME='KrAzY Applications'
CATEGORY='applications'
ARCH='3b2'
VERSION='Version 1'
CLASSES='none cfgdata admin'

prototype file
i pkginfo
i i.admin
i r.cfgdata
d none bin 555 root sys
f none bin/process1 555 root other
f none bin/process2 555 root other
f none bin/process3 555 root other
f none bin/config 500 root sys
d admin cfg 555 root sys
f admin cfg/datafile1 444 root sys
f admin cfg/datafile2 444 root sys
f admin cfg/datafile3 444 root sys
f admin cfg/datafile4 444 root sys
d cfgdata data 555 root sys

space file
# extra space required by config data which is
# dynamically loaded onto the system
data 500 1

Installation class action script (i.admin)


# PKGINST parameter provided by installation service
# BASEDIR parameter provided by installation service

while read src dest
do
    # the installation service provides '/dev/null' as the
    # pathname for directories, pipes, special devices, etc
    # which it knows how to create
    [ "$src" = /dev/null ] && continue

    cp $src $dest || exit 2
done

# if this is the last time this script will
# be executed during the installation, do additional
# processing here
if [ "$1" = ENDOFCLASS ]
then
    # our config process will create a data file based on any changes
    # made by installing files in this class; make sure
    # the data file is in class 'cfgdata' so special rules can apply
    # to it during package removal
    installf -c cfgdata $PKGINST $BASEDIR/data/config.data f 444 root sys || exit 2
    $BASEDIR/bin/config > $BASEDIR/data/config.data || exit 2
    installf -f -c cfgdata $PKGINST || exit 2
fi
exit 0

Removal class action script (r.cfgdata)


# the product manager for this package has suggested that
# the configuration data is so valuable that it should be
# backed up to /tmp before it is removed!

while read path
do
    # pathnames appear in lexical order, thus directories
    # will appear first; you cannot operate on directories
    # until done, so just keep track of names until later
    if [ -d $path ]
    then
        dirlist="$dirlist $path"
        continue
    fi
    mv $path /tmp || exit 2
done
if [ -n "$dirlist" ]
then
    rm -rf $dirlist || exit 2
fi
exit 0

4. Define package compatibilities and dependencies


This package uses the optional packaging files to define package compatibilities and dependencies, and to
present a copyright message during installation.
Techniques
This case study shows examples of the following techniques:
using the copyright(F) file
using the compver(F) file


using the depend(F) file

Approach
To meet the requirements in the description, you must:
1. Create a copyright(F) file.
A copyright file contains the ASCII text of a copyright message. The message shown in the sample
file will be displayed on the screen during package installation (and also during package removal).
2. Create a compver(F) file.
The sample pkginfo file defines this package version as version 3.0. The sample compver file defines
version 3.0 as being compatible with versions 2.3, 2.2, 2.1, 2.1.1, 2.1.3 and 1.7.
3. Create a depend(F) file.
Packages listed in a depend file must already be installed on the system when a package is installed. The
sample file lists ten packages which must already be on the system at installation time.

pkginfo file
PKG='case4'
NAME='Case Study 4'
CATEGORY='application'
ARCH='3b2'
VERSION='Version 3.0'
CLASSES='none'

copyright file
Copyright (c) 1997 The Santa Cruz Operation, Inc.
All Rights Reserved.

THIS PACKAGE CONTAINS UNPUBLISHED PROPRIETARY SOURCE CODE OF SCO.


The copyright notice above does not evidence any actual or intended publication of such source code.

compver file
Version 2.3
Version 2.2
Version 2.1
Version 2.1.1
Version 2.1.3
Version 1.7



depend file
P acu     Advanced C Utilities
          Issue 4 Version 1
P cc      C Programming Language
          Issue 4 Version 1
P dfm     Directory and File Management Utilities
P ed      Editing Utilities
P esg     Extended Software Generation Utilities
          Issue 4 Version 1
P graph   Graphics Utilities
P rx      Remote Execution Utilities
P sgs     Software Generation Utilities
          Issue 4 Version 1
P shell   Shell Programming Utilities
P sys     System Header Files
          Release 3.1

5a. Modify an existing file using the sed class


This study modifies a file which exists on the installation machine during package installation. It uses one of
three modification methods. The other two methods are shown in case study 5b and case study 5c. The file
modified is /etc/inittab.
Techniques
This case study shows examples of the following techniques:
using the sed class
using a postinstall script

Approach
To modify /etc/inittab at the time of installation, you must:
1. Add the sed class script to the prototype(F) file.
The name of a sed script must be the name of the file that it will edit. In this case, the file to be edited
is /etc/inittab, so the sed(C) script is named /etc/inittab. There are no requirements for the mode
owner group of a sed script (represented in the sample prototype by question marks). The file type of
the sed script must be e (indicating that it is editable). For an example, see the sample prototype file.
NOTE: Because the pathname of the sed class action script is exactly the same as the file it is
intended to edit, these two cannot coexist in the same package.
2. Set the CLASSES parameter to include sed.
In the case of the sample pkginfo file, sed is the only class being installed. However, it could be one
of any number of classes.
3. Create a sed class action script.



You cannot deliver a copy of /etc/inittab that looks the way you need it to, because /etc/inittab has
already been installed and is a dynamic file. Because of this, you have no way of knowing how it will
look at the time of package installation. Using a sed script allows you to modify the /etc/inittab file
during package installation.
As already mentioned, the name of a sed script should be the same as the name of the file it will edit.
A sed script, including the example, contains sed commands to remove and add information to the
file.
4. Create a postinstall script.
You need to inform the system that /etc/inittab has been modified by executing init q. The only place
you can perform that action in this example is in a postinstall script. Looking at the example
postinstall script, you see that its only purpose is to execute init q.
This approach to editing /etc/inittab during installation has two drawbacks. First, you have to deliver a full
script (the postinstall script) simply to perform init q. In addition, the package name at the end of each
comment line is hardcoded. It would be nice if this value could be based on the package instance so that you
could distinguish between the entries you add for each package.
pkginfo file
PKG='case5a'
NAME='Case Study 5a'
CATEGORY='applications'
ARCH='3b2'
VERSION='Version 1d05'
CLASSES='sed'

prototype file
i pkginfo
i postinstall
e sed /etc/inittab=/home/mypkg/inittab.sed ? ? ?

sed script (/home/mypkg/inittab.sed)


!remove
# remove all entries from the table that are associated
# with this package, though not necessarily just
# with this package instance
/^[^:]*:[^:]*:[^:]*:[^#]*#ROBOT$/d

!install
# remove any previous entry added to the table
# for this particular change
/^[^:]*:[^:]*:[^:]*:[^#]*#ROBOT$/d

# add the needed entry at the end of the table;
# sed(C) does not properly interpret the '$a'
# construct if you previously deleted the last
# line, so the command
#     $a\
#     rb:023456:wait:/usr/robot/bin/setup #ROBOT
# will not work here if the file already contained
# the modification. Instead, you will settle for
# inserting the entry before the last line!
$i\
rb:023456:wait:/usr/robot/bin/setup #ROBOT



postinstall script
# make init reread inittab
/sbin/init q || exit 2
exit 0

5b. Modify an existing file using a class action script


This study modifies a file which exists on the installation machine during package installation. It uses one of
three modification methods. The other two methods are shown in case study 5a and case study 5c. The file
modified is /etc/inittab.
Techniques
This case study shows examples of the following techniques:
creating classes
using installation and removal class action scripts

Approach
To modify /etc/inittab during installation, you must:
1. Create a class.
Create a class called inittab. You must provide an installation and a removal class action script for this
class. Define the inittab class in the CLASSES parameter in the sample pkginfo file.
2. Create an inittab(F) file.
This file contains the information for the entry that you will add to /etc/inittab. Notice in the sample
prototype file that inittab is a member of the inittab class and has a file type of e for editable. The
sample inittab file upon which this is based is also shown.
3. Create an installation class action script.
Because class action scripts must be multiply executable (you get the same results each time they are
executed), you cannot just add the text to the end of the file. The sample class action script performs
the following procedures:
checks to see if this entry has been added before
if it has, removes any previous versions of the entry
edits the inittab(F) file and adds the comment lines so you know where the entry is from
moves the temporary file back into /etc/inittab
executes init q when it receives the end-of-class indicator
Note that init q can be performed by this installation script. A one-line postinstall script is not needed
by this approach.
4. Create a removal class action script.



The sample removal script is very similar to the installation script. The information added by the
installation script is removed and init q is executed.
This case study resolves the drawbacks to case study 5a. You can support multiple package instances because
the comment at the end of the inittab entry is now based on package instance. Also, you no longer need a
one-line postinstall script. However, this case has a drawback of its own. You must deliver two class action
scripts and the inittab(F) file to add one line to a file. Case 5c shows a more streamlined approach to editing
/etc/inittab during installation.
pkginfo file
PKG='case5b'
NAME='Case Study 5b'
CATEGORY='applications'
ARCH='3b2'
VERSION='Version 1d05'
CLASSES='inittab'

prototype file
i
i
i
e

pkginfo
i.inittab
r.inittab
inittab /etc/inittab ? ? ?

Installation class action script (i.inittab)


# PKGINST parameter provided by installation service

while read src dest
do
    # remove all entries from the table that are
    # associated with this PKGINST
    sed -e "/^[^:]*:[^:]*:[^:]*:[^#]*#$PKGINST$/d" $dest > /tmp/$$itab || exit 2

    sed -e "s/$/ #$PKGINST/" $src >> /tmp/$$itab || exit 2
    mv /tmp/$$itab $dest || exit 2
done
if [ "$1" = ENDOFCLASS ]
then
    /sbin/init q || exit 2
fi
exit 0

Removal class action script (r.inittab)


# PKGINST parameter provided by installation service

while read src dest
do
    # remove all entries from the table that
    # are associated with this PKGINST
    sed -e "/^[^:]*:[^:]*:[^:]*:[^#]*#$PKGINST$/d" $dest > /tmp/$$itab || exit 2

    mv /tmp/$$itab $dest || exit 2
done
/sbin/init q || exit 2
exit 0



inittab file
rb:023456:wait:/usr/robot/bin/setup

5c. Modify an existing file using the build class


This study modifies a file which exists on the installation machine during package installation. It uses one of
three modification methods. The other two methods are shown in case study 5a and case study 5b. The file
modified is /etc/inittab.
Techniques
This case study shows an example of using the build class.
Approach
This approach to modifying /etc/inittab uses the build class. A build class file is executed as a shell script and
its output becomes the new version of the file for which it is named. In other words, the file inittab(F) that is
delivered with this package is executed and the output of that execution becomes /etc/inittab.
The build class file is executed during package installation and package removal. The argument install is
passed to the file if it is being executed at installation time. Notice in the sample build file that installation
actions are defined by testing for this argument.
To edit /etc/inittab using the build class, you must:
1. Define the build file in the prototype(F) file.
The prototype file entry for the build class file should be of class build and file type e. Be certain that
the CLASSES parameter in the pkginfo file includes build. See the sample pkginfo file and sample
prototype file for this example.
2. Create the build file.
The sample build file performs the following procedures:

a. Edits /etc/inittab to remove any changes already existing for this package. Notice that the
filename /etc/inittab is hardcoded into the sed command.
b. If the package is being installed, adds the new line to the end of /etc/inittab. A comment tag is
included in this new entry to remind us from where that entry came.
c. Executes init q.
This solution addresses the drawbacks in case study 5a and case study 5b. Only one file is needed (beyond the
pkginfo and prototype files); that file is short and simple; it works with multiple instances of a package
because the $PKGINST parameter is used; and no postinstall script is required because init q can be executed
from the build file.


pkginfo file
PKG='case5c'
NAME='Case Study 5c'
CATEGORY='applications'
ARCH='3b2'
VERSION='Version 1d05'
CLASSES='build'

prototype file
i pkginfo
e build /etc/inittab=/home/case5c/inittab.build ? ? ?

build script (/home/case5c/inittab.build)


# PKGINST parameter provided by installation service

# remove all entries from the existing table that
# are associated with this PKGINST
sed -e "/^[^:]*:[^:]*:[^:]*:[^#]*#$PKGINST$/d" /etc/inittab || exit 2

if [ "$1" = install ]
then
    # add the following entry to the table
    echo "rb:023456:wait:/usr/robot/bin/setup #$PKGINST" || exit 2
fi
/sbin/init q || exit 2
exit 0

6. Modify crontab files during installation


This case study modifies a number of crontab files during package installation.
Techniques
This case study shows examples of the following techniques:
using classes and class action scripts
using the crontab(C) command within a class action script

Approach
You could use the build class and follow the approach shown for editing /etc/inittab in case study 5c, except
that you want to edit more than one file. If you used the build class approach, you would need to deliver one
build file for each crontab file edited. Defining a cron class provides a more general approach. To edit a
crontab file with this approach, you must:
1. Define the crontab files that will be edited in the prototype(F) file.
Create an entry in the prototype(F) file for each crontab file which will be edited. Define their class as
cron and their file type as e. Use the actual name of the file to be edited, as shown in the example.
2. Create the crontab files that will be delivered with the package.



These files contain the information you want added to the existing crontab files of the same name. See
the sample root and sys files for two examples.
3. Create an installation class action script for the cron class.
The i.cron script performs the following procedures:

a. Calculates the user ID. This is done by setting the variable user to the basename of the cron
class file being processed. That name equates to the user ID. For example, the basename of
/var/spool/cron/crontabs/root is root (which is also the user ID).
b. Executes crontab using the user ID and the -l option. The -l option tells crontab to send the
contents of that user's crontab file to the standard output.
c. Pipes the output of the crontab command to a sed(C) script that removes any previous entries
that have been added using this installation technique.
d. Puts the edited output into a temporary file.
e. Adds the data file for that user ID (that was delivered with the package) to the temporary
file and adds a tag so that you will know from where these entries came.
f. Executes crontab with the same user ID, giving it the temporary file as input.
4. Create a removal class action script for the cron class.
The sample removal script is the same as the installation script except that there is no procedure to
add information to the crontab file.
These procedures are performed for every file in the cron class.
pkginfo file
PKG='case6'
NAME='Case Study 6'
CATEGORY='application'
ARCH='3b2'
VERSION='Version 1.0'
CLASSES='cron'



prototype file
i pkginfo
i i.cron
i r.cron
e cron /var/spool/cron/crontabs/root ? ? ?
e cron /var/spool/cron/crontabs/sys ? ? ?

Installation class action script (i.cron)


# PKGINST parameter provided by installation service

while read src dest
do
    user=`basename $dest` || exit 2

    (crontab -l $user | sed -e "/#$PKGINST$/d" > /tmp/$$crontab) || exit 2
    sed -e "s/$/#$PKGINST/" $src >> /tmp/$$crontab || exit 2
    crontab $user < /tmp/$$crontab || exit 2
    rm -f /tmp/$$crontab
done
exit 0

Removal class action script (r.cron)


# PKGINST parameter provided by installation service

while read path
do
    user=`basename $path` || exit 2

    (crontab -l $user | sed -e "/#$PKGINST$/d" > /tmp/$$crontab) || exit 2
    crontab $user < /tmp/$$crontab || exit 2
    rm -f /tmp/$$crontab
done
exit 0

root crontab file (delivered with package)


41,1,21 * * * * /usr/lib/uucp/uudemon.hour > /dev/null
45 23 * * * ulimit 5000; /usr/bin/su uucp -c "/usr/lib/uucp/uudemon.cleanup" > /dev/null 2>&1
11,31,51 * * * * /usr/lib/uucp/uudemon.poll > /dev/null

sys crontab file (delivered with package)


0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A

7a. Create a Set Installation Package


This case study shows an example of creating a Set Installation Package (SIP) that is used to control the
installation of a set of packages.


Techniques
This case study shows examples of the following:
creating a setinfo(F) file
creating a request script that processes set member package selection and the type of installation
(custom or default, if applicable)
including the set member packages' request and default response files (if any) in the prototype(F)
file as file type i files, as part of the SIP, so that all interaction with the installer is done only
during SIP processing
using the preinstall script to pass the selected packages back to pkgadd(ADM)

Approach
1. Create a request script to ask the installer how the set should be installed.
2. The request script first asks whether default installation should be performed on this set.
When the answer is yes, if any of the set's member packages require interaction and if default
responses for that interaction have been provided, install the set using the default responses.
When the answer is no, for each package in the set, prompt as to whether this package should be
installed. When the answer is yes and the package is interactive (it has a request script), ask
whether default installation of the package should be performed: if yes, use the default response
file; if no, execute the package's request script to obtain the responses.
3. When the request script has completed, the PKGLIST variable should contain the list of selected set
member packages that will be installed on the system. At this time:
pkgadd(ADM) runs the SIP's preinstall script which places the selected set member
packages for installation ($PKGLIST) into the setlist file referenced using the $SETLIST
variable.
pkgadd then reads the setlist file and inserts the packages listed there into the list of packages
to be installed on the system.
As each of these packages is processed, if the package is interactive, pkgadd uses the
response file created earlier so that no prompting for user input occurs except during SIP
processing.

setinfo file
# Format for the setinfo file. Field separator is: <tab>
# pkg     parts   default   category   package_name
# abbr              y/n
pkgw      1       y         system     Package W
pkgx      1       y         system     Package X
pkgy      2       n         system     Package Y
pkgz      1       y         system     Package Z


prototype file
# set packaging files
i pkginfo
i preinstall
i request
i setinfo
i copyright
i pkgw/request=pkgw.request
i pkgw/response=pkgw.response
i pkgx/request=pkgx.request
i pkgx/response=pkgx.response

preinstall script file


for PKG in $PKGLIST
do
echo "$PKG" >>$SETLIST
done

request script file


# If <DELETE> is pressed, make sure we exit 77 so pkgadd knows
# no packages were selected for installation. In this case,
# pkgadd will also not install the SIP itself.
trap 'EXITCODE=77; exit' 2
trap 'exit $EXITCODE' 0

while read pkginst parts default category package_name
do
    echo $pkginst >>/tmp/order$$
    if [ "$default" = "y" ]
    then
        echo $pkginst >>/tmp/req$$
    else
        echo $pkginst >>/tmp/opt$$
    fi
done <$SETINFO

REQUIRED=`cat /tmp/req$$ 2>/dev/null`
OPTIONAL=`cat /tmp/opt$$ 2>/dev/null`
ORDER=`cat /tmp/order$$ 2>/dev/null`
rm -f /tmp/opt$$ /tmp/req$$ /tmp/order$$

HELPMSG="Enter 'y' to run default set installation or enter 'n' to run custom set installation."
PROMPT="Do you want to run default set installation?"
ANS=`ckyorn -d y -p "$PROMPT" -h "$HELPMSG"` || exit $?

if [ "$ANS" = "y" ]
then
    # Default installation
    for PKG in $REQUIRED
    do
        PKGLIST="$PKGLIST $PKG"
        if [ -f $REQDIR/$PKG/response ]
        then
            cp $REQDIR/$PKG/response $RESPDIR/$PKG
        fi
    done
    echo "PKGLIST=$PKGLIST" >> $1
else
    # Custom installation of required packages
    for PKG in $REQUIRED
    do
        PKGLIST="$PKGLIST $PKG"
        if [ -f $REQDIR/$PKG/request ]
        then
            PROMPT="Do you want default installation for $PKG?"
            RANS=`ckyorn -d y -p "$PROMPT" -h "$HELPMSG"` || exit $?
            if [ "$RANS" = "y" ]
            then
                cp $REQDIR/$PKG/response $RESPDIR/$PKG
            else
                sh $REQDIR/$PKG/request $RESPDIR/$PKG
            fi
        fi
    done

    # Select which optional packages in set are to be installed
    for PKG in $OPTIONAL
    do
        HELPMSG="Enter 'y' to install $PKG as part of this set installation or 'n' to skip installation."
        PROMPT="Do you want to install $PKG?"
        PANS=`ckyorn -d y -p "$PROMPT" -h "$HELPMSG"` || exit $?

        if [ "$PANS" = "y" -o "$PANS" = "" ]
        then
            PKGLIST="$PKGLIST $PKG"
            if [ -f $REQDIR/$PKG/request ]
            then
                PROMPT="Do you want default installation for $PKG?"
                RANS=`ckyorn -d y -p "$PROMPT" -h "$HELPMSG"` || exit $?
                if [ "$RANS" = "y" ]
                then
                    cp $REQDIR/$PKG/response $RESPDIR/$PKG
                else
                    sh $REQDIR/$PKG/request $RESPDIR/$PKG
                fi
            fi
        fi
    done
    echo "PKGLIST=$PKGLIST" >> $1
fi

if [ "$PKGLIST" = "" ]
then
    EXITCODE=77
fi
export SETPKGS

7b. Split one set into two


This study shows an example of how to split one set into two new sets.
Techniques
This case study shows examples of the following:
breaking up a setinfo(F) file into two
splitting the set member packages' request and default response files in the original SIP prototype(F)
file into two

Approach
1. From the SIP's setinfo file, create two separate setinfo files for the two new sets being created.
2. From the SIP's prototype file, create two separate prototype files for the two new sets being created.

Original setinfo file


# Format for the setinfo file. Field separator is: <tab>
# pkg     parts   default   category   package_name
# abbr              y/n
pkgw      1       y         system     Package W
pkgx      1       y         system     Package X
pkgy      2       n         system     Package Y
pkgz      1       y         system     Package Z

New setinfo file for SIP 1


# Format for the setinfo file. Field separator is: <tab>
# pkg     parts   default   category   package_name
# abbr              y/n
pkgw      1       y         system     Package W
pkgz      1       y         system     Package Z



New setinfo file for SIP 2
# Format for the setinfo file. Field separator is: <tab>
# pkg     parts   default   category   package_name
# abbr              y/n
pkgx      1       y         system     Package X
pkgy      2       n         system     Package Y

Original prototype file


# set packaging files
i pkginfo
i preinstall
i request
i setinfo
i copyright
i pkgw/request=pkgw.request
i pkgx/response=pkgx.response
i pkgy/request=pkgy.request
i pkgz/response=pkgz.response

New prototype file for SIP 1


# set packaging files
i pkginfo
i preinstall
i request
i setinfo
i copyright
i pkgw/request=pkgw.request
i pkgz/response=pkgz.response

New prototype file for SIP 2


# set packaging files
i pkginfo
i preinstall
i request
i setinfo
i copyright
i pkgx/request=pkgx.request
i pkgy/response=pkgy.response
