Table of Contents
Programming with the SCO OpenServer system shell...................................................................................1
Shell command language.........................................................................................................................1
Filename generation..........................................................................................................................2
Special characters..............................................................................................................................5
Input and output redirection..............................................................................................................8
Executing, stopping and restarting processes..................................................................................13
Command language exercises................................................................................................................18
Answers...........................................................................................................................................19
Shell programming................................................................................................................................20
Shell programs.................................................................................................................................20
Variables..........................................................................................................................................22
Shell programming constructs.........................................................................................................29
Functions.........................................................................................................................................40
Debugging programs.......................................................................................................................42
Modifying your login environment........................................................................................................44
Adding commands to your .profile..................................................................................................44
Setting terminal options...................................................................................................................45
Using shell variables.......................................................................................................................45
Shell programming exercises.................................................................................................................47
Answers...........................................................................................................................................48
Summary of shell command language...................................................................................................51
The vocabulary of shell command language...................................................................................51
Shell programming constructs.........................................................................................................54
Programming with awk....................................................................................................................................56
Basic awk...............................................................................................................................................56
Program structure............................................................................................................................56
Usage...............................................................................................................................................57
Fields...............................................................................................................................................57
Printing............................................................................................................................................58
Formatted printing...........................................................................................................................59
Simple patterns................................................................................................................................60
Simple actions.................................................................................................................................61
A handful of useful one-liners.......................................................................................61
Error messages................................................................................................................................63
Patterns...................................................................................................................................................63
BEGIN and END.............................................................................................................................63
Relational expressions.....................................................................................................................64
Extended regular expressions..........................................................................................................65
Combinations of patterns.................................................................................................................67
Pattern ranges..................................................................................................................................68
Actions...................................................................................................................................................68
Built-in variables...........................................................................................................68
Arithmetic........................................................................................................................................69
Strings and string functions.............................................................................................................71
Field variables.................................................................................................................................74
Number or string?............................................................................................................................74
Control flow statements...................................................................................................................76
Arrays..............................................................................................................................................77
User-defined functions..................................................................................................79
Some lexical conventions................................................................................................................80
Output....................................................................................................................................................80
The print statement..........................................................................................................................80
Output separators.............................................................................................................................80
The printf statement.........................................................................................................................81
Output to files..................................................................................................................................82
Output to pipes................................................................................................................................82
Input.......................................................................................................................................................83
Files and pipes.................................................................................................................................83
Input separators...............................................................................................................................84
Multi-line records..........................................................................................................84
The getline function.........................................................................................................................84
Command-line arguments.............................................................................................86
Using awk with other commands and the shell.....................................................................................87
The system function........................................................................................................................87
Cooperation with the shell...............................................................................................................87
Example applications.............................................................................................................................89
Generating reports...........................................................................................................................89
Word frequencies............................................................................................................................90
Accumulation..................................................................................................................................90
Random choice................................................................................................................................91
History facility.................................................................................................................................91
Form-letter generation...................................................................................................91
awk summary.........................................................................................................................................92
Command line.................................................................................................................................92
Patterns............................................................................................................................................92
Control flow statements...................................................................................................................92
Input-output...................................................................................................................93
Functions.........................................................................................................................................93
String functions...............................................................................................................................93
Arithmetic functions........................................................................................................................93
Operators (increasing precedence)..................................................................................................94
Regular expressions (increasing precedence)..................................................................................94
Built-in variables...........................................................................................................95
Limits...............................................................................................................................................95
Initialization, comparison, and type coercion.................................................................................95
Lexical analysis with lex...................................................................................................................................98
Generating a lexical analyzer program..................................................................................................98
Writing lex source..................................................................................................................................99
The fundamentals of lex rules.......................................................................................................100
Advanced lex usage.......................................................................................................................103
Using lex with yacc..............................................................................................................................110
Miscellaneous......................................................................................................................................112
Summary of source format...................................................................................................................112
Parsing with yacc............................................................................................................................................114
Basic specifications..............................................................................................................................115
Actions...........................................................................................................................................117
Lexical analysis.............................................................................................................................119
Parser operation...................................................................................................................................120
Ambiguity and conflicts.......................................................................................................................124
Precedence...........................................................................................................................................128
Error handling......................................................................................................................................131
The yacc environment..........................................................................................................................133
Hints for preparing specifications........................................................................................................134
Input style......................................................................................................................................134
Left recursion................................................................................................................................134
Lexical tie-ins..............................................................................................................135
Reserved words.............................................................................................................................136
Advanced topics...................................................................................................................................136
Simulating error and accept in actions..........................................................................................136
Accessing values in enclosing rules..............................................................................................136
Support for arbitrary value types...................................................................................................137
yacc input syntax...........................................................................................................................138
A simple example................................................................................................................................139
An advanced example..........................................................................................................................140
Managing file interactions with make...........................................................................................................144
Basic features.......................................................................................................................................144
Parallel make.................................................................................................................................147
Description files and substitutions.......................................................................................................148
Comments......................................................................................................................................148
Continuation lines..........................................................................................................................148
Macro definitions..........................................................................................................................148
General form..................................................................................................................................148
Dependency information...............................................................................................................148
Executable commands...................................................................................................................149
Extensions of $*, $@, and $<......................................................................................149
Output translations........................................................................................................................149
Recursive makefiles......................................................................................................................150
Suffixes and transformation rules..................................................................................................150
Implicit rules..................................................................................................................................150
Archive libraries............................................................................................................................153
Source code control system file names.........................................................................................154
The null suffix...............................................................................................................................155
Included files.................................................................................................................................156
SCCS makefiles.............................................................................................................................156
Dynamic dependency parameters..................................................................................................156
The make command.............................................................................................................................156
Environment variables.........................................................................................................................158
Suggestions and warnings....................................................................................................................159
Internal rules........................................................................................................................................159
Tracking versions with SCCS........................................................................................................................162
Basic usage...........................................................................................................................................162
Terminology..................................................................................................................................162
Creating an SCCS file with admin................................................................................................162
Retrieving a file with get...............................................................................................................163
Recording changes with delta........................................................................................................164
More on get...................................................................................................................................164
The help command........................................................................................................................165
Delta numbering...................................................................................................................................165
SCCS command conventions...............................................................................................................167
x.files and z.files............................................................................................................................167
Error messages..............................................................................................................................168
SCCS commands.................................................................................................................................168
get..................................................................................................................................................169
delta...............................................................................................................................................176
admin.............................................................................................................................................178
prs..................................................................................................................................................180
sact.................................................................................................................................................181
help................................................................................................................................................181
rmdel..............................................................................................................................................182
cdc.................................................................................................................................................182
what...............................................................................................................................................183
sccsdiff...........................................................................................................................................183
comb..............................................................................................................................................184
val..................................................................................................................................................184
SCCS files............................................................................................................................................185
Protection.......................................................................................................................................185
Formatting.....................................................................................................................................186
Auditing.........................................................................................................................................186
Packaging your software applications..........................................................................................................188
Contents of a package..........................................................................................................................188
Required components....................................................................................................................189
Optional package information files...............................................................................................189
Optional installation scripts...........................................................................................................190
Quick steps to packaging.....................................................................................................................190
Quick steps to network installation......................................................................................................194
Network installation from the command line................................................................................196
Network installation from the graphical interface.........................................................................196
The structural life cycle of a package..................................................................................................197
The package creation tools...................................................................................................................197
pkgmk............................................................................................................................................197
pkgtrans.........................................................................................................................................198
pkgproto.........................................................................................................................................198
The installation tools............................................................................................................................198
The package information files.............................................................................................................199
pkginfo...........................................................................................................................................199
prototype........................................................................................................................................200
compver.........................................................................................................................................202
copyright........................................................................................................................................203
depend...........................................................................................................................................203
space..............................................................................................................................................204
pkgmap..........................................................................................................................................204
The installation scripts.........................................................................................................................205
Script processing...........................................................................................................................205
Installation parameters..................................................................................................................206
Getting package information for a script.......................................................................................207
Exit codes for scripts.....................................................................................................................207
The request script..........................................................................................................................207
The class action script...................................................................................................................209
The special system classes............................................................................................................211
The procedure script......................................................................................................................213
Basic steps of packaging......................................................................................................................214
1. Assigning a package abbreviation.............................................................................................215
2. Defining a package instance......................................................................................................215
3. Placing objects into classes.......................................................................................................217
4. Making package objects relocatable..........................................................................................217
5. Writing your installation scripts................................................................................................218
6. Defining package dependencies................................................................................................219
7. Writing a copyright message.....................................................................................................219
8. Creating the pkginfo file............................................................................................................220
9. Creating the prototype file.........................................................................................................220
10. Distributing packages over multiple volumes.........................................................................224
11. Creating a package with pkgmk..............................................................................................224
12. Creating a package with pkgtrans...........................................................................................225
Set packaging.......................................................................................................................................226
Set installation...............................................................................................................................227
Set removal....................................................................................................................................228
Set information display.................................................................................................................228
The setsize file...............................................................................................................................228
The setsizecvt command...............................................................................................................229
Quick reference to packaging procedures............................................................................................229
Case studies of package installation....................................................................................................231
1. Selective installation..................................................................................................................231
2. Device driver installation..........................................................................................................234
3. Create an installation database..................................................................................................234
4. Define package compatibilities and dependencies....................................................................236
5a. Modify an existing file using the sed class..............................................................................238
5b. Modify an existing file using a class action script..................................................................240
5c. Modify an existing file using the build class...........................................................................242
6. Modify crontab files during installation....................................................................................243
7a. Create a Set Installation Package.............................................................................................245
7b. Split one set into two...............................................................................................................248
The backslash turns off the meaning of special characters such as ? [ ] & ; > < and |.
Single quotes turn off the delimiting meaning of a space and the special meaning of all special
characters.
Double quotes turn off the delimiting meaning of a space and the special meaning of all special
characters except $ and `.
The greater than sign redirects the output of a command into a file (replacing the existing
contents).
The less than sign redirects the input for a command to come from a file.
Two greater than signs redirect the output of a command to be added to the end of an existing
file.
The vertical bar, or pipe, makes the output of one command the input of another command.
A pair of grave accents around a command embedded on a command line makes the output of
the embedded command an argument on the larger command line.
The dollar sign retrieves the value of positional parameters and userdefined variables. It is also
the default shell prompt.
Filename generation
The shell recognizes three of the special characters listed in ``Characters with special meanings in the shell
language'' (the asterisk (*), the question mark (?), and the pair of brackets ([ ])) as symbols for patterns that
are parts of filenames. By substituting one or more of these characters for the name (or partial name) of an
existing file (or group of files), you can reduce the amount of typing you must do to specify filenames on a
command line.
The process by which the shell interprets these characters as the full filenames they represent is known as
filename expansion, a useful mechanism when you want to specify many files on a
single command line. For example, you might want to print a group of files containing records for the month
of December, all of which begin with the letters dec. By using one of these special characters to represent the
parts of the filenames that vary, you can type one print command and specify all the files that begin with dec,
thus avoiding the need to type the full names of all the desired files on the command line.
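A quick sketch of the idea (the directory and file names here are invented for illustration):

```shell
# Set up a scratch directory holding three files whose names begin with dec.
mkdir -p /tmp/dec_demo
cd /tmp/dec_demo
touch dec.receipts dec.sales dec.invoices

# The shell expands dec* to every matching name, in sorted order,
# before the command runs, so one short command covers all three files.
echo dec*
```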
This section explains how to use the asterisk, question mark, and brackets for filename expansion.
Matching all characters with the asterisk
The asterisk (*) matches any string of characters, including a null (empty) string. You can use the * to specify
a full or partial filename. The * alone matches all the file and directory names in the current directory, except
those starting with a . (dot). To see the effect of the *, try it as an argument to the echo(C) command. Type:
echo *
The echo command displays its arguments on your screen. Notice that the system response to echo * is a
listing of all the filenames in your current directory.
``Summary of filename generation characters'' summarizes the syntax and capabilities of these special characters.
CAUTION: The * is a character that matches everything. For example, if you type rm * you will erase all the
files in your current directory. Be very careful how you use the asterisk!
The * matches any characters after the string report, including no letters at all. Notice that * matches the files
in numerical and alphabetical order. A quick and easy way to display the contents of your report files in order
on your screen is by typing the following command:
pr report*
Now try another exercise. Suppose you have a current directory called appraisals that contains files called
Andrew_Adams, Paul_Lang, Jane_Peters, and Fran_Smith. Choose a character that all the filenames in your
directory have in common, such as a lowercase ``a,'' and then request a listing of those files by referring to that
character. For example, if you choose a lowercase ``a,'' type the following command line:
ls *a*
The system responds by printing the names of all the files in your current directory that contain a lowercase
``a.''
The ``*'' can represent characters in any part of a filename. For example, if you know the first and last letters
are the same in several files, you can request a list of them on that basis. If, for example, you had a directory
containing files named FATE, FE, FADED_LINE, F123E, Fig3.4E, FIRE_LANE, FINE_LINE,
FREE_ENTRY, and FAST_LANE, you could use this command to obtain a list of files starting with ``F''
and ending with ``E.'' For such a request, your command line might look like this:
ls F*E
The system response will be a list of filenames that begin with F, end with E, and are in the following order:
F123E
FADED_LINE
FAST_LANE
FATE
FE
FINE_LINE
FIRE_LANE
Fig3.4E
The order is determined by the collating sequence of the language being used, in this case, English: (1)
numbers, (2) uppercase letters, (3) lowercase letters.
The ``*'' is even more powerful; it can help you find all files named memo in any directory one level below the
current directory:
ls */memo
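To see this, you might set up a small tree (the directory names here are invented for illustration):

```shell
# Two subdirectories one level below the current directory,
# each holding a file named memo.
mkdir -p /tmp/memo_demo/feb /tmp/memo_demo/jan
touch /tmp/memo_demo/feb/memo /tmp/memo_demo/jan/memo

cd /tmp/memo_demo
echo */memo    # the * matches any directory name one level down
```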
use the ls command with the ``?'' to list all chapters that begin with the string ``chapter'' and end with any
single character, as shown below:
$ ls chapter?
chapter1 chapter2 chapter5 chapter9
$
Of course, if you want to list all the chapters in the current directory, use the ``*'' (asterisk):
ls chapter*
This command displays all filenames that begin with the letter ``c,'' ``r,'' or ``f,'' and end with the letters ``at.''
Characters that can be grouped within brackets in this way are collectively called a ``character class.''
Brackets can also be used to specify a range of characters, whether numbers or letters. Suppose you have a
directory containing the following files: chapter1, chapter2, chapter3, chapter4, chapter5 and chapter6. If
you specify
chapter[1-5]
the shell will match the files named chapter1 through chapter5. This is an easy way to handle only a few
chapters at a time.
This command displays the contents of chapter2, chapter3, and chapter4, in that order, on your terminal.
A character class may also specify a range of letters. If you specify [A-Z], the shell will look only for
uppercase letters; if [a-z], only lowercase letters.
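A short sketch of both forms of character class (the file names are invented for illustration):

```shell
# A scratch directory with a few chapter files.
mkdir -p /tmp/class_demo
cd /tmp/class_demo
touch chapter1 chapter2 chapter3 chapter9

echo chapter[1-3]    # a range: matches chapter1, chapter2 and chapter3
echo chapter[13]     # a list: matches chapter1 and chapter3 only
```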
The functions of these special characters are summarized in ``Summary of filename generation characters''.
Try to use them on the files in your current directory.
Summary of filename generation characters
Character   Function
*           Match any string of characters (including an empty, or null, string) except a leading period.
?           Match any single character, except a leading period.
[xyz]       Match one of the characters specified within the brackets.
[a-z]       Match one of the range of characters specified.
Special characters
The shell language has other special characters that perform a variety of useful functions. Some of these
additional special characters are discussed in this section; others are described in ``Input and output
redirection''.
Running a command in background with the ampersand
Some shell commands take a long time to execute. The ampersand (``&'') is used to execute commands in
background mode, thus freeing your terminal for other tasks. The general format for running a command in
background mode is
command &
NOTE: You should not run interactive shell commands, such as read, in the background.
In the example below, the shell is performing a long search in background mode. Specifically, the grep(C)
command is searching for the string ``delinquent'' in the file accounts. Notice the ``&'' is the last character of
the command line:
$ grep delinquent accounts &
21940
$
When you run a command in the background, the SCO OpenServer system outputs a process number; 21940
is the process number associated with the grep command in the example. You can use this number to
terminate the execution of a background command. (Stopping the execution of processes is discussed in
``Executing, stopping and restarting processes''.) The prompt on the last line means that the terminal is free
and waiting for your commands; grep has started running in background mode.
The SCO OpenServer system executes the commands in the order that they appear in the line and prints all
output on the screen. This process is called sequential execution.
Try this exercise to see how the ``;'' works. First, type:
cd; pwd; ls
Use the grep command to search for the asterisk in the file, as shown in the following example:
$ grep \* trial
The all * game
$
The grep command finds the * in the text and displays the line in which it appears. Without the \ (backslash),
the * would be expanded by the shell to match all filenames in the current directory.
you could use the grep command to match the line with the three question marks as follows:
$ grep '???' trial
He really wondered why? Why???
$
the three question marks, without single quotes, would have been treated as special characters matching any
filenames in the current directory that consist of exactly three characters. For another example, if the file
trial contained the line
trial
then
grep ????? trial
grep finds the string The all and prints the line that contains it. What would happen if you did not put quotes
around that string?
The ability to escape the special meaning of a space is especially helpful when you're using the banner(C)
command. This command prints a message across a terminal screen in large, poster-size letters.
The command prints your message as a four-line banner. Now print the same message as a three-line banner.
Type:
banner happy birthday "to you"
Notice that the words to and you now appear on the same line. The space between them has lost its meaning
as a delimiter.
When is this mechanism useful? A typical example is when you want to send someone, via the mail
command, a message or file you've already created. By default, the mail command expects input from
standard input (that is, the keyboard). But suppose you have already entered the information to be sent (to a
user with the login name jim) in a file called report. Rather than retype that information, you can simply
redirect input to mail as follows:
mail jim < report
This means you can name your new output file junk, but you cannot name it temp unless you no longer want
the contents of the existing temp file.
Appending output to an existing file with the >> symbol
To keep from destroying an existing file, you can also use the double greater than symbol (``>>''), as follows:
command >> file
This appends the output of a command to the end of the file file. If file does not exist, it is created when you
use the ``>>'' symbol this way.
The following example shows how to append the output of the cat command (described in ``Shell
programming'') to an existing file. The cat command prints the contents of the files to the standard output. If it
has no arguments, it prints its standard input to the standard output. First, the cat command is executed on
both files without output redirection to show their respective contents. Then the contents of trial2 are added
after the last line of trial1 by executing the cat command on trial2 and redirecting the output to trial1.
$ cat trial1
This is the first line of trial1.
Hello.
This is the last line of trial1.
$
$ cat trial2
This is the beginning of trial2.
Hello.
This is the end of trial2.
$
$ cat trial2 >> trial1
$ cat trial1
This is the first line of trial1.
The spell program compares every word in a file against its internal vocabulary list and prints a list of all
potential misspellings on the screen. If spell does not have a listing for a word (such as a person's name), it
will report that as a misspelling, too.
Running spell on a lengthy text file can take a long time and may produce a list of misspellings that is too
long to fit on your screen. spell prints all its output at once; if it does not fit on the screen, the command
scrolls it continuously off the top until it has all been displayed. A long list of misspellings will roll off your
screen quickly and may be difficult to read.
You can avoid this problem by redirecting the output of spell to a file. In the following example, spell
searches a file named memo and places a list of misspelled words in a file named misspell:
$ spell memo > misspell
See the spell(C) manual page for all available options and an explanation of the capabilities of each.
The sort command
The sort command arranges the lines of a specified file in alphabetical or numerical order. Because users
generally want to keep a file that has been alphabetized, output redirection greatly enhances the value of the
sort command.
Be careful to choose a new name for the file that will receive the output of the sort command (the
alphabetized list). When sort is executed, the shell first empties the file that will accept the redirected output.
Then it performs the sort and places the output in the blank file. If you type
sort list > list
the shell will empty list and then sort nothing into list.
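One safe pattern (a sketch, not from the original text; the sample file contents are invented) is to sort into a file with a new name and rename it afterward:

```shell
# A scratch directory with a small unsorted file named list.
mkdir -p /tmp/sort_demo
cd /tmp/sort_demo
printf 'pear\napple\nmango\n' > list

# Never sort a file onto itself: sort into a new name, then rename.
sort list > list.sorted
mv list.sorted list
cat list
```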
Combining background mode and output redirection
Running a command in background does not affect the output of the command; unless it is redirected, output
is always printed on the terminal screen. If you are using your terminal to perform other tasks while a
command runs in background, you will be interrupted when the command displays its output on your screen.
However, if you redirect that output to a file, you can work undisturbed, except when an error occurs.
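Following the earlier grep example, such a command might look like this (a sketch; the contents of accounts are invented for illustration):

```shell
# A scratch directory with a small accounts file.
mkdir -p /tmp/bg_demo
cd /tmp/bg_demo
printf 'paid in full\ndelinquent since May\n' > accounts

# The search runs in background; its output goes to testfile
# instead of interrupting work on the screen.
grep delinquent accounts > testfile &
wait    # wait for the background job to finish before reading testfile
cat testfile
```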
You can then use your terminal for other work and examine testfile when you have finished it.
Redirecting output to a command with the pipe
The ``|'' character is called a pipe. Pipes are powerful tools that allow you to take the output of one command
and use it as input for another command without creating temporary files. A multiple command line created in
this way is called a pipeline.
The general format for a pipeline is:
command1 | command2 | command3 . . .
The output of command1 is used as the input of command2. The output of command2 is then used as the
input for command3.
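For example, a three-stage pipeline might look like this (a generic sketch, with invented sample data; printf supplies the input, sort orders it, and head keeps the first two lines):

```shell
# No temporary files: each command's output feeds the next directly.
printf 'pear\napple\nmango\n' | sort | head -2
```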
To understand the efficiency and power of a pipeline, consider the contrast between two methods that achieve
the same results.
To use the input/output redirection method, run one command and redirect its output to a temporary
file. Then run a second command that takes the contents of the temporary file as its input. Finally,
remove the temporary file after the second command has finished running.
To use the pipeline method, run one command and pipe its output directly into a second command.
For example, suppose you want to mail a happy birthday message in a banner to the owner of the login david.
Doing this without a pipeline is a three-step procedure. You must:
1. Enter the banner command and redirect its output to a temporary file:
banner happy birthday > message.tmp
11
12/26
7/4
10/18
11/9
4/23
8/12
The birthdays appear between the ninth and thirteenth spaces on each line. To display them, type:
cut -c9-13 birthdays
The cut command is usually executed on a file; however, piping makes it possible to run this command on the
output of other commands, too. This is useful if you want only part of the information generated by another
command. For example, you may want to have the time printed. The date command prints the day of the
week, date, and time, as follows:
$ date
Tue Dec 24 13:12:32 EST 1991
$
Notice that the time is given between spaces 12 and 19 of the line. You can display the time (without the date)
by piping the output of date into cut, specifying spaces 12-19 with the -c option. Your command line and its
output will look like this:
$ date | cut -c12-19
13:14:56
$
See the date(C) manual page for all available options and an explanation of the capabilities of each.
Substituting output for an argument
The output of most commands may be captured and used as arguments on a command line. Do this by
enclosing the command in grave accents (` . . . `) and placing it on the command line in the position where the
output should be treated as arguments. This is known as command substitution.
For example, you can substitute the output of the date and cut pipeline command used previously for the
argument in a banner printout by typing the following command line:
Notice the results: the system prints a banner with the current time.
``Shell programming'' shows you how you can also use the output of a command line as the value of a
variable.
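For example (a sketch; the directory and file names are invented), you might capture a command's output in a variable with the same grave accents:

```shell
# A scratch directory holding three files.
mkdir -p /tmp/count_demo
cd /tmp/count_demo
touch a b c

# The output of the ls | wc -l pipeline becomes the value of count.
count=`ls | wc -l`
echo "This directory contains $count files."
```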
If there is only one command to be run with batch, you can enter it as follows:
batch command_line
<Ctrl-d>
The next example uses batch to execute the grep command at a convenient time. Here grep searches all files
in the current directory and redirects the output to the file dol.file.
$ batch
grep dollar * > dol.file
After you submit a job with batch, the system responds with a job number, date, and time. This job number is
not the same as the process number that the system generates when you run a command in the background.
The at command allows you to specify an exact time to execute the commands. The general format for the at
command is:
at time
first command
.
.
.
last command
<Ctrl-d>
The time argument consists of the time of day and, if the date is not today, the date.
The following example shows how to use the at command to mail a happy birthday banner to the user with
the login name emily on her birthday:
$ at 8:15am Feb 27
banner happy birthday | mail emily
<Ctrl-d>
job 453400603.a at Sat Feb 23 08:15:00 1991
$
Notice that the at command, like the batch command, responds with the job number, date, and time.
If you decide you do not want to execute the commands currently waiting in a batch or at job queue, you can
erase those jobs by using the -r option of the at command with the job number. (You can save the job
number for later use by redirecting the response of the at command to a file.) The general format is
at -r job_number
Try erasing the previous at job for the happy birthday banner. Type:
at -r 453400603.a
If you have forgotten the job number, the at -l command will give you a list of the current jobs in the batch
or at queue, as the following screen shows:
$ at -l
user = mylogin 168302040.a at Mon Nov 25 13:00:00 1991
user = mylogin 453400603.a at Sun Dec 08 08:15:00 1991
$
Notice that the system displays the job number and the time the job will run.
Using the at command, mail yourself the file memo at noon, to tell you it is lunch time. (You must redirect the
file into mail unless you use a ``here document,'' described in ``Shell programming''.) Then try the at
command with the -l option.
Hours
Day_of_Month
Month
Day_of_week
Command
Fields are separated by spaces or tabs. The file cannot have blank lines. The cronfile parameters are as
follows:
Field           Allowable values
Minutes         0-59
Hours           0-23
Day of month    1-31
Month of year   1-12
Day of week     0-6 (0=Sunday)
Command         any noninteractive command
A field can be a number, a range of numbers (for example, 10-20), a list of numbers separated by commas, or
an asterisk (all values). For example, an asterisk in the ``Hours'' field means ``every hour''; an asterisk in the
``Month'' field means ``every month''.
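For instance, a single hypothetical entry (invented for illustration, not one of the examples below) that runs a reminder at 2:30 P.M. on the 15th of every month would look like this:

```shell
# minute  hour  day-of-month  month  day-of-week  command
30 14 15 * * echo "Mid-month reminder" > /dev/tty
```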
Let us assume that you want to write a cronfile to issue reminders and perform regular tasks:
You need to attend a meeting at 10 A.M. every Monday, and you want to remind yourself of this at
9:45 A.M. on Monday mornings.
You want to find and remove any old files beginning with ``#'' in your home directory at 4:30 P.M. on
the first day of every month.
You want to echo the date and time to your terminal at 9:00 A.M., Monday to Friday.
Create a file that looks like the following (each command name here stands for one of the tasks described
above):
45 9 * * 1 meeting_reminder_command
30 16 1 * * remove_old_files_command
0 9 * * 1-5 echo_date_command
When you have created your file (called cronfile), submit it by typing the following:
$ crontab cronfile
Notice that the system reports a PID number for the grep command, as well as for the other processes that are
running: the ps command itself, and the sh (shell) command that runs throughout the time you are logged in.
(The shell program sh interprets, that is, passes on to the computer, shell commands.)
See the ps(C) manual page for all available options and an explanation of the capabilities of each.
You can suspend and restart programs if your login has been configured for job control. See your system
administrator to have your login set up to include job control. The jobs command also gives you a listing of
current background processes, running or stopped. However, in addition to the PID, the jobs command gives
you a number called the ``job identifier'' (JID) and the original command typed to initiate the job (job_name).
You need to know the JID of a process whenever you want to restart a stopped job or resume a background
process in foreground. The JID is printed on the screen when you enter a command to start or stop a process.
To obtain information about your stopped or background jobs, type:
jobs
or
stop %JID
Note that you cannot terminate background processes by pressing the <BREAK> or <DELETE> key. The
following example shows how you can terminate the grep command that you started executing in background
mode in the previous example.
$ kill 28223
[JID] + Terminated    job_name
$
Notice that the system responds with a message and a ``$'' prompt, showing that the process has been killed. If
the system cannot find the PID number you specify, it responds with an error message:
kill: 28223: No such process
To suspend a foreground process in the job shell (only when job control is active), type:
<Ctrl-z>
Notice that you place the nohup command before the command you intend to run as a background process.
For example, suppose you want the grep command to search all the files in your current directory for the
string word and redirect the output to a file called word.list, and you want to log out immediately afterward.
Type the command line as follows:
nohup grep word * > word.list &
You can terminate the nohup command by using the kill command. Now that you have mastered these basic
shell commands and notations, use them in your shell programs! The exercises in the following section will
help you practice using the shell command language. Answers to the exercises appear at the end of the
section.
echo *
13.
Is it acceptable to use a ``?'' at the beginning or in the middle of a pattern? Try it.
14.
Do you have any files that begin with a number? Can you list them without listing the other files in
your directory? Can you list only those files that begin with a lowercase letter between a and m?
(Hint: Use a range of numbers or letters in [ ]).
15.
Is it acceptable to place a command in background mode on a line that is executing several other
commands sequentially? Try it. What happens? (Hint: Use ``;'' and ``&.'') Can the command in
background mode be placed in any position on the command line? Try placing it in various positions.
Experiment with each new character that you learn to see the full power of the character.
Remember, if you want to redirect both commands to the same file, you have to use the ``>>''
(append) sign for the second redirection. If you do not, you will wipe out the information from the
pwd command.
17.
Instead of cutting the time out of the date response, try redirecting only the date, without the time,
into banner. What is the only part you need to change in the following command line?
banner `date | cut -c12-19`
Answers
11.
An * at the beginning of a pattern matches all filenames that end with the characters following the *,
including a filename consisting of those characters alone.
$ ls *t
cat
123t
new.t
t
$
12.
The command cat [0-9]* produces the following output:
1memo
100data
9
05name
The command echo * produces a list of all the files in the current directory.
13.
You can place ``?'' in any position in a filename.
14.
The command ls [0-9]* lists only those files that start with a number.
The command ls [a-m]* lists only those files that start with the letters ``a'' through ``m.''
15.
If you placed the sequential command line in the background mode, the immediate system response
was the PID number for the job.
17.
Change the -c option of the command line to read:
banner `date | cut -c1-10`
Shell programming
You can use the shell to create programs, that is, new commands. Such programs are also called shell procedures.
This section tells you how to create and execute shell programs using commands, variables, positional
parameters, return codes, and basic programming control structures.
The examples of shell programs in this section are shown two ways. First, the cat command is used in a
screen to display the contents of a file containing a shell program:
$ cat testfile
first_command
.
.
.
last_command
$
Second, the results of executing the shell program appear after a command line:
$ testfile
program_output
$
You should be familiar with an editor before you try to create shell programs.
Shell programs
We will begin by creating a simple shell program that will do the following tasks, in order:
print the current directory
list the contents of that directory
display this message on your terminal:
This is the end of the shell program.
Create a file called dl (short for directory list) using your editor of choice, and enter the following:
pwd
ls
echo This is the end of the shell program.
The dl command is executed by sh, and the pathname of the current directory is printed first, then the list of
files in the current directory, and finally, the comment This is the end of the shell program.
The sh command provides a good way to test your shell program to make sure it works.
If dl is a useful command, you can use the chmod command to make it an executable file; then you can type
dl by itself to execute the command it contains. The following example shows how to use the chmod
command to make a file executable and then run the ls -l command to verify the changes you have made in
the permissions.
$ chmod u+x dl
$ ls -l
total 2
-rw-------   1 login    login       3661 Nov  2 10:28 mbox
-rwx------   1 login    login         48 Nov 15 10:50 dl
$
Notice that chmod turns on permission to execute (+x) for the user (u). Now dl is an executable program. Try
to execute it. Type:
dl
You get the same results as before, when you entered sh dl to execute it.
Creating a bin directory for executable files
To make your shell programs accessible from all your directories, you can make a bin directory from your
login directory and move the shell files to your bin.
You must also set your shell variable PATH to include your bin directory:
PATH=$PATH:$HOME/bin
See ``Variables'' and ``Using shell variables'' for more information about PATH.
The following example reminds you which commands are necessary. In this example, dl is in the login
directory. Type these command lines:
cd
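In outline, the full sequence might look like this (a sketch of the steps just described, assuming dl is in the login directory):

```shell
cd                       # start in the login directory
mkdir bin                # create a bin directory there
mv dl bin                # move the shell program into it
PATH=$PATH:$HOME/bin     # add bin to the command search path
```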
Move to the bin directory and type the ls l command. Does dl still have execute permission?
Now move to a directory other than the login directory, and type the following command:
dl
What happened?
It is possible to give the bin directory another name; if you do so, you must change your shell variable PATH
again.
Warnings about naming shell programs
You can give your shell program any appropriate filename; however, you should not give your program the
same name as a system command. Depending on your path, the system may execute your command instead of
the system command. For example, if you had named your dl program mv, each time you tried to move a file,
the system might have executed your directory list program instead of mv.
Another problem can occur if you name the dl file ls, and then try to execute the file. You would create an
infinite loop, since your program executes the ls command. After some time, the system would give you the
following error message:
Too many processes, cannot fork
What happened? You typed in your new command, ls. The shell read and executed the pwd command. Then
it read the ls command in your program and tried to execute your ls command. This formed an infinite loop.
For this reason, the SCO OpenServer system limits the number of times an infinite loop can execute. One way
to prevent such looping is to give the pathname for the system ls command, /usr/bin/ls, when you write your
own shell program.
The following ls shell program would work:
$ cat ls
pwd
/usr/bin/ls
echo This is the end of the shell program
If you name your command ls, then you can only execute the system ls command by using its full pathname,
/usr/bin/ls.
Variables
Variables are the basic data objects that, in addition to files, shell programs manipulate. Here we discuss three
types of variables and how to use them:
positional parameters
special parameters
named variables
then positional parameter $1 within the program is assigned the value pp1, positional parameter $2 within the
program is assigned the value pp2, and so on, at the time the shell program is invoked.
To practice positional parameter substitution, create a file called pp (short for positional parameters).
(Remember, the directory in which these example files reside must be in $PATH.) Then enter the echo
commands shown in the following screen. Enter the command lines so that running the cat command on your
completed file will produce the following output:
$ cat pp
echo The first positional parameter is: $1
echo The second positional parameter is: $2
echo The third positional parameter is: $3
echo The fourth positional parameter is: $4
$
If you execute this shell program with the arguments one, two, three, and four, you will obtain the following
results (but first you must make the shell program pp executable using the chmod command):
$ chmod u+x pp
$
$ pp one two three four
The first positional parameter is: one
The second positional parameter is: two
The third positional parameter is: three
The fourth positional parameter is: four
$
Another example of a shell program is bbday, which mails a greeting to the login entered in the command
line. The bbday program contains one line:
banner happy birthday | mail $1
Try sending yourself a birthday greeting. If your login name is sue, your command line will be:
bbday sue
The who command lists all users currently logged in on the system. How can you make a simple shell
program called whoson, that will tell you if the owner of a particular login is currently working on the
system?
Type the following command line into a file called whoson:
who | grep $1
Jan 24 13:35
If the owner of the specified login is not currently working on the system, grep fails and whoson prints no
output.
The shell allows a command line to contain at least 128 arguments; however, a shell program is restricted to
referencing only nine positional parameters, $1 through $9, at a given time. You can work around this
restriction by using the shift command. See sh(C) for details. The special parameter ``$*'' (described in the
next section) can also be used to access the values of all command line arguments.
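A sketch of the shift workaround (not from the original text): each shift discards $1 and renumbers the remaining arguments, so a loop can walk through any number of them, nine at most at a time being visible.

```shell
# print.all: echo every argument, however many there are.
while [ $# -gt 0 ]
do
        echo $1
        shift
done
```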
Special parameters
``$#''
This parameter, when referenced in a shell program, contains the number of arguments with which the
shell program was invoked. Its value can be used anywhere in the shell program.
Enter the command line, shown in the following screen, in the executable shell program called get.num. Then
run the cat command on the file:
$ cat get.num
echo The number of arguments is: $#
$
The program simply displays the number of arguments with which it is invoked. For example:
$ get.num test out this program
The number of arguments is: 4
$
You can write a simple shell program to demonstrate ``$*''. Create a shell program called show.param that
will echo all the parameters. Use the command line shown in the following completed file:
$ cat show.param
echo The parameters for this command are: $*
$
The program show.param will echo all the arguments you give the command. Make show.param executable
and try it using these parameters:
$ show.param Hello How are you?
The parameters for this command are: Hello How are you?
$
Notice that show.param echoes Hello How are you? Now try show.param using more than nine
arguments:
$ show.param a b c d e f g h i j
The parameters for this command are: a b c d e f g h i j
$
Once again, show.param echoes all the arguments you give. The ``$*'' parameter can be useful if you use
filename expansion to specify arguments to the shell command.
Use the filename expansion feature with your show.param command. For example, suppose you have three
files in your directory named for the first three chapters of a book. The show.param command prints a list of
all those files.
$ show.param chap?
The parameters for this command are: chap1 chap2 chap3
$
Named variables
Another form of variable that you can use in a shell program is a named variable. You assign values to named
variables yourself. The format for assigning a value to a named variable is:
named_variable=value
Notice that there are no spaces on either side of the equals (=) sign.
In the following example, var1 is a named variable, and myname is the value or character string assigned to
that variable:
var1=myname
A ``$'' is used in front of a variable name in a shell program to reference the value of that variable. Using the
example above, the reference $var1 tells the shell to substitute the value myname (assigned to var1), for any
occurrence of the character string $var1.
The first character of a variable name must be a letter or an underscore. The rest of the name can consist of
letters, underscores, and digits. Like shell program filenames, variable names should not be shell command
names. Also, the shell reserves some variable names that you should not use for your variables. The following
list provides brief descriptions of some of the most important of these reserved shell variable names.
CDPATH defines the search path for the cd command.
HOME is the default variable for the cd command (home directory).
IFS defines the internal field separators (normally <Space>, <TAB>, and <Return>).
LOGNAME is your login name.
MAIL names the file that contains your electronic mail.
PATH determines the search path used by the shell to find commands.
There are two ways to display the current value of a variable. First, type echo $variable_name;
the system outputs the value of variable_name. Second, you can use the env(C) command to print out the
value of all defined variables in the shell. To do this, type env on a line by itself; the system outputs a list of
the variable names and values.
Assigning a value to a variable
You can set the TERM variable by entering the following command line:
TERM=terminal_name
export TERM
This is the simplest way to assign a value to a variable. However, there are several other ways to do this:
Use the read command to assign input to the variable.
Redirect the output of a command into a variable by using command substitution with grave accents (`
. . . `).
Assign a positional parameter to the variable.
The following sections discuss each of these methods in detail.
Using the read command
The read command used within a shell program allows you to prompt the user of the program for the values
of variables. The general format for the read command is:
read variable
The values assigned by read to variable will be substituted for ``$''variable wherever it is used in the
program. If a program executes the echo command just before the read command, the program can display
directions such as Type in . . . . The read command will wait until you type a character string, followed by
<Return>, and then make that string the value of the variable.
The following example shows how to write a simple shell program called num.please to keep track of your
telephone numbers. This program uses the following commands for the purposes specified.
echo
prompts you for a person's last name.
read
assigns the last name to the variable name.
grep
searches the file list for the line containing that name.
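Assembled from those pieces, a num.please along these lines might read as follows (a sketch; the sample list entry is illustrative, and the piped-in name simulates your typed answer):

```shell
# Create num.please: prompt for a last name, then search the file list.
cat > num.please <<'EOF'
echo Type in last name
read name
grep $name list
EOF
chmod u+x num.please
# Try it with a one-entry sample list:
echo "Mr. Niceguy 6680007" > list
echo Niceguy | sh num.please
```

The run prints the prompt, then the matching line from list: Mr. Niceguy 6680007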
Create a file called list that contains several last names and telephone numbers. Then try running num.please.
The next example is a program called mknum, which creates a list. mknum includes the following
commands for the purposes shown.
echo prompts for a person's name.
read assigns the person's name to the variable name.
echo asks for the person's number.
read assigns the telephone number to the variable num.
echo adds the values of the variables name and num to the file list.
If you want the output of the echo command to be added to the end of list, you must use >> to redirect it. If
you use >, list will contain only the last telephone number you added.
Running the cat command on mknum displays the contents of the program. When your program looks like
this, you will be ready to make it executable (with the chmod command):
$ cat mknum
echo Type in name
read name
echo Type in number
read num
echo $name $num >> list
$ chmod u+x mknum
$
Try out the new programs for your telephone list. In the next example, mknum creates a new listing for Mr.
Niceguy. Then num.please gives you Mr. Niceguy's telephone number:
$ mknum
Type in name
Mr. Niceguy
Type in number
6680007
$ num.please
Type in last name
Niceguy
Mr. Niceguy 6680007
$
You can substitute the output of a command for the value of a variable by using command substitution in the
following format:
variable=`command`
You can put this in a simple shell program called t that gives you the time.
$ cat t
time=`date | cut -c12-16`
echo The time is: $time
$
Remember, there are no spaces on either side of the equal sign. Make the file executable, and you will have a
program that gives you the time:
$ chmod u+x t
$ t
The time is: 10:36
$
You can assign a positional parameter to a named variable by using the following format:
var1=$1
The next example is a simple program called simp.p that assigns a positional parameter to a variable. By
running the cat command on simp.p, you can see the contents of this program:
$ cat simp.p
var1=$1
echo $var1
$
Of course, you can also assign to a variable the output of a command that uses positional parameters, as
follows:
person=`who | grep $1`
In the next example, the program log.time keeps track of your whoson program results. The output of whoson
is assigned to the variable person, and added to the file login.file with the echo command. The last echo
command displays the value of person on your screen.
If you execute log.time specifying maryann as the argument, the system responds as follows:
$ log.time maryann
maryann  tty61  Apr 11 10:26
$
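For reference, a log.time along the lines just described might contain the following (a sketch; the whoson pipeline is written out inline):

```shell
# Create log.time: save a whoson-style lookup in login.file,
# then display the same line on the screen.
cat > log.time <<'EOF'
person=`who | grep $1`
echo $person >> login.file
echo $person
EOF
chmod u+x log.time
```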
Comments
You can place comments in a shell program by preceding text with a ``#'' sign; the shell ignores everything
on a line that follows a ``#''. For example, a program that contains the following lines will ignore them when it is executed:
# This program sends a generic birthday greeting.
# This program needs a login as
# the positional parameter.
Comments are useful for documenting the function of a program and should be included in any program you
write.
Here documents
A here document allows you to place into a shell program lines that are redirected to be the input to a
command in that program. By using a here document, you can provide input to a command in a shell program
without using a separate file. The notation consists of the redirection symbol ``<<'' and a delimiter that
marks the beginning and end of the lines of input. The delimiter can be any character string; an exclamation
point (!) is commonly used.
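The effect is easy to see with cat standing in for the command (an illustrative demonstration, not a listing from the manual; gbday uses mail $1 in exactly the same shape):

```shell
# The lines between <<! and the closing ! become the command's input.
cat <<!
Best wishes to you on your birthday.
!
```

This prints: Best wishes to you on your birthday.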
When you use this command, you must specify the recipient's login as the argument to the command. The
input included with the use of the here document is:
Best wishes to you on your birthday.
For example, to send this greeting to the owner of login mary, type:
$ gbday mary
User mary will receive your greeting the next time she reads her mail messages:
$ mail
From mylogin Mon May 14 14:31 CDT 1991
Best wishes to you on your birthday.
$
The program uses three variables: file1, old_text, and new_text. When the program is run, it uses the read
command to obtain the values of these variables. The variables provide the following information:
file1
the name of the file to be edited
old_text
the exact text to be changed
new_text
the new text
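Put together, a ch.text along these lines might read as follows (a sketch; the here document feeds ed the global substitution, the write command, and the quit command, as described below):

```shell
# Create ch.text: read a filename, the old text, and the new text,
# then let a here document drive ed through the substitution.
cat > ch.text <<'PROG'
echo Type in the filename.
read file1
echo Type in the exact text to be changed.
read old_text
echo Type in the exact new text to replace the above.
read new_text
ed - $file1 <<!
g/$old_text/s//$new_text/g
w
q
!
PROG
chmod u+x ch.text
```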
Once the variables are entered in the program, the here document redirects the global substitution, the write
command, and the quit command into the ed command. Try the new ch.text command. The following screen
shows sample responses to the program prompts:
$ ch.text
Type in the filename.
memo
Type in the exact text to be changed.
Dear John:
Type in the exact new text to replace the above.
To whom it may concern:
$ cat memo
To whom it may concern:
$
Notice that by running the cat command on the changed file, you could examine the results of the global
substitution.
The stream editor sed can also be used in shell programming.
Return codes
Most shell commands issue return codes that show whether the command executed properly. By convention,
if the value returned is 0 (zero), then the command executed properly; any other value shows that it did not.
The return code is not printed automatically, but is available as the value of the shell special parameter ``$?''.
Checking return codes
After executing a command interactively, you can see its return code by typing
echo $?
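For example, with a readable file hi in your directory (created here so the demonstration is self-contained; the failing filename is illustrative):

```shell
echo "This is file hi" > hi
cat hi                        # succeeds
echo $?                       # prints 0
cat no.such.file 2>/dev/null  # fails: the file does not exist
echo $?                       # prints a nonzero return code
```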
In the first case, the file hi exists in your directory and has read permission for you. The cat command behaves
as expected and outputs the contents of the file. It exits with a return code of 0, which you can see using the
parameter $?. In the second case, the file either does not exist or does not have read permission for you. The
cat command prints a diagnostic message and exits with a return code of 2.
Using return codes with the exit command
A shell program normally terminates when the last command in the file is executed. However, you can use
the exit command to terminate a program at some other point. Perhaps more importantly, you can also use the
exit command to issue return codes for a shell program.
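For instance, this sketch (the program name and message are ours; it borrows the if and test commands covered later in this section) uses exit to report success or failure explicitly:

```shell
# Create readable: exit 0 if the argument names a readable file,
# exit 1 (after a message) otherwise.
cat > readable <<'EOF'
if test -r $1
then
    echo $1 is readable
    exit 0
else
    echo $1 is not readable
    exit 1
fi
EOF
echo hello > sample
sh readable sample
```

Running it prints: sample is readable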
Loop constructs: for and while
In the previous examples in this section, the commands in shell programs have been executed in sequence.
The for and while looping constructs allow a program to execute a command or sequence of commands
several times.
The for loop
The for loop executes a sequence of commands once for each member of a list. It has the format shown in
``Format of the for loop construct''.
for variable
in a_list_of_values
do
command_1
command_2
.
.
.
last_command
done
echo
Prompt the user for a pathname to the new directory.
read
Assign the pathname to the variable path.
for variable
Call the variable file; it can be referenced as $file in the command sequence.
in list_of_values
Supply a list of values. If the in clause is omitted, the list of values is assumed to be ``$*'' (all the
arguments entered on the command line).
do command_sequence
Provide a command sequence. The construct for this program will be:
do
mv $file $path/$file
done
The following screen shows the text for the shell program mv.file:
$ cat mv.file
echo Please type in the directory path
read path
for file
in memo1 memo2 memo3
do
mv $file $path/$file
done
$
In this program the values for the variable file are already in the program. To change the files each time the
program is invoked, assign the values using positional parameters or the read command. When positional
parameters are used, the in keyword is not needed, as the next screen shows:
$ cat mv.file
echo type in the directory path
read path
for file
do
mv $file $path/$file
done
$
Another loop construct, the while loop, uses two groups of commands. It will continue executing the
sequence of commands in the second group, the do . . . done list, as long as the final command in the first
group, the while list, returns a status of zero (true), meaning the statements after the do can be executed.
The general format of the while loop is shown in ``Format of the while loop construct''.
while
command_1
.
.
.
last_command
do
command_1
.
.
.
last_command
done
Notice that, after the loop is completed, the program executes the commands below the done.
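The enter.name program discussed next might read as follows (a reconstruction sketched from the description; the piped names simulate typed input ended by <Ctrl>d):

```shell
# Create enter.name: read names until end of input, saving each
# in xfile, then list what xfile contains.
cat > enter.name <<'EOF'
echo "Please type in each person's name and then press <Return>"
echo "End the list of names with <Ctrl>d"
while read name
do
    echo $name >> xfile
done
echo xfile contains the following names:
cat xfile
EOF
rm -f xfile
printf 'Mary Lou\nJanice\n' | sh enter.name
```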
You used special characters in the first two echo command lines, so you must use quotes to turn off the
special meaning. The next screen shows the results of enter.name:
$ enter.name
Mary Lou
Janice
<Ctrl>d
xfile contains the following names:
Mary Lou
Janice
$
Notice that after the loop completes, the program prints all the names contained in xfile.
The shell's garbage can: /dev/null
The file system has a file called /dev/null where you can have the shell deposit any unwanted output.
Try /dev/null by ignoring the results of the who command. First, type in the who command. The response tells
you who is on the system. Now, try the who command, but redirect the output into /dev/null:
who > /dev/null
Notice the system responded with a prompt. The output from the who command was placed in /dev/null and
was effectively discarded.
Conditional constructs: if and case
Conditional constructs cause branches in the path of execution based on the outcome of a comparison.
if . . . then
The if command tells the shell program to execute the then sequence of commands only if the final command
in the if command list is successful. The if construct ends with the keyword fi.
The general format for the if construct is shown in ``Format of the if . . . then conditional construct''.
if
command_1
.
.
.
last_command
then
command_1
.
.
.
last_command
fi
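For example, the search program examined in the next paragraphs might be built this way (a sketch; the sample file and piped-in answer are ours):

```shell
# Create search: read a word and a filename, and report the word
# as present if grep finds it in the file.
cat > search <<'EOF'
echo Type in the word and the filename.
read word file
if grep $word $file
then
    echo $word is in $file
fi
EOF
echo "hello world" > sample
echo "hello sample" | sh search
```

Note that grep's own matching line appears in the output along with the echo message; the paragraphs below show how to suppress it with /dev/null.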
Notice that the read command assigns values to two variables. The first characters you type, up to a space, are
assigned to word. The rest of the characters, including embedded spaces, are assigned to file.
A problem with this program is the unwanted display of output from the grep command. If you want to
dispose of the system response to the grep command in your program, use the file /dev/null, changing the if
command line to the following:
if grep $word $file > /dev/null
Now execute your search program. It should respond only with the message specified after the echo
command.
if . . . then . . . else
The if . . . then construction can also issue an alternate set of commands with else, when the if command
sequence is false. It has the general format shown in ``Format of the if
. . . then . . . else conditional construct''.
if
command_1
.
.
.
last_command
then
command_1
.
.
.
last_command
else
command_1
.
.
.
last_command
fi
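A search program extended with an else branch might look like this (a sketch; /dev/null suppresses grep's own output, as described earlier):

```shell
# Create search2: report whether the word is or is NOT in the file.
cat > search2 <<'EOF'
echo Type in the word and the filename.
read word file
if grep $word $file > /dev/null
then
    echo $word is in $file
else
    echo $word is NOT in $file
fi
EOF
echo "hello world" > sample2
echo "goodbye sample2" | sh search2
```

Because goodbye does not appear in the file, this run takes the else branch and prints: goodbye is NOT in sample2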
The test command, which checks to see if certain conditions are true, is a useful command for conditional
constructs. If the condition is true, the loop will continue. If the condition is false, the loop will end and the
next command will be executed. Some of the useful options for the test command are:
test -r file
true if the file exists and is readable
test -w file
true if the file exists and is writable
test -x file
true if the file exists and is executable
test -s file
true if the file exists and has a size greater than zero
test var1 -eq var2
true if var1 equals var2
test var1 -ne var2
true if var1 does not equal var2
You may want to create a shell program to move all the executable files in the current directory to your bin
directory. You can use the test -x command to select the executable files. Review the example of the for
construct that occurs in the mv.file program, shown in the following screen:
$ cat mv.file
echo type in the directory path
read path
for file
do
mv $file $path/$file
done
$
Create a program called mv.ex that includes an if test -x statement in the do . . . done loop to move
executable files only. Your program will be as follows:
$ cat mv.ex
echo type in the directory path
read path
for file
do
if test -x $file
then
mv $file $path/$file
fi
done
$
The directory path is the path from the current directory to the bin directory. However, if you use the value of
the shell variable HOME, you will not need to type in the path each time. $HOME gives the path to the login
directory.
Test the command, using all the files in the current directory, specified with the * special character as the
command argument. The command lines shown in the following example execute the command from the
current directory, then change to bin and list the files in that directory. All executable files should be
there.
$ mv.ex *
$ cd; cd bin; ls
list_of_executable_files
$
case . . . esac
The case . . . esac construction has a multiple choice format that allows you to choose one of several patterns
and then execute a list of commands for that pattern. The pattern statements must begin with the keyword in,
and a ``)'' must be placed after the last character of each pattern. The command sequence for each pattern is
ended with ``;;''. The case construction must be ended with esac (the letters of the word case reversed).
The general format for the case construction is shown in ``The case . . . esac conditional construct'':
case word
in
pattern_1)
command_line_1
. . .
last_command_line
;;
pattern_2)
command_line_1
. . .
last_command_line
;;
pattern_3)
command_line_1
. . .
last_command_line
;;
*)
command_1
. . .
last_command
;;
esac
In the following example, assume the terminal is a Teletype 4420, Teletype 5410, or Teletype 5420.
The set.term program first checks whether the value of term is 4420. If it is, the program makes T4 the
value of TERM, and terminates. If the value of term is not 4420, the program checks for other possible
values: 5410 and 5420. It executes the commands under the first pattern it finds, and then goes to the first
command after the esac.
The pattern *, meaning everything else, is included at the end of the terminal patterns. It warns that you do not
have a pattern for the terminal specified and it allows you to exit the case construct:
$ cat set.term
echo If you have a TTY 4420 type in 4420
echo If you have a TTY 5410 type in 5410
echo If you have a TTY 5420 type in 5420
read term
case $term
in
4420)
TERM=T4
;;
5410)
TERM=T5
;;
5420)
TERM=T7
;;
*)
echo not a correct terminal type
;;
esac
export TERM
echo end of program
$
Unconditional control statements: break and continue
The break command causes the program to exit a while or for loop immediately, resuming execution with the
first command after the done.
The continue command causes the program to go immediately to the next iteration of a while or for loop
without executing the remaining commands in the loop.
Functions
Functions provide a convenient and efficient means of coding and executing simple programs (more complex
programs should remain as shell programs). Functions are similar to shell programs except that they are
stored in memory and therefore execute faster than a program. Another difference is that functions
operate only in the current shell process.
Defining a function
There are two formats that can be used in defining a function:
Format 1
name () { command_1; command_2; . . . last_command; }
In this format, name is the name of the function. The parentheses indicate to the shell that a function is
being defined. The body of the function (which is delimited by the curly braces) contains the commands to be
executed. Each command is separated by a semicolon and a space. The last command ends with a
semicolon, and the curly braces are separated from the body of the function by a space.
Format 2
name ()
> {
> command
> command
> command
> }
In this format, name () is the same as in format 1. However, upon pressing the <Return> key, a ``>''
prompt will replace your regular shell prompt. The body of the function is coded at this point, starting with the
left curly brace. After the last command has been entered, the body of the function is closed with a right curly
brace. It is not necessary to use semicolons in this format.
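For example (the function name greet is illustrative), a format 1 definition and a call look like this:

```shell
# Define a one-command function and call it; the function runs
# in the current shell process.
greet () { echo Hello, $1; }
greet world
```

Calling greet world prints: Hello, world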
Just as the exit statement is used within shell programs, the return statement is provided for use within
functions. This statement will terminate the function, but not the shell program that called the function. The
format of the return statement is:
return n
where n is the return status of the function. If n is omitted, or if a return statement is not coded within the
function, then the return status is that of the last command executed within the function.
Once the function has been defined, you can display it by using the shell set statement (without arguments)
which displays all of your current environment variable settings. At the end of the variable list, any functions
you have defined will be displayed.
If you find it necessary to remove a function during a session, the unset command can be used.
The format is:
unset function
The next example searches for a file in the current directory. Notice that format 2 is used in this case. Also,
the return statement is used. A return status of 1 indicates that the search did not find the file in question (a
message is also displayed to that effect). A return status of 0 indicates that the file exists.
isthere ()
{
if [ ! -f $1 ]
then
echo "$1 was not created"
return 1
fi
return 0
}
Debugging programs
At times you may need to debug a program to find and correct errors. Two options to the sh command can
help you debug a program:
sh -v shell_program_name
Print the shell input lines as they are read by the system.
sh -x shell_program_name
Print commands and their arguments as they are executed.
To try these two options, create a shell program that has an error in it. For example, create a file called bug
that contains the following list of commands:
$ cat bug
today=`date`
echo enter person
read person
mail $1
$person
When you log off come into my office please.
$today.
MLH
$
Notice that today equals the output of the date command, which must be enclosed in grave accents for
command substitution to occur.
The mail message sent to Tom at login tommy ($1) should look like the following screen:
$ mail
From mlh Wed Apr 10 11:36 CST 1991
Tom
When you log off come into my office please.
Wed Apr 10 11:36 CST 1991.
MLH
$
To execute bug, you have to press the <BREAK> or <DELETE> key to end the program.
To debug this program, try executing bug using sh -v. This will print the lines of the file as they are read by
the system, as shown below:
$ sh -v bug tommy
today=`date`
echo enter person
enter person
read person
tom
mail $1
Notice the output stops on the mail command, since there is a problem with mail. You must use the here
document to redirect input into mail.
Before you fix the bug program, try executing it with sh -x, which prints the commands and their arguments
as they are executed.
$ sh -x bug tommy
+ date
today=Wed Apr 10 11:07:23 CST 1991
+ echo enter person
enter person
+ read person
tom
+ mail tommy
$
Once again, the program stops at the mail command. Notice that the substitutions for the variables have been
made and are displayed.
The corrected bug program is as follows:
$ cat bug
today=`date`
echo enter person
read person
mail $1 <<!
$person
When you log off come into my office please.
$today
MLH
!
$
The tee command is helpful for debugging pipelines. While simply passing its standard input to its standard
output, it also saves a copy of its input into the file whose name is given as an argument.
The general format of the tee command is:
command_1 | tee saverfile | command_2
saverfile is the file in which the output of command_1 is saved for you to study.
For example, suppose you want to check on the output of the grep command in the following command line:
who | grep $1 | cut -c1-9
You can use tee to copy the output of grep into a file called check, without disturbing the rest of the pipeline.
who | grep $1 | tee check | cut -c1-9
If you can edit the file yourself, you may want to be cautious the first few times. Before making any changes
to your .profile, make a copy of it in another file called safe.profile. Type:
cp .profile safe.profile
You can add commands to your .profile just as you add commands to any other shell program. You can also
set some terminal options with the stty command, and set some shell variables.
stty tabs
This option preserves tabs when you are printing. It expands the tab setting to eight spaces, which is
the default. The number of spaces for each tab can be changed. (See stty(C) for details.)
stty echoe
If you have a terminal with a screen, this option erases characters from the screen as you erase them
with the <BACKSPACE> key.
If you want to use these options for the stty command, you can create those command lines in your .profile
just as you would create them in a shell program. If you use the tail command, which displays the last few
lines of a file, you can see the results of adding those three command lines to your .profile:
$ tail -3 .profile
echo Good Morning! I am ready to work for you
stty tabs
stty echoe
$
HOME
This variable gives the pathname of your login directory. Use the cd command to go to your login
directory and type:
pwd
Ask your system administrator which languages are available on your computer, and what values you
must assign to LANG to access them. Not all system commands support non-English usage. Check
intro(C) for the ones that do. For details of LANG usage, see environ(5).
PATH
This variable gives the search path for finding and executing commands. To see the current values for
your PATH variable type:
echo $PATH
A typical response might be:
:/mylogin/bin:/bin:/usr/bin
The colon ( ``:'' ) is a delimiter between pathnames in the string assigned to the PATH variable.
When nothing is specified before a ``:'', the current directory is understood. Notice how, in this
example, the system looks for commands in the current directory first, then in /mylogin/bin, then in
/bin, and finally in /usr/bin.
If you are working on a project with several other people, you may want to set up a group bin, a
directory of special shell programs used only by your project members. The path might be named
/project1/bin. Edit your .profile, and add :/project1/bin to the end of your PATH, as in the next
example.
PATH="$PATH:/project1/bin"
TERM
This variable tells the shell what kind of terminal you are using. To assign a value to it, you must
execute the following three commands in this order:
TERM=terminal_name
export TERM
tput init
The first two lines, together, are necessary to tell the computer what type of terminal you are using.
The last line, containing the tput command, tells the terminal that the computer is expecting to
communicate with the type of terminal specified in the TERM variable. Therefore, this command
must always be entered after the variable has been exported.
If you do not want to specify the TERM variable each time you log in, add these three command lines
to your .profile; they will be executed automatically whenever you log in.
If you log in on more than one type of terminal, it would also be useful to have your set.term
command in your .profile.
PS1
This variable sets the primary shell prompt string (the default is the ``$'' sign). You can change your
prompt by changing the PS1 variable in your .profile.
Try the following example. Note that to use a multiword prompt, you must enclose the phrase in
quotes. Type the following variable assignment in your .profile.
PS1="Your command is my wish"
Now execute your .profile (with the . command) and watch for your new prompt sign.
$ . .profile
Your command is my wish
The ``$'' sign is gone forever, or at least until you delete the PS1 variable from your .profile.
2-2.
Write a shell program that gives only the date in a banner display. Be careful not to give your program
the same name as a SCO OpenServer system command.
2-3.
Write a shell program that sends a note to several people on your system.
2-4.
Redirect the date command without the time into a file.
2-5.
Answers
2-1.
$ cat time
banner `date | cut -c12-19`
$
$ chmod u+x time
2-2.
$ cat mydate
banner `date | cut -c1-10`
$
2-3.
Or, if you used parameters for the logins (instead of the logins themselves) your program may have
looked like this:
$ cat tofriends
echo Type in the name of the file containing the note.
read note
mail $* < $note
$
2-4.
$ cat send.memo
date | cut -c1-10 > memo1
echo Dear colleague >> memo1
cat memo >> memo1
echo A memo from M. L. Kelly >> memo1
mail janice marylou bryan < memo1
$
2-7.
$ cat mv.file
echo type in the directory path
read path
echo type in filenames, end with <Ctrl>d
while
read file
do
mv $file $path/$file
done
echo all done
$
2-8.
$ cat mv.file
echo Please type in directory path
read path
for file in $*
do
mv $file $path/$file
done
$
The command line for moving all files in the current directory is:
$ mv.file *
2-9.
See the hint provided with exercise 2-9.
$ cat search
for file
in $*
do
if grep $word $file >/dev/null
then echo $word is in $file
else echo $word is NOT in $file
fi
done
$
2-10.
Add the following lines to your .profile:
stty tabs
stty erase
stty echoe
2-11.
Add the following command lines to your .profile:
PS1=Hello
export PS1
2-12.
Enter the following commands to check the values of the HOME, TERM, and PATH variables in
your home environment:
$ echo $HOME
$ echo $TERM
$ echo $PATH
*?[]
Metacharacters; used to provide a shortcut to referencing filenames, through pattern matching.
&
Executes commands in the background mode.
;
Sequentially executes several commands typed on one line, each pair separated by ;.
\
Turns off the meaning of the immediately following special character.
'...'
Enclosing single quotes turn off the special meaning of all characters except single quotes.
"..."
Enclosing double quotes turn off the special meaning of all characters except $, grave accents (`...`), and
double quotes.
<
Redirects the contents of a file into a command.
>
Redirects the output of a command into a new file, or replaces the contents of an existing file with the
output.
>>
Redirects the output of a command so that it is appended to the end of a file.
|
Directs the output of one command so that it becomes the input of the next command.
batch
Submits the following commands to be processed at a time when the system load is at an acceptable
level. <Ctrl>d ends the batch command.
at
Submits the following commands to be executed at a specified time. <Ctrl>d ends the at
command.
at -l
Reports which jobs are currently in the at or batch queue.
at -r
Removes the at or batch job from the queue.
ps
Reports the status of the shell processes.
kill PID
Terminates the shell process with the specified process ID (PID).
nohup command list &
Continues background processes after logging out.
positional parameter
A numbered variable used within a shell program to reference values automatically assigned by the
shell from the arguments of the command line invoking the shell program.
echo
A command used to print the value of a variable on your terminal.
``$#''
A special parameter that contains the number of arguments with which the shell program has been
executed.
``$*''
A special parameter that contains the values of all arguments with which the shell program has been
executed.
named variable
A variable to which the user can give a name and assign values.
HOME
Denotes your home directory; the default variable for the cd command.
PATH
Defines the path your login shell follows to find commands.
MAIL
Gives the name of the file containing your electronic mail.
PS1, PS2
Defines the primary and secondary prompt strings, respectively.
TERM
Defines the type of terminal.
LOGNAME
Login name of the user.
IFS
Defines the internal field separators (normally the space, the tab, and the carriage return).
TERMINFO
Names the directory containing compiled terminal descriptions (the terminfo database).
For loop
for variable
in
this list of values
do
command 1
command 2
.
.
.
last command
done
While loop
while command list
do
command 1
command 2
.
.
.
last command
done
If...then
if
this command list is successful
then
command 1
command 2
.
.
.
last command
fi
Case construction
case word in
pattern1)
command 1
. . .
last command
;;
pattern2)
command 1
. . .
last command
;;
.
.
.
last_pattern)
command 1
. . .
last command
;;
esac
Basic awk
This section provides enough information for you to write and run some of your own programs. Each topic
presented in this section is discussed in more detail in later sections.
Program structure
The basic operation of awk is to scan a set of input lines one after another, searching for lines that match any
of a set of patterns or conditions you specify. For each pattern, you can specify an action; this action is
performed on each line that matches the pattern. Accordingly, an awk program is a sequence of
pattern-action statements, as ``awk program structure and example'' shows.
Structure:
pattern { action }
pattern { action }
. . .
If the pattern is omitted, the action is performed for every input line; if the action is omitted, the default
action is to print the matched line. Since patterns and actions are both optional, actions are enclosed
in braces to distinguish them from patterns.
Usage
You can run an awk program two ways. First, you can enter the command
$ awk 'pattern-action statements' optional list of input files<<Return>>
to execute the pattern-action statements on the set of named input files. For example, you could say
$ awk '{ print $1, $2 }' file1 file2<<Return>>
Notice that the pattern-action statements are enclosed in single quotes. This protects characters like $ from
being interpreted by the shell and also allows the program to be longer than one line.
If no files are mentioned on the command line, awk reads from the standard input. You can also specify that
input comes from the standard input by using the hyphen (-) as one of the input files. For example,
$ awk '{ print $3, $4 }' file1 -<<Return>>
says to read input first from file1 and then from the standard input.
The arrangement above is convenient when the awk program is short (a few lines). If the program is long, it is
often more convenient to put it into a separate file and use the -f option to fetch it:
$ awk -f program_file optional list of input files<<Return>>
For example, the following command line says to fetch and execute myprogram on input from the file file1:
$ awk -f myprogram file1<<Return>>
Fields
Normally, awk reads its input one line, or record, at a time; a record is, by default, a sequence of characters
ending with a newline. Then awk splits each record into fields, where, by default, a field is a string of
nonblank, nontab characters.
For example, suppose we have a file countries, in which each record contains the name of a country, its area
in thousands of square miles, its population in millions, and the continent where it is located:
USSR            8650    262     Asia
Canada          3852    24      North America
China           3692    866     Asia
USA             3615    219     North America
Brazil          3286    116     South America
Australia       2968    14      Australia
India           1269    637     Asia
Argentina       1072    26      South America
Sudan           968     19      Africa
Algeria         920     18      Africa
This file is typical of the kind of data awk is good at processing: a mixture of words and numbers separated
into fields by blanks and tabs.
The number of fields in a record is determined by the field separator. Fields are normally separated by
sequences of blanks and/or tabs, so that the first record of countries would have four fields, the second five,
and so on. It is possible to set the field separator to just tab, so each line would have four fields, matching the
meaning of the data; we will show how to do this shortly. For the time being, we will use the default: fields
separated by blanks and/or tabs. The first field within a line is called $1, the second $2, and so forth. The
entire record is called $0.
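The field names can be checked from the shell by piping a single record to awk on its standard input (no countries file is needed; the record here is typed in):

```shell
# Print the second field, then the whole record, of a one-line input.
echo "USSR 8650 262 Asia" | awk '{ print $2; print $0 }'
```

The first print produces 8650 ($2), the second the entire record ($0).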
Printing
If the pattern in a pattern-action statement is omitted, the action is executed for all input lines. The simplest
action is to print each line; you can accomplish this with an awk program consisting of the single print
statement
{ print }
which prints each line of countries, copying the file to the standard output. The print statement can also be used to
print parts of a record; for instance, the program
{ print $1, $3 }
prints the first and third fields of each record.
When printed, items separated by a comma in the print statement are separated by the output field separator
which, by default, is a single blank. Each line printed is terminated by the output record separator which, by
default, is a newline.
NOTE: In the remainder of this topic, we only show awk programs, without the command line that invokes
them. Each complete program can be run either by enclosing it in quotes as the first argument of the awk
command, or by putting it in a file and invoking awk with the -f flag, as discussed in ``Usage''. In the
examples, if no input is mentioned, the input is assumed to be the file countries.
Formatted printing
For more carefully formatted output, awk provides a C-like printf statement
printf format, expr[1], expr[2], . . ., expr[n]
which prints the expr[i]'s according to the specification in the string format. For example, the awk program
{ printf "%10s %6d\n", $1, $3 }
prints the first field (``$1'') as a string of 10 characters (right-justified), then a space, then the third field
(``$3'') as a decimal number in a six-character field, then a newline (\n). With input from the file countries,
this program prints an aligned table:
      USSR    262
    Canada     24
     China    866
       USA    219
    Brazil    116
 Australia     14
     India    637
 Argentina     26
     Sudan     19
   Algeria     18
With printf, no output separators or newlines are produced automatically; you must create them yourself by
using \n in the format specification. ``The printf statement'' contains a full description of printf.
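The effect of the %10s and %6d specifications can be observed by feeding one record to the program shown above:

```shell
# Right-justify the country name in 10 columns and the population in 6.
echo "India 1269 637 Asia" | awk '{ printf "%10s %6d\n", $1, $3 }'
```

India (five characters) is padded with five leading blanks, and 637 with three.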
Simple patterns
You can select specific records for printing or other processing by using simple patterns. awk has three kinds
of patterns. First, you can use patterns called relational expressions that make comparisons. For example, the
operator == tests for equality. To print the lines for which the fourth field equals the string Asia, we can use
the program consisting of the single pattern
$4 == "Asia"
USSR            8650    262     Asia
China           3692    866     Asia
India           1269    637     Asia
The complete set of comparisons is >, >=, <, <=, == (equal to) and != (not equal to). These comparisons can
be used to test both numbers and strings. For example, suppose we want to print only countries with a
population greater than 100 million. The program
$3 > 100
is all that is needed. It prints all lines in which the third field exceeds 100. (Remember that the third field in
the file countries is the population in millions.)
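Run from the shell with two sample records, the pattern selects only the first:

```shell
# Only records whose third field exceeds 100 are printed.
printf '%s\n' "USSR 8650 262 Asia" "Canada 3852 24 North America" |
awk '$3 > 100'
```

Because no action is given, the default action (printing the matched line) applies.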
Second, you can use patterns called extended regular expressions that search for specified characters to select
records. The simplest form of an extended regular expression is a string of characters enclosed in slashes:
/US/
This program prints each line that contains the (adjacent) letters US anywhere; with the file countries as input,
it prints
USSR            8650    262     Asia
USA             3615    219     North America
We will have a lot more to say about extended regular expressions later in this topic.
Third, you can use two special patterns, BEGIN and END, that match before the first record has been read
and after the last record has been processed. This program uses BEGIN to print a title:
BEGIN   { print "Countries of Asia:" }
/Asia/  { print $1 }
The output is
Countries of Asia:
USSR
China
India
Simple actions
We have already seen the simplest action of an awk program: printing each input line. Now let us consider
how you can use built-in and user-defined variables and functions for other simple actions in a program.
Built-in variables
Besides reading the input and splitting it into fields, awk counts the number of records read and the number of
fields within the current record; you can use these counts in your awk programs. The variable NR is the
number of the current record, and NF is the number of fields in the record. So the program
{ print NR, NF }
prints the number of each line and how many fields it has, while
{ print NR, $0 }
prints each line preceded by its record number. As another example, the program
        { pop += $3 }
END     { print "Total population is", pop, "million"
          print "Average population of", NR, "countries is", pop/NR }
computes the total and average of the populations.
The first action accumulates the population from the third field; the second action, which is executed after the
last input, prints the sum and average:
Total population is 2201 million
Average population of 10 countries is 220.1
Functions
Built-in functions of awk handle common arithmetic and string operations for you. For example, one of the
arithmetic functions computes square roots; a string function substitutes one string for another. awk also lets
you define your own functions. Functions are described in detail in ``Actions''.
For example, to print the last input line:
        { line = $0 }
END     { print line }
To print the number of input lines (NR retains its value in an END action):
END     { print NR }
To print the total number of fields:
        { nf = nf + NF }
END     { print nf }
To print the total number of characters (NR accounts for the newline ending each record):
        { nc = nc + length($0) }
END     { print nc + NR }
Error messages
If you make an error in your awk program, you generally get an error message. For example, trying to run the
program
$3 < 200 { print ( $1 }
produces a message reporting a syntax error at the offending line, because of the unbalanced parenthesis.
Patterns
In a pattern-action statement, the pattern is an expression that selects the records for which the associated
action is executed. This section describes the kinds of expressions that may be used as patterns.
The patterns in the examples below are applied to the file countries, shown here with column headings:
COUNTRY         AREA    POP     CONTINENT
USSR            8650    262     Asia
Canada          3852    24      North America
China           3692    866     Asia
USA             3615    219     North America
Brazil          3286    116     South America
Australia       2968    14      Australia
India           1269    637     Asia
Argentina       1072    26      South America
Sudan           968     19      Africa
Algeria         920     18      Africa
Relational expressions
An awk pattern can be any expression involving comparisons between strings of characters or numbers. awk
has six relational operators, and two extended regular expression matching operators, ~ (tilde) and !~, which
are discussed in the next section, for making comparisons. ``awk comparison operators'' lists these operators
and their meanings.
awk comparison operators
Operator        Meaning
<               less than
<=              less than or equal to
==              equal to
!=              not equal to
>=              greater than or equal to
>               greater than
~               matches
!~              does not match
In a comparison, if both operands are numeric, a numeric comparison is made; otherwise, the operands are
compared as strings. (Every value might be either a number or a string; usually awk can tell what is intended.
``Number or string?'' contains more information about this.) Thus, the pattern $3>100 selects lines where the
third field exceeds 100, and the program
$1 >= "S"
selects lines whose first field is greater than or equal to the string S; with countries as input, it prints
USSR            8650    262     Asia
USA             3615    219     North America
Sudan           968     19      Africa
In the absence of any other information, awk treats fields as strings, so the program
$1 == $4
compares the first and fourth fields as strings of characters, and with the file countries as input, prints the
single line for which this test succeeds:
Australia       2968    14      Australia
This program prints all input records that contain the substring Asia. (If a record contains Asia as part of a
larger string like Asian or Pan-Asiatic, it is also printed.) In general, if re is an extended regular expression,
then the pattern
/re/
matches any line that contains a substring specified by the extended regular expression re.
To restrict a match to a specific field, you use the matching operators ~ (matches) and !~ (does not match).
The program
$4 ~ /Asia/ { print $1 }
prints the first field of all lines in which the fourth field matches Asia, while the program
$4 !~ /Asia/ { print $1 }
prints the first field of all lines in which the fourth field does not match Asia.
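The two matching operators can be exercised side by side on two sample records:

```shell
# ~ selects records whose fourth field matches; !~ selects the rest.
printf '%s\n' "India 1269 637 Asia" "Sudan 968 19 Africa" |
awk '$4 ~ /Asia/  { print $1, "matches" }
     $4 !~ /Asia/ { print $1, "does not match" }'
```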
In extended regular expressions, the symbols
\  ^  $  .  [  ]  *  +  ?  (  )  |  {  }
are metacharacters with special meanings like the metacharacters in the SCO OpenServer shell. For example,
the metacharacters ^ and $ match the beginning and end, respectively, of a string, and the metacharacter .
(dot) matches any single character. Thus,
/^.$/
matches all lines that contain exactly one character. Similarly, the program
$2 !~ /^[0-9]+$/
prints all records in which the second field is not a string of one or more digits (^ for beginning of string,
[0-9]+ for one or more digits, and $ for end of string). Programs of this type are often used for data
validation.
For example, the program
/(apple|cherry) (pie|tart)/
matches lines containing any one of the four substrings apple pie,
apple tart, cherry pie, or cherry tart.
Extended regular expressions provide a more general form of repetition via the ``interval'' operator. This
operator is of the form {low,high}, with the high limit optional. The three operators ?, * and + are equivalent,
respectively, to the interval constructs {0,1}, {0,} and {1,}. To denote an exact number of matches, use the form
{count}.
To turn off the special meaning of a metacharacter, precede it by a \ (backslash). Thus, the program
/b\$/
matches lines that contain a b followed by a dollar sign. awk also recognizes the following C-style escape
sequences in strings and extended regular expressions:
\b      backspace
\f      formfeed
\n      newline
\r      carriage return
\t      tab
\ddd    octal value ddd
\"      quotation mark
\c      any other character c literally
\xhhh   hexadecimal value hhh
For example, to print all lines containing a tab, use the program
/\t/
awk interprets any string or variable on the right side of a ~ or !~ as an extended regular expression. For
example, we could have written the program
$2 !~ /^[0-9]+$/
as
BEGIN   { digits = "^[0-9]+$" }
$2 !~ digits
Note that within a quoted string an extra backslash is needed to protect a backslash in the extended regular
expression:
x ~ "b\\$"      is equivalent to        x ~ /b\$/
x ~ "b\$"       is equivalent to        x ~ /b$/
x ~ "b$"        is equivalent to        x ~ /b$/
x ~ "\\t"       is equivalent to        x ~ /\t/
The precise form of extended regular expressions and the substrings they match is in ``awk extended regular
expressions''. The unary operators *, + and ? and the interval operator have the highest precedence, then
concatenation, and then alternation |. All operators are left associative. r stands for any extended regular expression.
awk extended regular expressions
Expression      Matches
c               any nonmetacharacter c
\c              character c literally
^               beginning of string
$               end of string
.               any character but newline
[s]             any character in set s
[^s]            any character not in set s
r*              zero or more r's
r+              one or more r's
r?              zero or one r
r{low,high}     at least low, but not more than high, r's
(r)             r
r[1]r[2]        r[1] then r[2] (concatenation)
r[1]|r[2]       r[1] or r[2] (alternation)
Combinations of patterns
A compound pattern combines simpler patterns with parentheses and the logical operators || (or), && (and),
and ! (not). For example, suppose you want to print all countries in Asia with a population of more than 500
million. The following program does this by selecting all lines in which the fourth field is Asia and the third
field exceeds 500:
$4 == "Asia" && $3 > 500
The program
$4 == "Asia" || $4 == "Africa"
selects lines with Asia or Africa as the fourth field. Another way to write the latter query is to use an extended
regular expression with the alternation operator |:
$4 ~ /^(Asia|Africa)$/
The negation operator ! has the highest precedence, then &&, and finally ||. The operators && and || evaluate
their operands from left to right; evaluation stops as soon as truth or falsehood is determined.
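The && combination shown above can be exercised directly; in this two-record sample only India satisfies both conditions:

```shell
# Both conditions must hold: fourth field Asia AND third field over 500.
printf '%s\n' "India 1269 637 Asia" "USSR 8650 262 Asia" |
awk '$4 == "Asia" && $3 > 500 { print $1 }'
```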
Pattern ranges
A pattern range consists of two patterns separated by a comma, as in
pat[1], pat[2]
{ . . . }
In this case, the action is performed for each line between an occurrence of pat[1] and the next occurrence of
pat[2] (inclusive). For example, the pattern
/Canada/, /Brazil/
matches lines starting with the first line that contains the string Canada up through the next occurrence of the
string Brazil:
Canada 3852 24 North America
China 3692 866 Asia
USA 3615 219 North America
Brazil 3286 116 South America
Similarly, since FNR is the number of the current record in the current input file (and FILENAME is the name
of the current input file), the program
FNR == 1, FNR == 5 { print FILENAME, $0 }
prints the first five records of each input file with the name of the current input file prepended.
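A pattern range can be tried on a small synthetic input (the start and stop markers here are invented for the example):

```shell
# Print every record from a line matching start through the next matching stop.
printf '%s\n' "one" "start" "two" "stop" "three" |
awk '/start/, /stop/'
```

Records before the first start and after the stop are not printed.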
Actions
In a patternaction statement, the action determines what is to be done with the input records that the pattern
selects. Actions frequently are simple printing or assignment statements, but they may also be a combination
of one or more statements. This section describes the statements that can make up actions.
Built-in variables
``awk built-in variables'' lists the built-in variables that awk maintains. You have already learned some of
these; others appear in this and later sections.
awk built-in variables
Variable        Meaning                                         Default
ARGC            number of command-line arguments                -
ARGV            array of command-line arguments                 -
FILENAME        name of current input file                      -
FNR             record number in current file                   -
FS              input field separator                           blank&tab
NF              number of fields in current record              -
NR              number of records read so far                   -
OFMT            output format for numbers                       %.6g
OFS             output field separator                          blank
ORS             output record separator                         newline
RS              input record separator                          newline
RSTART          index of first character matched by match       -
RLENGTH         length of string matched by match               -
SUBSEP          subscript separator                             "\034"
Arithmetic
Actions can use conventional arithmetic expressions to compute numeric values. As a simple example,
suppose you want to print the population density for each country in the file countries. Since the second field
is the area in thousands of square miles and the third field is the population in millions, the expression
1000 * $3 / $2 gives the population density in people per square mile. The program
{ printf "%10s %6.1f\n", $1, 1000 * $3 / $2 }
when applied to the file countries, prints the name of each country and its population density:
      USSR   30.3
    Canada    6.2
     China  234.6
       USA   60.6
    Brazil   35.3
 Australia    4.7
     India  502.0
 Argentina   24.3
     Sudan   19.6
   Algeria   19.6
Arithmetic is done internally in floating point. The arithmetic operators are +, -, *, /, % (remainder) and ^
(exponentiation; ** is a synonym). Arithmetic expressions can be created by applying these operators to
constants, variables, field names, array elements, functions, and other expressions, all of which are discussed
later. Note that awk recognizes and produces scientific (exponential) notation: 1e6, 1E6, 10e5, and 1000000
are numerically equal.
awk has assignment statements like those found in the C programming language. The simplest form is the
assignment statement
v = e
which assigns the value of the expression e to the variable v. For example, the two statements in
$4 == "Asia"    { pop = pop + $3; n = n + 1 }
accumulate the Asian population and count the Asian countries.
The action associated with the pattern $4 == "Asia" contains two assignment statements, one to accumulate
population and the other to count countries. The variables are not explicitly initialized, yet everything works
properly because awk initializes each variable with the string value "" and the numeric value 0.
The assignments in the previous program can be written more concisely using the operators += and ++:
$4 == "Asia" { pop += $3; ++n }
The operator += is borrowed from the C programming language; therefore,
pop += $3
has the same effect as
pop = pop + $3
but the += operator is shorter and runs faster. The same is true of the ++ operator, which adds one to a
variable.
The abbreviated assignment operators are +=, -=, *=, /=, %=, and ^=. Their meanings are similar:
v op= e
has the same effect as
v = v op e.
The increment and decrement operators are ++ and --. As in C, they may be used as prefix (++x) or postfix (x++) operators.
If x is 1, then i=++x increments x, then sets i to 2, while i=x++ sets i to 1, then increments x. An analogous
interpretation applies to prefix and postfix --.
Assignment and increment and decrement operators may all be used in arithmetic expressions.
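The prefix/postfix distinction can be confirmed in a BEGIN action:

```shell
# i = ++x increments x before use; j = y++ uses y first, then increments it.
awk 'BEGIN {
    x = 1; i = ++x    # now x is 2 and i is 2
    y = 1; j = y++    # now y is 2 but j is 1
    print i, x, j, y
}'
```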
We use default initialization to advantage in the following program, which finds the country with the largest
population:
maxpop < $3     { maxpop = $3; country = $1 }
END             { print country, maxpop }
Note, however, that this program would not be correct if all values of $3 were negative.
awk provides the following built-in arithmetic functions:
Function        Value returned
atan2(y,x)      arctangent of y/x in the range -pi to pi
cos(x)          cosine of x, with x in radians
exp(x)          exponential function of x
int(x)          integer part of x truncated towards 0
log(x)          natural logarithm of x
rand()          random number between 0 and 1
sin(x)          sine of x, with x in radians
sqrt(x)         square root of x
srand(x)        x is new seed for rand
x and y are arbitrary expressions. The function rand returns a pseudorandom floating point number in the
range (0,1), and srand(x) can be used to set the seed of the generator. If srand has no argument, the seed is
derived from the time of day.
Strings and string functions
Strings are concatenated simply by writing one expression after another; for example, the program
{ print NR ":" $0 }
prints each record preceded by its record number and a colon, with no blanks. The three strings representing
the record number, the colon, and the record are concatenated and the resulting string is printed. The
concatenation operator has no explicit representation other than juxtaposition.
awk provides the built-in string functions shown in ``awk built-in string functions''. In this table, r represents
an extended regular expression (either as a string or as /r/), s and t string expressions, and n and p integers.
awk built-in string functions
Function        Description
gsub(r,s)       substitute s for r globally in current record, return number of substitutions
gsub(r,s,t)     substitute s for r globally in string t, return number of substitutions
index(s,t)      return position of string t in s, 0 if not present
length(s)       return length of s
match(s,r)      return the position in s where r occurs, 0 if not present
split(s,a)      split s into array a on FS, return number of fields
The functions sub and gsub are patterned after the substitute command in the text editor ed(C). The function
gsub(r,s,t) replaces successive occurrences of substrings matched by the extended regular expression r with
the replacement string s in the target string t. (As in ed, the leftmost match is used, and is made as long as
possible.) It returns the number of substitutions made. The function gsub(r,s) is a synonym for gsub(r,s,$0).
For example, the program
{ gsub(/USA/, "United States"); print }
transcribes its input, replacing occurrences of USA by United States. The sub functions are similar, except
that they only replace the first matching substring in the target string.
The function index(s,t) returns the leftmost position where the string t begins
in s, or zero if t does not occur in s. The first character in a string is at position 1. For example,
index("banana", "an")
returns 2.
The length function returns the number of characters in its argument string; thus,
{ print length($0), $0 }
prints each record, preceded by its length. ($0 does not include the input record separator.) The program
length($1) > max        { max = length($1); name = $1 }
END                     { print name }
when applied to the file countries, prints the longest country name:
Australia
The match(s,r) function returns the position in string s where extended regular expression r occurs, or 0 if it
does not occur. This function also sets two built-in variables RSTART and RLENGTH. RSTART is set to
the starting position of the match in the string; this is the same value as the returned value. RLENGTH is set
to the length of the matched string. (If a match does not occur, RSTART is 0, and RLENGTH is -1.) For
example, the following program finds the first occurrence of the letter i followed by at most one character
followed by the letter a in a record:
{ if (match($0, /i.?a/))
print RSTART, RLENGTH, $0 }
which, with the file countries as input, prints RSTART, RLENGTH, and the record itself for each line. Every
record of countries contains such a match; for example, RLENGTH is 2 for the ia of Asia in the USSR line
and 3 for the ica of America in the Canada line.
NOTE: match matches the leftmost longest matching string. For example, with the record
AsiaaaAsiaaaaan
as input, the program
{ if (match($0, /a+/)) print RSTART, RLENGTH, $0 }
matches the first string of a's and sets RSTART to 4 and RLENGTH to 3.
The function sprintf(format, expr[1], expr[2], . . ., expr[n]) returns (without printing) a string containing
expr[1], expr[2], . . ., expr[n] formatted according to the printf specifications in the string format. ``The printf
statement'' contains a complete specification of the format conventions. The statement
x = sprintf("%10s %6d", $1, $2)
assigns to x the string produced by formatting the values of $1 and $2 as a ten-character string and a decimal
number in a field of width at least six; x may be used in any subsequent computation.
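The result of sprintf can be captured and reused; the brackets in this sketch are added only to make the padding visible:

```shell
# Format into a string without printing, then use the resulting string.
echo "Canada 3852" | awk '{ x = sprintf("%10s %6d", $1, $2); print "[" x "]" }'
```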
The function substr(s,p,n) returns the substring of s that begins at position p and is at most n characters long.
If substr(s,p) is used, the substring goes to the end of s; that is, it consists of the suffix of s beginning at
position p. For example, we could abbreviate the country names in countries to their first three characters by
invoking the program
{ $1 = substr($1, 1, 3); print }
USS     8650    262     Asia
Can     3852    24      North America
Chi     3692    866     Asia
USA     3615    219     North America
Bra     3286    116     South America
Aus     2968    14      Australia
Ind     1269    637     Asia
Arg     1072    26      South America
Sud     968     19      Africa
Alg     920     18      Africa
Note that setting $1 in the program forces awk to recompute $0 and, therefore, the fields are separated by
blanks (the default value of OFS), not by tabs. The program
{ s = s substr($1, 1, 3) " " }
END { print s }
prints
USS Can Chi USA Bra Aus Ind Arg Sud Alg
Field variables
The fields of the current record can be referred to by the field variables $1, $2, . . ., $NF. Field variables share
all of the properties of other variables: they may be used in arithmetic or string operations, and they may
have values assigned to them. So, for example, you can divide the second field of the file countries by 1000 to
convert the area from thousands to millions of square miles:
{ $2 /= 1000; print }
The following program replaces the values North America and South America in the fourth field with the
abbreviations NA and SA:
BEGIN                   { FS = OFS = "\t" }
$4 == "North America"   { $4 = "NA" }
$4 == "South America"   { $4 = "SA" }
                        { print }
The BEGIN action in this program resets the input field separator FS and the output field separator OFS to a
tab. Notice that the print in the fourth line of the program prints the value of $0 after it has been modified by
previous assignments.
Fields can be accessed by expressions. For example, $(NF-1) is the second to last field of the current record.
The parentheses are needed: the value of $NF-1 is 1 less than the value in the last field.
A field variable referring to a nonexistent field, for example, $(NF+1), has as its initial value the empty string.
A new field can be created, however, by assigning a value to it. For example, the following program invoked
on the file countries creates a fifth field giving the population density:
BEGIN   { FS = OFS = "\t" }
        { $5 = 1000 * $3 / $2; print }
The number of fields can vary from record to record, but usually the implementation limit is 100 fields per
record.
Number or string?
Variables, fields and expressions can have both a numeric value and a string value. They take on numeric or
string values according to context. For example, in the context of an arithmetic expression like
pop += $3
pop and $3 must be treated numerically, so their values will be coerced to numeric type if necessary.
In a string context like
print $1 ":" $2
$1 and $2 are coerced to strings, while in a comparison like
$1 == $2
the type of the comparison depends on whether the fields are numeric or string, and this can only be
determined when the program runs; it may well differ from record to record.
In comparisons, if both operands are numeric, the comparison is numeric; otherwise, operands are coerced to
strings, and the comparison is made on the string values. All field variables are of type string; in addition,
each field that contains only a number is also considered numeric. This determination is done at run time. For
example, the comparison ``$1 == $2'' will succeed on any pair of the inputs
1
1.0
+1
0.1e+1
10E-1
001
.01E2
0.0
There are two idioms for coercing an expression of one type to the other:
number ""
concatenate a null string to a number to coerce it to type string
string + 0
add zero to a string to coerce it to type numeric
Thus, to force a string comparison between two fields, use
$1 "" == $2 ""
The numeric value of a string is the value of any prefix of the string that looks numeric; thus the value of
12.34x is 12.34, while the value of x12.34 is zero. The string value of an arithmetic expression is computed by
formatting the string with the output format conversion OFMT.
Uninitialized variables have numeric value 0 and string value "". Nonexistent fields and fields that are
explicitly null have only the string value ""; they are not numeric.
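The numeric-prefix rule above can be checked directly from a BEGIN action:

```shell
# A string's numeric value is the value of its numeric prefix (0 if none).
awk 'BEGIN {
    print "12.34x" + 0    # numeric prefix: 12.34
    print "x12.34" + 0    # no numeric prefix: 0
}'
```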
The while statement has the form
while (expression)
        statement
The expression is evaluated; if it is nonzero and nonnull the statement is executed and the expression is
tested again. The cycle repeats as long as the expression is nonzero. For example, to print all input fields one
per line,
{
        i = 1
        while (i <= NF) {
                print $i
                i++
        }
}
The for statement is similar to that of the C programming language:
for (expression1; condition; expression2)
        statement
so
{ for (i = 1; i <= NF; i++)
        print $i }
does the same job as the while example shown above. An alternate version of the for statement is described in
the next section.
The do statement has the form
do statement while (expression)
The statement is executed repeatedly until the value of the expression becomes zero. Because the test takes
place after the execution of the statement (at the bottom of the loop), it is always executed at least once. As a
result, the do statement is used much less often than while or for, which test for completion at the top of the
loop.
The following example of a do statement prints all lines except those between start and stop.
/start/ {
do {
getline x
} while (x !~ /stop/)
}
{ print }
The break statement causes an immediate exit from an enclosing while or for; the continue statement causes
the next iteration to begin. The next statement causes awk to skip immediately to the next record and begin
matching patterns starting from the first patternaction statement.
The exit statement causes the program to behave as if the end of the input had occurred; no more input is read,
and the END action, if any, is executed. Within the END action,
exit expr
causes the program to return the value of expr as its exit status. If there is no expr, the exit status is zero.
Arrays
awk provides one-dimensional arrays. Arrays and array elements need not be declared; like variables, they
spring into existence by being mentioned. An array subscript may be a number or a string.
As an example of a conventional numeric subscript, the statement
x[NR] = $0
assigns the current input line to the NR-th element of the array x. In fact, it is
possible in principle (though perhaps slow) to read the entire input into an array with the awk program
        { x[NR] = $0 }
END     { . . . processing . . . }
Array subscripts may also be strings. For example, the program
/Asia/          { pop["Asia"] += $3 }
/Africa/        { pop["Africa"] += $3 }
END             { print "Asian population in millions is", pop["Asia"]
                  print "African population in millions is", pop["Africa"] }
accumulates the populations of Asia and Africa. The following program
uses the string in the fourth field of the current input record to index the array area and, in that entry,
accumulates the value of the second field:
BEGIN   { FS = "\t" }
        { area[$4] += $2 }
END     { for (name in area)
                print name, area[name] }
This program uses a form of the for statement that iterates over all defined subscripts of an array:
for (i in array) statement
executes statement with the variable i set in turn to each value of i for which array[i] has been defined. The
loop is executed once for each defined subscript, which is chosen in a random order. Results are unpredictable
when i or array is altered during the loop.
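String subscripts and the for (i in array) loop combine naturally in a word-count sketch; sort is piped on the end only because, as noted above, the iteration order is unpredictable:

```shell
# Count occurrences of each first field, then list the counts.
printf '%s\n' "Asia" "Africa" "Asia" |
awk '{ n[$1]++ }
END  { for (w in n) print w, n[w] }' | sort
```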
awk does not provide multi-dimensional arrays, but it does permit a list of subscripts. They are combined into
a single subscript with the values separated by an unlikely string (stored in the variable SUBSEP). For
example, the program
for (i = 1; i <= 10; i++)
        for (j = 1; j <= 10; j++)
                arr[i, j] = 0
creates an array which behaves like a two-dimensional array; the subscript is the concatenation of i,
SUBSEP, and j.
You can determine whether a particular subscript i occurs in an array arr by testing the condition i in arr, as in
if ("Africa" in area) ...
This condition performs the test without the side effect of creating area["Africa"], which would happen if
you used
if (area["Africa"] != "") ...
Note that neither is a test of whether the array area contains an element with the value "Africa" .
It is also possible to split any string into fields in the elements of an array using the built-in function split.
The function
split("s1:s2:s3", a, ":")
splits the string s1:s2:s3 into three fields, using the separator :, and stores s1 in a[1], s2 in a[2], and s3 in a[3].
The number of fields found, here three, is returned as the value of split. The third argument of split is an
extended regular expression to be used as the field separator. If the third argument is missing, FS is used as
the field separator.
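The split example above can be run as-is from a BEGIN action:

```shell
# split returns the number of fields and fills the array a.
awk 'BEGIN {
    n = split("s1:s2:s3", a, ":")
    print n, a[1], a[2], a[3]
}'
```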
An array element may be deleted with the delete statement:
delete arrayname[subscript]
User-defined functions
awk provides user-defined functions. A function is defined as
function name(argumentlist) {
statements
}
The definition can occur anywhere a pattern-action statement can. The argument list is a list of variable
names separated by commas; within the body of the function these variables refer to the actual parameters
when the function is called. There must be no space between the function name and the left parenthesis of the
argument list when the function is called; otherwise it looks like a concatenation. For example, the following
program defines and tests the usual recursive factorial function (using, of course, some input other than the file
countries):
function fact(n) {
        if (n <= 1)
                return 1
        else
                return n * fact(n-1)
}
{ print $1 "! is " fact($1) }
Some lexical conventions
Comments may be placed in awk programs; a comment begins with the character # and ends at the end of the
line, as in
print x, y      # this is a comment
Statements in an awk program normally occupy a single line. Several statements may occur on a single line if
they are separated by semicolons. A long statement may be continued over several lines by terminating each
continued line with a backslash. (It is not possible to continue a "..." string.) This explicit continuation is rarely
necessary, however, since statements continue automatically if the line ends with a comma (for example, in a
print or printf statement) or after the operators && and ||.
Several pattern-action statements may appear on a single line if separated by semicolons.
Output
The print and printf statements are the two primary constructs that generate output. The print statement is
used to generate simple output; printf is used for more carefully formatted output. Like the shell, awk lets
you redirect output, so that output from print and printf can be directed to files and pipes. This section
describes the use of these two statements.
The statement
print expr1, expr2, . . ., exprn
prints the string value of each expression separated by the output field separator, followed by the output record
separator. The statement
print
is an abbreviation for
print $0
Output separators
The output field separator and record separator are held in the built-in variables OFS and ORS. Initially,
OFS is set to a single blank and ORS to a single newline; both can be changed at any time.
Notice that
{ print $1 $2 }
prints the first and second fields with no intervening output field separator because $1 $2 is a string consisting
of the concatenation of the first two fields.
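The difference is easy to see side by side; this sketch uses a made-up two-word input line:

```shell
# A comma in print inserts OFS (a blank by default);
# concatenation juxtaposes the fields with nothing between.
echo "alpha beta" | awk '{ print $1, $2; print $1 $2 }'
# prints:
# alpha beta
# alphabeta
```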
The statement
printf format, expr1, expr2, . . ., exprn
prints the expressions in the argument list, where format is a string that contains both information to be printed
and specifications on what conversions are to be performed on the expressions in the argument list, as in
``awk printf conversion characters''. Each specification begins with a %, ends with a letter that determines
the conversion, and may include:
Character   Prints expression as
c           single character
d           decimal number
e           [-]d.dddddde[+-]dd
f           [-]ddd.dddddd
g           e or f conversion, whichever is shorter, with nonsignificant zeros suppressed
o           unsigned octal number
s           string
x           unsigned hexadecimal number
%           print a %; no argument is converted
Some examples of printf conversions, with the output each produces:

Format and argument        Output
"%d", 99/2                 49
"%e", 99/2                 4.950000e+01
"%f", 99/2                 49.500000
"%6.2f", 99/2               49.50
"%g", 99/2                 49.5
"%o", 99                   143
"%06o", 99                 000143
"%x", 99                   63
"|%s|", "January"          |January|
"|%10s|", "January"        |   January|
"|%-10s|", "January"       |January   |
"|%.3s|", "January"        |Jan|
"|%10.3s|", "January"      |       Jan|
"|%-10.3s|", "January"     |Jan       |
"%%"                       %
The default output format of numbers is %.6g; this can be changed by assigning a new value to OFMT.
OFMT also controls the conversion of numeric values to strings for concatenation and creation of array
subscripts.
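The effect of reassigning OFMT can be seen directly; this sketch reuses the value 99/2 from the printf examples:

```shell
# print converts the non-integer number 49.5 to a string using OFMT:
# first with the default %.6g, then with %.2f.
echo | awk '{
    x = 99/2
    print x          # default OFMT "%.6g"
    OFMT = "%.2f"
    print x          # two digits after the decimal point
}'
# prints:
# 49.5
# 49.50
```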
Output to files
You can print output to files instead of to the standard output by using the > and >> redirection operators. For
example, the following program invoked on the file countries prints all lines where the population (third field)
is bigger than 100 into a file called bigpop, and all other lines into a file called smallpop:
$3 > 100     { print > "bigpop" }
$3 <= 100    { print > "smallpop" }
Notice that the filenames have to be quoted; without quotes, bigpop and smallpop are merely uninitialized
variables. If the output filenames were created by an expression, they would also have to be enclosed in
parentheses:
$4 ~ /North America/ { print $1 > ("tmp" FILENAME) }
because the > operator has higher precedence than concatenation; without parentheses, the concatenation of
tmp and FILENAME would not work.
NOTE: Files are opened once in an awk program. If > is used to open a file, its original contents are
overwritten. But if >> is used to open a file, its contents are preserved and the output is appended to the file.
Once the file has been opened, the two operators have the same effect.
Output to pipes
You can also direct printing to a pipe with a command on the other end, instead of to a file. The statement
print | "commandline"
causes the output of print to be piped into command-line. For example, the following program prints the
continents and their total populations, sorted by continent:
BEGIN { FS = "\t" }
{ pop[$4] += $3 }
END   { for (c in pop)
            print c ":" pop[c] | "sort" }
In all these print statements involving redirection of output, the files or pipes are identified by their names
(that is, the pipe above is literally named sort ), but they are created and opened only once in the entire run.
So, in the last example, for all c in pop, only one sort pipe is open.
There is a limit to the number of files that can be open simultaneously. The statement close(file) closes a file
or pipe; file is the string used to create it in the first place, as in
close("sort")
Input
The most common way to give input to an awk program is to name on the command line the file(s) that
contains the input. This is the method used in this topic; however, several other methods can be used. Each of
these is described in this section.
If no filenames are given, awk reads its standard input; thus, a second common arrangement is to have another
program pipe its output into awk. For example, grep(C) selects input lines containing a specified regular
expression, but it can do so faster than awk, since this is the only thing it does. We could, therefore, invoke
the pipe
$ grep 'Asia' countries | awk '. . .'<<Return>>
Input separators
With the default setting of the field separator FS, input fields are separated by blanks or tabs, and leading
blanks are discarded, so each of these lines has the same first field:
   field1  field2
field1     field2
     field1        field2
When the field separator is a tab, however, leading blanks are not discarded.
The field separator can be set to any extended regular expression by assigning a value to the built-in variable
FS. For example,
BEGIN { FS = ",[ \t]*|[ \t]+" }
makes into field separators every string consisting of a comma followed by blanks or tabs and every string of
blanks or tabs with no comma. FS can also be set on the command line with the -F argument:
$ awk -F',[ \t]*|[ \t]+' '. . .'<<Return>>
behaves the same as the previous example. Regular expressions used as field separators match the leftmost
longest occurrences (as in sub), but do not match null strings.
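A quick check of this separator (the input line here is invented: one comma-blank separator and one tab):

```shell
# With FS = ",[ \t]*|[ \t]+", both ", " and a tab split fields.
printf 'a, b\tc\n' | awk '
BEGIN { FS = ",[ \t]*|[ \t]+" }
{ print NF; print $2 }'
# prints:
# 3
# b
```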
Multiline records
Records are normally separated by newlines, so that each line is a record; but this too can be changed, though
only in a limited way. If the built-in record separator variable RS is set to the empty string, as in
BEGIN
{ RS = "" }
then input records can be several lines long; a sequence of empty lines separates records. A common way to
process multiple-line records is to use
BEGIN
{ RS = ""; FS = "\n" }
to set the record separator to an empty line and the field separator to a newline. Each line is then one field.
However, the length of a record is limited; it is usually about 2500 characters. ``The getline function'' and
``Cooperation with the shell'' show other examples of processing multiline records.
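The following sketch, with two invented records separated by an empty line, shows the effect of these settings:

```shell
# RS = "" makes an empty line the record separator;
# FS = "\n" makes each line one field.
printf 'line1\nline2\n\nline3\n' | awk '
BEGIN { RS = ""; FS = "\n" }
{ print NR, NF }'
# prints:
# 1 2
# 2 1
```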
The getline function
Once the line containing STOP is encountered, the record can be processed from the data in the f array:
/^START/ {
f[nf=1] = $0
while (getline && $0 !~ /^STOP/)
f[++nf] = $0
# now process the data in f[1]...f[nf]
...
}
Notice that this code uses the fact that && evaluates its operands left to right and stops as soon as one is false.
The same job can also be done by the following program:
/^START/ && nf==0   { f[nf=1] = $0 }
nf >= 1             { f[++nf] = $0 }
/^STOP/             { # now process the data in f[1]...f[nf]
                      ...
                      nf = 0
                    }
The statement
getline x
reads the next record into the variable x. No splitting is done; NF is not set. The statement
getline <"file"
reads from file instead of the current input. It has no effect on NR or FNR, but field splitting is performed and
NF is set. The statement
getline x <"file"
gets the next record from file into x; no splitting is done, and NF, NR and FNR are untouched.
If a filename is an expression, it should be in parentheses for evaluation:
while ( getline x < (ARGV[1] ARGV[2]) ) {
... }
because the < has precedence over concatenation. Without parentheses, a statement such as
getline x < "tmp" FILENAME
loops forever if the file cannot be read, because getline returns -1, not zero, if an error occurs. A better way to
write this test is
while ((getline x < file) > 0) { ... }
You can also pipe the output of another command directly into getline. For example, the statement
while ("who" | getline)
n++
executes who and pipes its output into getline. Each iteration of the while loop reads one more line and
increments the variable n, so after the while loop terminates, n contains a count of the number of users.
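The same counting pattern can be tried with a predictable command in place of who (the echo pipeline below is an assumed stand-in so the count is fixed at three):

```shell
# Pipe a command's output into getline, one record per iteration,
# and count the records; close the pipe when done.
awk 'BEGIN {
    cmd = "echo one; echo two; echo three"
    while (cmd | getline)
        n++
    close(cmd)
    print n
}'
# prints: 3
```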
Similarly, the statement
"date" | getline d
pipes the output of date into the variable d, thus setting d to the current date. Note that, in this case, awk
leaves the pipeline (and thus the resources associated with date) open, since only one line was read from the
pipeline. An explicit close("date") will clear up these unneeded resources. Similarly, if a new invocation of
date is desired later, an explicit close("date") is also needed. Otherwise getline would try to read a second
line from the first invocation. ``getline function'' summarizes the getline function.
getline function:

Form                 Sets
getline              $0, NF, NR, FNR
getline var          var, NR, FNR
getline <file        $0, NF
getline var <file    var
cmd | getline        $0, NF
cmd | getline var    var
Command-line arguments
The command-line arguments are available to an awk program: the array ARGV contains the elements
ARGV[0], . . ., ARGV[ARGC-1]; as in C, ARGC is the count. ARGV[0] is the name of the program
(generally awk); the remaining arguments are whatever was provided (excluding the program and any
optional arguments) when awk is invoked. The following command line contains an awk program that echoes
the arguments that appear after the program name:
awk '
BEGIN {
for (i = 1; i < ARGC; i++)
printf "%s ", ARGV[i]
printf "\n"
}' $*
The program
$1 == "#include" { gsub(/[<>"]/, "", $2); system("cat " $2) }
calls the command cat to print the file named in the second field of every input record whose first field is
#include, after stripping any <, >, or " that might be present.
Since awk uses many of the same characters as the shell does, such as $ and ", surrounding the awk program
with single quotes ensures that the shell will pass the entire program unchanged to the awk interpreter.
Now, consider writing a command addr that will search a file addresslist for name, address and telephone
information. Suppose that addresslist contains names and addresses in which a typical entry is a multiline
record such as
G. R. Emlin
600 Mountain Avenue
Murray Hill, NJ 07974
201-555-1234
The problem is how to get a different search pattern into the program each time it is run.
There are several ways to do this. One way is to create a file called addr that contains
awk '
BEGIN
{ RS = "" }
/'"$1"'/
' addresslist
The quotes are critical here. The awk program is only one argument, even though there are two sets of quotes,
because quotes do not nest. The $1 is outside the single quotes but inside the double quotes, and thus is visible
to the shell, which therefore replaces it by the pattern Emlin when the command addr Emlin is invoked. On a
SCO OpenServer system, addr can be made executable by changing its mode with the following command:
chmod +x addr
A second way to implement addr relies on the fact that the shell substitutes for $ parameters within double
quotes:
awk "
BEGIN
{ RS = \"\" }
/$1/
" addresslist
Therefore, you must protect the quotes defining RS with backslashes, so that the shell passes them on to awk
without interpretation. $1 is recognized as a parameter, however, so the shell replaces it by the pattern when
the command addr pattern is invoked.
A third way to implement addr is to use ARGV to pass the extended regular expression to an awk program
that explicitly reads through the address list with getline:
awk '
BEGIN
{ RS = ""
while (getline < "addresslist")
if ($0 ~ ARGV[1])
print $0
} ' $*
Example applications
awk has been used in surprising ways: to implement database systems and a variety of compilers and
assemblers, in addition to the more traditional tasks of information retrieval, data manipulation, and report
generation. Invariably, the awk programs are significantly shorter than equivalent programs written in more
conventional programming languages such as Pascal or C. This section presents a few more examples that
illustrate additional uses of awk.
Generating reports
awk is especially useful for producing reports that summarize and format information. Suppose you want to
produce a report from the file countries in which the continents are listed alphabetically, and the countries on
each continent are listed after it in decreasing order of population:
Africa:
        Sudan           19
        Algeria         18
The arguments for sort deserve special mention. The -t: argument tells sort to use : as its field separator. The
+0 -1 arguments make the first field the primary sort key. In general, +i -j makes fields i+1, i+2, . . ., j the
sort key. If -j is omitted, the fields from i+1 to the end of the record are used. The +2nr argument makes the
third field, numerically decreasing, the secondary sort key (n is for numeric, r for reverse order). Invoked on
the file countries, this program produces as output
Africa:Sudan:19
Africa:Algeria:18
Asia:China:866
Asia:India:637
Asia:USSR:262
Australia:Australia:14
North America:USA:219
North America:Canada:24
This output is in the right order but the wrong format. To transform the output into the desired form, run it
through a second awk program format:
BEGIN   { FS = ":" }
{ if ($1 != prev) {
      print "\n" $1 ":"
      prev = $1
  }
  printf "\t%10s %6d\n", $2, $3
}
This is a control-break program that prints only the first occurrence of a continent name and formats the
country-population lines associated with that continent in the desired manner. The command line
$ awk -f triples countries | awk -f format<<Return>>
gives the desired report. As this example suggests, complex data transformation and formatting tasks can
often be reduced to a few simple awk commands and sorts.
Word frequencies
This example illustrates associative arrays for counting. Suppose you want to count the number of times
each word appears in the input, where a word is any contiguous sequence of nonblank, nontab characters.
The following program prints the word frequencies, sorted in decreasing order.
{ for (w = 1; w <= NF; w++) count[$w]++ }
END { for (w in count) print count[w], w | "sort -nr" }
The first statement uses the array count to accumulate the number of times each word is used. Once the input
has been read, the second for loop pipes the final count along with each word into the sort command.
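Run on a small invented input, the program behaves as described:

```shell
# Count word occurrences and print the counts in decreasing order.
printf 'the quick the\n' | awk '
{ for (w = 1; w <= NF; w++) count[$w]++ }
END { for (w in count) print count[w], w | "sort -nr" }'
# prints:
# 2 the
# 1 quick
```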
Accumulation
Suppose we have two files, deposits and withdrawals, of records containing a name field and an amount
field. For each name we want to print the net balance determined by subtracting the total withdrawals from the
total deposits for each name. The net balance can be computed by the following program:
awk '
FILENAME == "deposits"      { balance[$1] += $2 }
FILENAME == "withdrawals"   { balance[$1] -= $2 }
END                         { for (name in balance)
                                  print name, balance[name]
                            }
' deposits withdrawals
The first statement uses the array balance to accumulate the total amount for each name in the file deposits.
The second statement subtracts associated withdrawals from each total. If only withdrawals are associated
with a name, an entry for that name is created by the second statement. The END action prints each name with
its net balance.
Random choice
The following function prints (in order) k random elements from the first n elements of the array A. In the
program, k is the number of entries that still need to be printed, and n is the number of elements yet to be
examined. The decision of whether to print the ith element is determined by the test rand() < k/n.
function choose(A, k, n, i) {
    for (i = 1; n > 0; i++)
        if (rand() < k/n--) {
            print A[i]
            k--
        }
}
History facility
The following awk program roughly simulates the history facility of certain shells. A line containing only =
reexecutes the last command executed. A line beginning with = cmd reexecutes the last command whose
invocation included the string cmd. Otherwise, the current line is executed.
$1 == "=" { if (NF == 1)
                system(x[NR] = x[NR-1])
            else
                for (i = NR-1; i > 0; i--)
                    if (x[i] ~ $2) {
                        system(x[NR] = x[i])
                        break
                    }
            next }
Form-letter generation
The following program generates form letters, using a template stored in a file called form.letter:
This is a form letter.
The first field is $1, the second $2, the third $3.
The third is $3, second is $2, and first is $1.
The BEGIN action stores the template in the array line; the remaining action cycles through the input data,
using gsub to replace template fields of the form $n with the corresponding data fields.
BEGIN {
FS = "|"
while (getline <"form.letter")
line[++n] = $0
In all such examples, a prudent strategy is to start with a small version and expand it, trying out each aspect
before moving on to the next.
awk summary
The following sections summarize the features of awk.
Command line
awk program filenames
awk -f programfile filenames
awk -Fs sets field separator to string s; -Ft sets separator to tab
Patterns
BEGIN
END
/extended regular expression/
relational expression
pattern && pattern
pattern || pattern
(pattern)
!pattern
pattern, pattern
Input-output

close(filename)              close file
getline                      set $0 from next input record; set NF, NR, FNR
getline <file                set $0 from next record of file; set NF
getline var                  set var from next input record; set NR, FNR
getline var <file            set var from next record of file
print                        print current record
print exprlist               print expressions
print exprlist >file         print expressions on file
printf fmt, exprlist         format and print
printf fmt, exprlist >file   format and print on file
system(cmdline)              execute command cmdline, return status
In print and printf above, >>file appends to the file, and | command writes on a pipe. Similarly, command |
getline pipes into getline. getline returns 0 on end of file, and -1 on error.
Functions
func name(parameter list) { statement }
function name(parameter list) { statement }
function-name(expr, expr, . . .)
String functions
gsub(r,s,t)
index(s,t)
length(s)
match(s,r)
split(s,a,r)
sprintf(fmt, exprlist)
sub(r,s,t)
substr(s,i,n)
Arithmetic functions
atan2(y,x)
Expressions

= += -= *= /= %= ^=     assignment
?:                      conditional expression
||                      logical OR
&&                      logical AND
~ !~                    extended regular expression match, negated match
< <= > >= != ==         relationals
(blank)                 string concatenation
+ -                     add, subtract
* / %                   multiply, divide, mod
+ - !                   unary plus, unary minus, logical negation
^                       exponentiation (** is a synonym)
++ --                   increment, decrement (prefix and postfix)
$                       field

Regular expressions

c                       matches nonmetacharacter c
\c                      matches literal character c
.                       matches any character but newline
^                       matches beginning of line or string
$                       matches end of line or string
[abc...]                character class matches any of abc...
[^abc...]               negated class matches any but abc...
r1|r2                   matches either r1 or r2
r1r2                    concatenation: matches r1, then r2
r+                      matches one or more r's
r*                      matches zero or more r's
Built-in variables

ARGC        number of command-line arguments
ARGV        array of command-line arguments
FILENAME    name of current input file
FNR         record number in current file
FS          input field separator (default blank)
NF          number of fields in current record
NR          number of records read so far
OFMT        output format for numbers (default %.6g)
OFS         output field separator (default blank)
ORS         output record separator (default newline)
RS          input record separator (default newline)
RSTART      index of first character matched by match()
RLENGTH     length of string matched by match()
SUBSEP      subscript separator (default "\034")
Limits
Any particular implementation of awk enforces some limits. Here are typical values:
100 fields
2500 characters per input record
2500 characters per output record
1024 characters per individual field
1024 characters per printf string
400 characters maximum quoted string
400 characters in character class
15 open files
1 pipe
numbers are limited to what can be represented on the local machine,
for example, 1e-38..1e+38
When a variable is assigned a value,
its type is set to that of the expression. (Assignment includes +=, -=, and so on.) An arithmetic expression is
of type number, a concatenation is of type string, and so on. If the assignment is a simple copy, as in
v1 = v2
then the type of v1 becomes that of v2.
An expression can be coerced to type number by adding 0 to it, as in
expr + 0
and to string by concatenating the null string to it:
expr ""
An uninitialized variable has the string value "" and the numeric value 0. Thus, if x is uninitialized,
if (x) ...
is false, and
if (!x) ...
if (x == 0) ...
if (x == "") ...
are all true.
The type of each field is determined on input. All fields are strings; in addition, each field that contains only a
number is also considered numeric.
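The distinction matters for comparisons; in this sketch both fields of the first (invented) line look numeric, while the second line forces a string comparison:

```shell
# "10" > "9" is false as strings but true as numbers;
# a field is numeric only if it contains just a number.
echo "10 9"  | awk '{ r = ($1 > $2) ? "greater" : "not greater"; print r }'
echo "10 9x" | awk '{ r = ($1 > $2) ? "greater" : "not greater"; print r }'
# prints:
# greater
# not greater
```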
Note that merely referencing an array element, as in
if (arr[i] == "") ...
causes it to exist with the value "", so the if is satisfied. The special construction
if (i in arr) ...
determines if arr[i] exists without the side effect of creating it if it does not.
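This sketch contrasts the two:

```shell
# "i in arr" tests membership without creating arr[i];
# merely referencing arr[i] (even in a comparison) creates it.
awk 'BEGIN {
    probe = ("x" in arr)        # 0: not present, and not created
    still = ("x" in arr)        # 0: the test did not create it
    ref   = (arr["y"] == "")    # 1: the reference creates arr["y"] = ""
    made  = ("y" in arr)        # 1: now it exists
    print probe, still, ref, made
}'
# prints: 0 0 1 1
```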
The lex source file is translated with the command
$ lex lex.l
where lex.l is the file containing your lex specification. (The name lex.l is the favored convention, but you
may use whatever name you want. Keep in mind, though, that the .l suffix is a convention recognized by other
SCO OpenServer system tools, in particular, make.) The source code is written to an output file called lex.yy.c
by default. That file contains the definition of a function called yylex() that returns 1 whenever an expression
you have specified is found in the input text, 0 when end of file is encountered. Each call to yylex() parses one
token. When yylex() is called again, it picks up where it left off.
Note that running lex on a specification that is spread across several files
$ lex lex1.l lex2.l lex3.l
produces one lex.yy.c. Invoking lex with the -t option causes it to write its output to stdout rather than
lex.yy.c, so that it can be redirected:
$ lex -t lex.l > lex.c
Alternatively, you may want to write your own driver. The following is similar to the library version:
extern int yylex();

int main()
{
	while (yylex() != 0)
		;
	return 0;
}
its main() will call yylex() at run time exactly as if the lex library had been loaded. The resulting executable
reads stdin and writes its output to stdout. Figure 4-1 shows how lex works.
These three regular expressions match any occurrences of those character strings in an input text. If you want
to have the scanner remove every occurrence of orange from the input text, you could specify the rule
orange   ;
Because you specified a null action on the right with the semicolon, the scanner does nothing but print out the
original input text with every occurrence of this regular expression removed, that is, without any occurrence
of the string orange at all.
Operators
Unlike orange above, most of the expressions that we want to search for cannot be specified so easily. The
expression itself might simply be too long. More commonly, the class of desired expressions is too large; it
may, in fact, be infinite. Thanks to the use of operators summarized in ``lex operators'' below, we can form
regular expressions to signify any expression of a certain class. The + operator, for instance, means one or more
occurrences of the preceding expression, the ? means 0 or 1 occurrence(s) of the preceding expression (which
is equivalent, of course, to saying that the preceding expression is optional), and * means 0 or more
occurrences of the preceding expression. (It may at first seem odd to speak of 0 occurrences of an expression
and to need an operator to capture the idea, but it is often quite helpful. We will see an example in a moment.)
So m+ is a regular expression that matches any string of ms:
mmm
m
mmmmm
and 7* is a regular expression that matches any string of zero or more 7s:
77
[a-zA-Z0-9*&#]
is a regular expression that matches any letter (whether upper or lowercase), any digit, an asterisk, an
ampersand, or a sharp character. Given the input text
$$$$?? ????!!!*$$ $$$$$$&+====r~~# ((
the lexical analyzer with the previous specification in one of its rules will recognize *, &, r, and #, perform on
each recognition whatever action the rule specifies (we have not indicated an action here), and print out the
rest of the text as it stands. If you want to include the hyphen character in the class, it should appear as the
first or last character in the brackets: [-A-Z] or [A-Z-].
The operators become especially powerful in combination. For example, the regular expression to recognize
an identifier in many programming languages is
[a-zA-Z][0-9a-zA-Z]*
An identifier in these languages is defined to be a letter followed by zero or more letters or digits, and that is
just what the regular expression says. The first pair of brackets matches any letter. The second, if it were not
followed by a *, would match any digit or letter. The two pairs of brackets with their enclosed characters
would then match any letter followed by a digit or a letter. But with the *, the example matches any letter
followed by any number of letters or digits. In particular, it would recognize the following as identifiers:
e
not
idenTIFIER
pH
EngineNo99
R2D2
and it would not recognize the following as identifiers:
not_idenTIFIER
5times
$hello
because not_idenTIFIER has an embedded underscore; 5times starts with a digit, not a letter; and $hello
starts with a special character.
To recognize a \ itself, we need two backslashes: \\. Similarly, ``"x\ x"'' matches x x, and ``"y\"z"'' matches
y"z. Other lex operators are noted as they arise in the discussion below. lex recognizes all the C language
escape sequences.
lex operators

Expression   Description
\x           x, if x is a lex operator
"xy"         xy, even if x or y are lex operators (except \)
[xy]         x or y
[x-z]        x, y, or z
[^x]         any character but x
.            any character but newline
^x           x at the beginning of a line
<y>x         x when lex is in start condition y
x$           x at the end of a line
x?           optional x
x*           0, 1, 2, . . . instances of x
x+           1, 2, 3, . . . instances of x
x{m,n}       m through n occurrences of x
xx|yy        either xx or yy
x |          the action on x is the action for the next rule
(x)          x
x/y          x but only if followed by y
{xx}         the translation of xx from the definitions section
Actions
Once the scanner recognizes a string matching the regular expression at the start of a rule, it looks to the right
of the rule for the action to be performed. You supply the actions. Kinds of actions include recording the
token type found and its value, if any; replacing one token with another; and counting the number of instances
of a token or token type. You write these actions as program fragments in C. An action may consist of as
many statements as are needed for the job at hand. You may want to change the text in some way or simply
print a message noting that the text has been found. So, to recognize the expression Amelia Earhart and to
note such recognition, the rule
"Amelia Earhart"    printf("found Amelia");
printf("EEG");
would be called for. To count the lines in a text, we need to recognize the ends of lines and increment a
line counter. As we have noted, lex uses the standard C escape sequences, including \n for newline. So, to
count lines we might have
\n    lineno++;
where lineno, like other C variables, is declared in the definitions section that we discuss later.
Input is ignored when the C language null statement ; is specified. So the rule
[ \t\n]    ;
causes blanks, tabs, and newlines to be ignored. Note that the alternation operator | can also be used to
indicate that the action for a rule is the action for the next rule. The previous example could have been written:
" "    |
\t     |
\n     ;
\+?[0-9]+    { digstrngcount++;
               printf("%d",digstrngcount);
               printf("%s", yytext);
             }
This specification matches digit strings whether they are preceded by a plus sign or not, because the ?
indicates that the preceding plus sign is optional. In addition, it will catch negative digit strings because that
portion following the minus sign will match the specification. The next section explains how to distinguish
negative from positive integers.
The first three rules recognize negative integers, positive integers, and negative fractions between 0 and 1.
The use of the terminating + in each specification ensures that one or more digits compose the number in
question. Each of the next three rules recognizes a specific pattern. The specification for railroad matches
cases where one or more blanks intervene between the two syllables of the word. In the cases of railroad and
crook, we could have simply printed a synonym rather than the messages stated. The rule recognizing a
function simply increments a counter. The last rule illustrates several points:
The braces specify an action sequence that extends over several lines.
Its action uses the lex array yytext[], which stores the recognized character string.
Its specification uses the * to indicate that zero or more letters may follow the G.
Some special features
Besides storing the matched input text in yytext[], the scanner automatically counts the number of characters
in a match and stores it in the variable yyleng. You may use this variable to refer to any specific character just
placed in the array yytext[]. Remember that C language array indexes start with 0, so to print out the third
digit (if there is one) in a just-recognized integer, you might enter
[1-9]+    {if (yyleng > 2)
              printf("%c", yytext[2]);
          }
lex follows a number of highlevel rules to resolve ambiguities that may arise from the set of rules that you
write. In the following lexical analyzer example, the ``reserved word'' end could match the second rule as well
as the eighth, the one for identifiers:
begin                    return(BEGIN);
end                      return(END);
while                    return(WHILE);
if                       return(IF);
package                  return(PACKAGE);
reverse                  return(REVERSE);
loop                     return(LOOP);
[a-zA-Z][a-zA-Z0-9]*     { tokval = put_in_tabl();
                           return(IDENTIFIER); }
[0-9]+                   { tokval = put_in_tabl();
                           return(INTEGER); }
\+                       { tokval = PLUS;
                           return(ARITHOP); }
\-                       { tokval = MINUS;
                           return(ARITHOP); }
>                        { tokval = GREATER;
                           return(RELOP); }
>=                       { tokval = GREATEREQL;
                           return(RELOP); }
we cannot be sure that the first 1 is the initial value of the index k until we read the first comma. Until then,
we might have the assignment statement
DO50k = 1
(Remember that FORTRAN ignores all blanks.) The way to handle this is to use the slash, /, which signifies
that what follows is trailing context, something not to be stored in yytext[], because it is not part of the pattern
itself. So the rule to recognize the FORTRAN DO statement could be
DO/([ ]*[0-9]+[ ]*[a-zA-Z0-9]+=[a-zA-Z0-9]+,)   { printf("found DO"); }
Different versions of FORTRAN have limits on the size of identifiers, here the index name. To simplify the
example, the rule accepts an index name of any length. See ``Start conditions'' for a discussion of lex`s similar
handling of prior context.
lex uses the ``$'' symbol as an operator to mark a special trailing context: the end of a line. An example
would be a rule to ignore all blanks and tabs at the end of a line:
[ \t]+$    ;
The same rule could also be written with explicit trailing context as:
[ \t]+/\n    ;
On the other hand, if you want to match a pattern only when it starts a line or a file, you can use the ^
operator. Suppose a textformatting program requires that you not start a line with a blank. You might want to
check input to the program with some such rule as
^[ ]
Upon finding the first double quotation mark, the scanner will simply continue reading all subsequent
characters so long as none is a double quotation mark, and not look for a match again until it finds a second
double quotation mark. (See the further examples of input() and unput(c) usage in ``User routines''.)
By default, these routines are provided as macro definitions. To handle special I/O needs, such as writing to
several files, you may use standard I/O routines in C to rewrite the functions. Note, however, that they must
be modified consistently. In particular, the character set used must be consistent in all routines, and a value of
0 returned by input() must mean end of file. The relationship between input() and unput(c) must be
maintained or the lex lookahead will not work.
If you do provide your own input(), output(c), or unput(c), you will have to write a #undef input and so on
in your definitions section first:
#undef input
#undef output
.
.
.
#define input() . . . etc.
more declarations
.
.
.
Your new routines will replace the standard ones. See ``Definitions'' for further details.
A lex library routine that you may sometimes want to redefine is yywrap(), which is called whenever the
scanner reaches end of file. If yywrap() returns 1, the scanner continues with normal wrap-up on end of input.
Occasionally, however, you may want to arrange for more input to arrive from a new source. In that case,
redefine yywrap() to return 0 whenever further processing is required. The default yywrap() always returns 1.
Note that it is not possible to write a normal rule that recognizes end of file; the only access to that condition
is through yywrap(). Unless a private version of input() is supplied, a file containing nulls cannot be handled
because a value of 0 returned by input() is taken to be end of file.
There are a number of lex routines that let you handle sequences of characters to be processed in more than
one way. These include yymore(), yyless(n), and REJECT. Recall that the text that matches a given
specification is stored in the array yytext[]. In general, once the action is performed for the specification, the
characters in yytext[] are overwritten with succeeding characters in the input stream to form the next match.
The function yymore(), by contrast, ensures that the succeeding characters recognized are appended to those
already in yytext[]. This lets you do one thing and then another, when one string of characters is significant
and a longer one including the first is significant as well. Consider a language that defines a string as a set of
{
if (yytext[yyleng-1] == '\\')
yymore();
else
. . . normal processing
}
When faced with the string ``"abc\"def"'', the scanner will first match the characters "abc\, whereupon the call
to yymore() will cause the next part of the string "def to be tacked on the end. The double quotation mark
terminating the string should be picked up in the code labeled ``normal processing.''
The function yyless(n) lets you specify the number of matched characters on which an action is to be
performed: only the first n characters of the expression are retained in yytext[]. Subsequent processing
resumes at the nth + 1 character. Suppose you are again in the code deciphering business and the idea is
to work with only half the characters in a sequence that ends with a certain one, say upper or lowercase Z. The
code you want might be
[a-yA-Y]+[Zz]	{yyless(yyleng/2);
		. . . process first half of string . . . }
Finally, the function REJECT lets you more easily process strings of characters even when they overlap or
contain one another as parts. REJECT does this by immediately jumping to the next rule and its specification
without changing the contents of yytext[]. If you want to count the number of occurrences both of the regular
expression snapdragon and of its subexpression dragon in an input text, the following will do:
snapdragon	{countflowers++; REJECT;}
dragon		countmonsters++;
As an example of one pattern overlapping another, the following counts the number of occurrences of the
expressions comedian and diana, even where the input text has sequences such as comediana..:
comedian	{comiccount++; REJECT;}
diana		princesscount++;
Note that the actions here may be considerably more complicated than simply incrementing a counter. In all
cases, you declare the counters and other necessary variables in the definitions section commencing the lex
specification.
Definitions
The lex definitions section may contain any of several classes of items. The most critical are external
definitions, preprocessor statements like #include, and abbreviations. Recall that for valid lex source this
section is optional, but in most cases some of these items are necessary. Preprocessor statements and C source
code should appear between a line of the form %{ and one of the form %}. All lines between these delimiters
including those that begin with white space are copied to lex.yy.c immediately before the definition of
yylex(). (Lines in the definition section that are not enclosed by the delimiters are copied to the same place
provided they begin with white space.) The definitions section is where you would normally place C
definitions of objects accessed by actions in the rules section or by routines with external linkage.
After the %} that ends your #include's and declarations, you place your abbreviations for regular expressions
to be used in the rules section. The abbreviation appears on the left of the line and, separated by one or more
spaces, its definition or translation appears on the right. When you later use abbreviations in your rules, be
sure to enclose them within braces. Abbreviations avoid needless repetition in writing your specifications and
make them easier to read.
As an example, reconsider the lex source reviewed at the beginning of this section on advanced lex usage. The
use of definitions simplifies our later reference to digits, letters, and blanks. This is especially true if the
specifications appear several times:
D	[0-9]
L	[a-zA-Z]
B	[ \t]+
%%
-{D}+		printf("negative integer");
\+?{D}+		printf("positive integer");
-0.{D}+		printf("negative fraction");
G{L}*		printf("may have a G word here");
rail{B}road	printf("railroad is one word");
crook		printf("criminal");
.
.
Start conditions
Some problems require for their solution a greater sensitivity to prior context than is afforded by the ^
operator alone. You may want different rules to be applied to an expression depending on a prior context that
is more complex than the end of a line or the start of a file. In this situation you could set a flag to mark the
change in context that is the condition for the application of a rule, then write code to test the flag.
Alternatively, you could define for lex the different ``start conditions'' under which it is to apply each rule.
Consider this problem: copy the input to the output, except change the word magic to the word first on every
line that begins with the letter a; change magic to second on every line that begins with b; change magic to
third on every line that begins with c. Here is how the problem might be handled with a flag. Recall that
ECHO is a lex macro equivalent to printf("%s", yytext):
int flag;
%%
^a	{flag = 'a'; ECHO;}
^b	{flag = 'b'; ECHO;}
^c	{flag = 'c'; ECHO;}
\n	{flag = 0; ECHO;}
magic	{
	switch (flag)
	{
	case 'a': printf("first"); break;
	case 'b': printf("second"); break;
	case 'c': printf("third"); break;
	default: ECHO; break;
	}
	}
To handle the same problem with start conditions, each start condition must be introduced to lex in the
definitions section with a line reading
%Start name1 name2 . . .
where the conditions may be named in any order. The word Start may be abbreviated to ``S'' or ``s''. The
conditions are referenced at the head of a rule with ``<>'' brackets. So
<name1>expression
is a rule that is only recognized when the scanner is in start condition name1. To enter a start condition,
execute the action statement
BEGIN name1;
which changes the start condition to name1. To resume the normal state
BEGIN 0;
resets the initial condition of the scanner. A rule may be active in several start conditions. That is,
<name1,name2,name3>
is a valid prefix. Any rule not beginning with the <> prefix operators is always active.
The example can be written with start conditions as follows:
%Start AA BB CC
%%
^a		{ECHO; BEGIN AA;}
^b		{ECHO; BEGIN BB;}
^c		{ECHO; BEGIN CC;}
\n		{ECHO; BEGIN 0;}
<AA>magic	printf("first");
<BB>magic	printf("second");
<CC>magic	printf("third");
User routines
You may want to use your own routines in lex for much the same reason that you do so in other programming
languages. Action code that is to be used for several rules can be written once and called when needed. As
with definitions, this can simplify the writing and reading of programs. The function put_in_tabl(), to be
discussed in the next section on lex and yacc, is a good candidate for the user routines section of a lex
specification.
Another reason to place a routine in this section is to highlight some code of interest or to simplify the rules
section, even if the code is to be used for one rule only. As an example, consider the following routine to
ignore comments in a language like C where comments occur between /* and */:
%{
static skipcmnts();
%}
%%
"/*"	skipcmnts();
. . .
%%
static
skipcmnts()
{
	for(;;)
	{
		while (input() != '*')
			;
		if (input() != '/')
			unput(yytext[yyleng-1]);
		else
			return;
	}
}
There are three points of interest in this example. First, the unput(c) macro (putting back the last character
read) is necessary to avoid missing the final / if the comment ends with a **/. In this case, eventually having
read a *, the scanner finds that the next character is not the terminal / and must read some more. Second, the
expression yytext[yyleng-1] picks out that last character read. Third, this routine assumes that the comments
are not nested, which is indeed the case with the C language.
begin			return(BEGIN);
end			return(END);
while			return(WHILE);
if			return(IF);
package			return(PACKAGE);
reverse			return(REVERSE);
loop			return(LOOP);
[a-zA-Z][a-zA-Z0-9]*	{ tokval = put_in_tabl();
			  return(IDENTIFIER); }
[0-9]+			{ tokval = put_in_tabl();
			  return(INTEGER); }
\+			{ tokval = PLUS;
			  return(ARITHOP); }
\-			{ tokval = MINUS;
			  return(ARITHOP); }
>			{ tokval = GREATER;
			  return(RELOP); }
>=			{ tokval = GREATEREQL;
			  return(RELOP); }
Despite appearances, the tokens returned, and the values assigned to tokval, are indeed integers. Good
programming style dictates that we use informative terms such as BEGIN, END, WHILE, and so forth to
signify the integers the parser understands, rather than use the integers themselves. You establish the
association by using #define statements in your parser calling routine in C. For example,
#define BEGIN 1
#define END 2
.
#define PLUS 7
.
If the need arises to change the integer for some token type, you then change the #define statement in the
parser rather than hunt through the entire program changing every occurrence of the particular integer. In
using yacc to generate your parser, insert the statement
#include "y.tab.h"
in the definitions section of your lex source. The file y.tab.h, which is created when yacc is invoked with the
-d option, provides #define statements that associate token names such as BEGIN, END, and so on with the
integers of significance to the generated parser.
To indicate the reserved words in the example, the returned integer values suffice. For the other token types,
the integer value of the token type is stored in the programmer-defined variable tokval. This variable, whose
definition was an example in the definitions section, is globally defined so that the parser as well as the lexical
analyzer can access it. yacc provides the variable yylval for the same purpose.
Note that the example shows two ways to assign a value to tokval. First, a function put_in_tabl() places the
name and type of the identifier or constant in a symbol table so that the compiler can refer to it in this or a
later stage of the compilation process. More to the present point, put_in_tabl() assigns a type value to tokval
so that the parser can use the information immediately to determine the syntactic correctness of the input text.
The function put_in_tabl() would be a routine that the compiler writer might place in the user routines
section of the parser. Second, in the last few actions of the example, tokval is assigned a specific integer
indicating which arithmetic or relational operator the scanner recognized. If the variable PLUS, for instance,
is associated with the integer 7 by means of the #define statement above, then when a + is recognized, the
action assigns to tokval the value 7, which indicates the +. The scanner indicates the general class of operator
by the value it returns to the parser (that is, the integer signified by ARITHOP or RELOP).
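The division of labor just described (the return value names the token class, tokval records the particular operator) can be sketched in plain C, independently of lex. The token numbers below are invented stand-ins for what a real y.tab.h would define:

```c
#include <assert.h>

/* Hypothetical token numbers, standing in for y.tab.h */
#define ARITHOP 258
#define RELOP   259
#define PLUS    7
#define MINUS   8
#define GREATER 9

int tokval;     /* globally visible to both scanner and parser */

/* What a scanner action does on seeing an operator: record which
 * operator in tokval, and report its general class via the return. */
int scan_op(char c)
{
    switch (c) {
    case '+': tokval = PLUS;    return ARITHOP;
    case '-': tokval = MINUS;   return ARITHOP;
    case '>': tokval = GREATER; return RELOP;
    }
    return c;   /* anything else: the literal character itself */
}
```

The parser can then branch on the class (ARITHOP versus RELOP) while the action code consults tokval for the specific operator.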
In using lex with yacc, either may be run first. The command

	$ yacc -d grammar.y

generates a parser in the file y.tab.c. As noted, the -d option creates the file y.tab.h, which contains the
#define statements that associate the yacc-assigned integer token values with the user-defined token names.
Now you can invoke lex with the command
$ lex lex.l
then compile and link the output files with the command
$ cc lex.yy.c y.tab.c -ly -ll
Note that the yacc library is loaded (via -ly) before the lex library (via -ll) to ensure that the supplied main()
will call the yacc parser.
Miscellaneous
Recognition of expressions in an input text is performed by a deterministic finite automaton generated by lex.
The -v option prints out for you a small set of statistics describing the finite automaton. (For a detailed
account of finite automata and their importance for lex, see the Aho, Sethi, and Ullman text, Compilers:
Principles, Techniques, and Tools, Addison-Wesley, 1986.)
lex uses a table to represent its finite automaton. The maximum number of states that the finite automaton
allows is set by default to 500. If your lex source has a large number of rules or the rules are very complex,
this default value may be too small. You can enlarge the value by placing another entry in the definitions
section of your lex source as follows:
%n 700
This entry tells lex to make the table large enough to handle as many as 700 states. (The -v option will
indicate how large a number you should choose.) If you need to increase the maximum number of state
transitions beyond 2000, the designated parameter is a, thus:
%a 2800
In general, the form of such an entry is

	%x nnn

where nnn is a decimal integer representing an array size and x selects the parameter as follows:
p positions
n states
e tree nodes
a transitions
k packed character classes
o output array size
Lines in the rules section have the form
expression action
where the action may be continued on succeeding lines by using braces to delimit it.
The lex operator characters are
" \ [] ^ ? .
| () $ / {} <> +
	date : month_name day ',' year ;
where date, month_name, day, and year represent constructs of interest; presumably, month_name, day,
and year are defined in greater detail elsewhere. In the example, the comma is enclosed in single quotes. This
means that the comma is to appear literally in the input. The colon and semicolon merely serve as punctuation
in the rule and have no significance in evaluating the input. With proper definitions, the input
	July 4, 1776
might be used in the above example. While the lexical analyzer only needs to recognize individual letters,
such low-level rules tend to waste time and space, and may complicate the specification beyond the ability of
yacc to deal with it. Usually, the lexical analyzer recognizes the month names and returns an indication that a
month_name is seen. In this case, month_name is a token and the detailed rules are not needed.
Literal characters such as a comma must also be passed through the lexical analyzer and are also considered
tokens.
As an example of the flexibility of the specification, a rule such as

	date : month '/' day '/' year ;

might later be added, allowing
7/4/1776
as a synonym for
July 4, 1776
on input. In most cases, this new rule could be slipped into a working system with minimal effort and little
danger of disrupting existing input.
The input being read may not conform to the specifications. With a left-to-right scan, input errors are
detected as early as is theoretically possible. Thus, not only is the chance of reading and computing with bad
input data substantially reduced, but the bad data usually can be found quickly. Error handling, provided as
part of the input specifications, permits the reentry of bad data or the continuation of the input process after
skipping over the bad data.
In some cases, yacc fails to produce a parser when given a set of specifications. For example, the
specifications may be self-contradictory, or they may require a more powerful recognition mechanism than
that available to yacc. The former cases represent design errors; the latter cases often can be corrected by
making the lexical analyzer more powerful or by rewriting some of the grammar rules. While yacc cannot
handle all possible specifications, its power compares favorably with similar systems. Moreover, the
constructs that are difficult for yacc to handle are also frequently difficult for human beings to handle. Some
users have reported
that the discipline of formulating valid yacc specifications for their input revealed errors of conception or
design early in program development.
The remainder of this topic describes the following subjects:
basic process of preparing a yacc specification
parser operation
handling ambiguities
handling operator precedences in arithmetic expressions
error detection and recovery
the operating environment and special features of the parsers yacc produces
suggestions to improve the style and efficiency of the specifications
advanced topics
In addition, there are two examples and a summary of the yacc input syntax.
Basic specifications
Names refer to either tokens or nonterminal symbols. yacc requires token names to be declared as such. While
the lexical analyzer may be included as part of the specification file, it is perhaps more in keeping with
modular design to keep it as a separate file. Like the lexical analyzer, other subroutines may be included as
well. Thus, every specification file theoretically consists of three sections: the declarations, (grammar) rules,
and subroutines. The sections are separated by double percent signs (%%; the percent sign is generally used
in yacc specifications as an escape character). In other words, a full specification file looks like

	declarations
	%%
	rules
	%%
	subroutines

when all sections are used. The declarations and subroutines sections are optional. The smallest valid yacc
specification might be
%%
S:;
Blanks, tabs, and newlines are ignored, but they may not appear in names or multi-character reserved
symbols. Comments may appear wherever a name is valid. They are enclosed in /* and */, as in the C
language.
The rules section is made up of one or more grammar rules. A grammar rule has the form
	A : BODY ;
where A represents a nonterminal symbol, and BODY represents a sequence of zero or more names and
literals. The colon and the semicolon are yacc punctuation.
Names may be of any length and may be made up of letters, periods, underscores, and digits although a digit
may not be the first character of a name. Uppercase and lowercase letters are distinct. The names used in the
body of a grammar rule may represent tokens or nonterminal symbols.
A literal consists of a character enclosed in single quotes. As in the C language, the backslash is an escape
character within literals. yacc recognizes all the C language escape sequences. For a number of technical
reasons, the null character should never be used in grammar rules.
If there are several grammar rules with the same left-hand side, the vertical bar can be used to avoid rewriting
the left-hand side. In addition, the semicolon at the end of a rule is dropped before a vertical bar. Thus the
grammar rules
	A : B C D ;
	A : E F ;
	A : G ;

can be given to yacc as

	A : B C D
	  | E F
	  | G
	  ;
by using the vertical bar. It is not necessary that all grammar rules with the same left side appear together in
the grammar rules section although it makes the input more readable and easier to change.
The blank space following the colon is understood by yacc to be a nonterminal symbol named ``epsilon''.
Names representing tokens must be declared. This is most simply done by writing
	%token name1 name2 name3
and so on in the declarations section. Every name not defined in the declarations section is assumed to
represent a nonterminal symbol. Every nonterminal symbol must appear on the left side of at least one rule.
Of all the nonterminal symbols, the start symbol has particular importance. By default, the symbol is taken to
be the left-hand side of the first grammar rule in the rules section. It is possible and desirable to declare the
start symbol explicitly in the declarations section using the %start keyword:
	%start symbol
The end of the input to the parser is signaled by a special token, called the endmarker. The endmarker is
represented by either a zero or a negative number. If the tokens up to but not including the endmarker form a
construct that matches the start symbol, the parser function returns to its caller after the endmarker is seen
and accepts the input. If the endmarker is seen in any other context, it is an error.
It is the job of the user-supplied lexical analyzer to return the endmarker when appropriate. Usually the
endmarker represents some reasonably obvious I/O status, such as end of file or end of record.
Actions
With each grammar rule, you can associate actions to be performed when the rule is recognized. Actions may
return values and may obtain the values returned by previous actions. Moreover, the lexical analyzer can
return values for tokens if desired.
An action is an arbitrary C language statement and as such can do input and output, call subroutines, and alter
arrays and variables. An action is specified by one or more statements enclosed in { and }. For example,
	A : '(' B ')'
		{
			hello( 1, "abc" );
		}

and

	XXX : YYY ZZZ
		{
			printf("a message\n");
			flag = 25;
		}

are grammar rules with actions.
To return a value, the action normally sets the pseudo-variable $$ to some value. For example, an action that
does nothing but return the value 1 is

	{ $$ = 1; }

To obtain the values returned by previous actions and the lexical analyzer, an action can use the
pseudo-variables $1, $2, and so on, which refer to the values returned by the components of the right side of
a rule, reading from left to right. If the rule is

	A : B C D ;

then $2 has the value returned by C, and $3 the value returned by D. The rule
	expr : '(' expr ')'
provides a common example. One would expect the value returned by this rule to be the value of the expr
within the parentheses. Since the first component of the rule is the literal left parenthesis, the desired logical
result can be indicated by
	expr : '(' expr ')'
		{
			$$ = $2 ;
		}
By default, the value of a rule is the value of the first element in it ($1). Thus, grammar rules of the form
	A : B ;
frequently need not have an explicit action. In previous examples, all the actions came at the end of rules.
Sometimes, it is desirable to get control before a rule is fully parsed. yacc permits an action to be written in
the middle of a rule as well as at the end. This action is assumed to return a value accessible through the usual
$ mechanism by the actions to the right of it. In turn, it may access the values returned by the symbols to its
left. Thus, in the rule below the effect is to set x to 1 and y to the value returned by C:
	A : B
		{ $$ = 1; }
	    C
		{
			x = $2;
			y = $3;
		}
	    ;
Actions that do not terminate a rule are handled by yacc by manufacturing a new nonterminal symbol name
and a new rule matching this name to the empty string. The interior action is the action triggered by
recognizing this added rule. yacc treats the above example as if it had been written
	$ACT : /* empty */
		{ $$ = 1; }
	    ;
Suppose there is a C function node, written so that the call

	node( L, n1, n2 )

creates a node with label L and descendants n1 and n2 and returns the index of the newly created node. Then
a parse tree can be built by supplying actions such as
	expr : expr '+' expr
		{ $$ = node( '+', $1, $3 ); }
in the specification.
You may define other variables to be used by the actions. Declarations and definitions can appear in the
declarations section enclosed in %{ and %}. These declarations and definitions have global scope, so they are
known to the action statements and can be made known to the lexical analyzer. For example:
%{
int variable = 0;
%}
could be placed in the declarations section making variable accessible to all of the actions. You should avoid
names beginning with yy because the yacc parser uses only such names. Note, too, that in the examples
shown thus far all the values are integers. A discussion of values of other types may be found in ``Advanced
topics''. Finally, note that in the following case
%{
int i;
printf("%}");
%}
yacc will start copying after %{ and stop copying when it encounters the first %}, the one in printf(). In
contrast, it would copy %{ in printf() if it encountered it there.
Lexical analysis
You must supply a lexical analyzer to read the input stream and communicate tokens (with values, if desired)
to the parser. The lexical analyzer is an integer-valued function called yylex(). The function returns an
integer, the token number, representing the kind of token read. If there is a value associated with that token, it
should be assigned to the external variable yylval.
The parser and the lexical analyzer must agree on these token numbers in order for communication between
them to take place. The numbers may be chosen by yacc or the user. In either case, the #define mechanism of
C language is used to allow the lexical analyzer to return these numbers symbolically. For example, suppose
that the token name DIGIT has been defined in the declarations section of the yacc specification file. The
relevant portion of the lexical analyzer might look like
int yylex()
{
	extern int yylval;
	int c;
	. . .
	c = getchar();
	. . .
	switch (c)
	{
	. . .
	case '0':
	case '1':
	. . .
	case '9':
		yylval = c - '0';
		return (DIGIT);
	. . .
	}
	. . .
}
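A self-contained version of the same idea can be exercised directly. In this sketch the scanner reads from a string rather than stdin (the variable lex_input and the token number chosen for DIGIT are invented for illustration; a real y.tab.h would supply the number):

```c
#include <ctype.h>
#include <assert.h>

#define DIGIT 257          /* hypothetical token number for DIGIT */

int yylval;                /* value of the token, read by the parser */
const char *lex_input;     /* scan from a string, for illustration */

int yylex(void)
{
    int c;

    /* skip blanks and tabs */
    while ((c = *lex_input) == ' ' || c == '\t')
        lex_input++;
    if (c == '\0')
        return 0;          /* endmarker: token number zero */
    lex_input++;
    if (isdigit(c)) {
        yylval = c - '0';  /* communicate the value through yylval */
        return DIGIT;
    }
    return c;              /* any other character is its own token */
}
```

Each call returns the next token number, with yylval carrying the associated value; at end of input it returns the zero endmarker, matching the convention described above.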
Parser operation

	IF	shift 34
which says, in state 56, if the lookahead token is IF, the current state (56) is pushed down on the stack, and
state 34 becomes the current state (on the top of the stack). The lookahead token is cleared.
The reduce action keeps the stack from growing without bounds. reduce actions are appropriate when the
parser has seen the right-hand side of a grammar rule and is prepared to announce that it has seen an instance
of the rule replacing the right-hand side by the left-hand side. It may be necessary to consult the lookahead
token to decide whether or not to reduce. In fact, the default action (represented by .) is often a reduce action.
reduce actions are associated with individual grammar rules. Grammar rules are also given small integer
numbers, and this leads to some confusion. The action
	.	reduce 18

refers to grammar rule 18, while the action

	IF	shift 34

refers to state 34. Suppose the rule

	A : x y z ;
is being reduced. The reduce action depends on the left-hand symbol (A in this case) and the number of
symbols on the right-hand side (three in this case). To reduce, first pop off the top three states from the stack.
(In general, the number of states popped equals the number of symbols on the right side of the rule.) In effect,
these states were the ones put on the stack while recognizing x, y, and z and no longer serve any useful
	A	goto 20
causing state 20 to be pushed onto the stack and become the current state.
In effect, the reduce action turns back the clock in the parse, popping the states off the stack to go back to the
state where the right-hand side of the rule was first seen. The parser then behaves as if it had seen the left side
at that time. If the right-hand side of the rule is empty, no states are popped off the stack. The uncovered
state is in fact the current state.
The reduce action is also important in the treatment of user-supplied actions and values. When a rule is
reduced, the code supplied with the rule is executed before the stack is adjusted. In addition to the stack
holding the states, another stack running in parallel with it holds the values returned from the lexical analyzer
and the actions. When a shift takes place, the external variable yylval is copied onto the value stack. After the
return from the user code, the reduction is carried out. When the goto action is done, the external variable
yyval is copied onto the value stack. The pseudo-variables $1, $2, and so on refer to the value stack.
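A minimal sketch of the two parallel stacks described above (the helper names, state numbers, and values are invented for illustration; in a real yacc parser these moves are driven by the generated tables):

```c
#include <assert.h>

#define MAXDEPTH 32

/* The state stack and the value stack move in lockstep. */
int state_stack[MAXDEPTH];
int value_stack[MAXDEPTH];
int top = -1;

/* On a shift, the new state is pushed and yylval is copied alongside. */
void shift(int state, int yylval_copy)
{
    ++top;
    state_stack[top] = state;
    value_stack[top] = yylval_copy;
}

/* On a reduce by a rule with nrhs right-side symbols, both stacks lose
 * nrhs entries; the goto state and the action's $$ are then pushed. */
void reduce(int nrhs, int dollar_dollar, int goto_state)
{
    top -= nrhs;
    ++top;
    state_stack[top] = goto_state;
    value_stack[top] = dollar_dollar;
}
```

In this picture, $1 through $n name the popped value-stack entries, and $$ is the value pushed in their place.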
The other two parser actions are conceptually much simpler. The accept action indicates that the entire input
has been seen and that it matches the specification. This action appears only when the lookahead token is the
endmarker and indicates that the parser has successfully done its job. The error action, on the other hand,
represents a place where the parser can no longer continue parsing according to the specification. The input
tokens it has seen (together with the lookahead token) cannot be followed by anything that would result in a
valid input. The parser reports an error and attempts to recover the situation and resume parsing. The error
recovery (as opposed to the detection of error) will be discussed later.
Consider
%token DING DONG DELL
%%
rhyme	: sound place
	;
sound	: DING DONG
	;
place	: DELL
	;
as a yacc specification. When yacc is invoked with the -v (verbose) option, a file called y.output is produced
with a human-readable description of the parser. The y.output file corresponding to the above grammar (with
some statistics stripped off the end) follows.
state 0
	$accept : _rhyme $end
The input

	DING DONG DELL
can be used to track the operations of the parser. Initially, the current state is state 0. The parser needs to refer
to the input in order to decide between the actions available in state 0, so the first token, DING, is read and
becomes the lookahead token. The action in state 0 on DING is shift 3, state 3 is pushed onto the stack, and
the lookahead token is cleared. State 3 becomes the current state. The next token, DONG, is read and
becomes the lookahead token. The action in state 3 on the token DONG is shift 6, state 6 is pushed onto the
stack, and the lookahead is cleared. The stack now contains 0, 3, and 6. In state 6, without even consulting the
lookahead, the parser reduces by
	sound : DING DONG
which is rule 2. Two states, 6 and 3, are popped off the stack, uncovering state 0. Consulting the description of
state 0 (looking for a goto on sound),
	sound	goto 2
is obtained. State 2 is pushed onto the stack and becomes the current state.
In state 2, the next token, DELL, must be read. The action is shift 5, so state 5 is pushed onto the stack, which
now has 0, 2, and 5 on it, and the lookahead token is cleared. In state 5, the only action is to reduce by rule 3.
This has one symbol on the right-hand side, so one state, 5, is popped off, and state 2 is uncovered. The goto
in state 2 on place (the left side of rule 3) is state 4. Now, the stack contains 0, 2, and 4. In state 4, the only
action is to reduce by rule 1. There are two symbols on the right-hand side, so the top two states are popped
off, uncovering state 0 again. In state 0, there is a goto on rhyme, causing the parser to enter state 1. In state
1, the input is read and the endmarker is obtained ($end in the y.output file). The action in state 1 when the
endmarker is seen successfully ends the parse.
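The shift/reduce trace just walked through can be replayed mechanically. In this sketch the helper names are invented and the moves are hard-coded from the narration; a real yacc parser drives the same steps from its generated tables:

```c
#include <assert.h>

/* A hand-driven replay of the DING DONG DELL trace.  The state
 * numbers (3, 6, 2, 5, 4) are the ones quoted in the discussion. */
int stack[16];
int depth;

static void shift(int state)
{
    stack[depth++] = state;           /* push the new current state */
}

static void reduce(int nrhs, int goto_state)
{
    depth -= nrhs;                    /* pop one state per right-side symbol */
    stack[depth++] = goto_state;      /* then push the goto state */
}

void replay(void)
{
    depth = 0;
    shift(0);        /* initial state */
    shift(3);        /* read DING: shift 3 */
    shift(6);        /* read DONG: shift 6 */
    reduce(2, 2);    /* sound : DING DONG ; goto on sound in state 0 is 2 */
    shift(5);        /* read DELL: shift 5 */
    reduce(1, 4);    /* place : DELL ; goto on place in state 2 is 4 */
}
```

After replay() the stack holds 0, 2, and 4, matching the point the narration reaches before the final reduction by rule 1.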
	expr : expr '-' expr
is a natural way of expressing the fact that one way of forming an arithmetic expression is to put two other
expressions together with a minus sign between them. Unfortunately, this grammar rule does not completely
specify the way that all complex inputs should be structured. For example, if the input is
	expr - expr - expr

the rule allows this input to be structured as either

	( expr - expr ) - expr

or as

	expr - ( expr - expr )
To see the difficulty, consider the problem that confronts the parser. When the parser has read the second expr, the input seen
	expr - expr
matches the right side of the grammar rule above. The parser could reduce the input by applying this rule.
After applying the rule, the input is reduced to expr (the left side of the rule). The parser would then read the
final part of the input
	- expr
and again reduce. The effect of this is to take the left associative interpretation.
Alternatively, when the parser has seen

	expr - expr

it could defer the immediate application of the rule and continue reading the input until

	expr - expr - expr

is seen. It could then apply the rule to the rightmost three symbols, reducing them to expr, which results in

	expr - expr
being left. Now the rule can be reduced once more. The effect is to take the right associative interpretation.
Thus, having read
	expr - expr
the parser can do one of two valid things, shift or reduce. It has no way of deciding between them. This is
called a shift-reduce conflict. It may also happen that the parser has a choice of two valid reductions. This is
called a reduce-reduce conflict. Note that there are never any shift-shift conflicts.
When there are shift-reduce or reduce-reduce conflicts, yacc still produces a parser. It does this by
selecting one of the valid steps wherever it has a choice. A rule describing the choice to make in a given
situation is called a disambiguating rule.
yacc invokes two default disambiguating rules:
1. In a shift-reduce conflict, the default is to do the shift.
2. In a reduce-reduce conflict, the default is to reduce by the earlier grammar rule (in the yacc
specification).
Rule 1 implies that reductions are deferred in favor of shifts when there is a choice. Rule 2 gives the user
rather crude control over the behavior of the parser in this situation, but reduce-reduce conflicts should be
avoided when possible.
Conflicts may arise because of mistakes in input or logic or because the grammar rules (while consistent)
require a more complex parser than yacc can construct. The use of actions within rules can also cause
conflicts if the action must be done before the parser can be sure which rule is being recognized. In these
cases, the application of disambiguating rules is inappropriate and leads to an incorrect parser. For this reason,
yacc always reports the number of shift-reduce and reduce-reduce conflicts resolved by rules 1 and 2
above.
In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also possible
to rewrite the grammar rules so that the same inputs are read but there are no conflicts. For this reason, most
previous parser generators have considered conflicts to be fatal errors. Our experience has suggested that this
rewriting is somewhat unnatural and produces slower parsers. Thus, yacc will produce parsers even in the
presence of conflicts.
As an example of the power of disambiguating rules, consider
	stat	: IF '(' cond ')' stat
		| IF '(' cond ')' stat ELSE stat
		;
which is a fragment from a programming language involving an if-then-else statement. In these rules, IF and
ELSE are tokens, cond is a nonterminal symbol describing conditional (logical) expressions, and stat is a
nonterminal symbol describing statements. If the input is

	IF ( C1 ) IF ( C2 ) S1 ELSE S2

the statement can be structured according to these rules in two ways:

	IF ( C1 ) {
		IF ( C2 ) S1
	} ELSE S2

or

	IF ( C1 ) {
		IF ( C2 ) S1 ELSE S2
	}
where the second interpretation is the one given in most programming languages having this construct; each
ELSE is associated with the last preceding unELSE'd IF. In this example, consider the situation where the
parser has seen
	IF ( C1 ) IF ( C2 ) S1
and is looking at the ELSE. It can immediately reduce by the simple-if rule to get
	IF ( C1 ) stat

and then read the remaining input

	ELSE S2
and reduce
	IF ( C1 ) stat ELSE S2
by the if-else rule. This leads to the first of the above groupings of the input.
On the other hand, the ELSE may be shifted, S2 read, and then the right-hand portion of
	IF ( C1 ) IF ( C2 ) S1 ELSE S2

can be reduced by the if-else rule to get

	IF ( C1 ) stat

which can then be reduced by the simple-if rule. This leads to the second of the above groupings of the input,
which is usually the one desired. Once again, the parser can do two valid things; there is an ambiguity. The
application of disambiguating rule 1 tells the parser to shift in this case, which leads to the desired grouping.
This shift-reduce conflict arises only when there is a particular current input symbol, ELSE, and particular
inputs, such as

	IF ( C1 ) IF ( C2 ) S1
have already been seen. In general, there may be many conflicts, and each one will be associated with an input
symbol and a set of previously read inputs. The previously read inputs are characterized by the state of the
parser.
The conflict messages of yacc are best understood by examining the -v output. For example, the output
corresponding to the above conflict state might be
23: shiftreduce conflict (shift 45, reduce 18) on ELSE
state 23
stat : IF ( cond ) stat_ (18) stat : IF ( cond ) stat_ELSE stat
ELSE shift 45 . reduce 18 where the first line describes the conflict giving the state and the input symbol.
The ordinary state description gives the grammar rules active in the state and the parser actions. Recall that
the underscore marks the portion of the grammar rules that has been seen. Thus in the example, in state 23, the
parser has seen input corresponding to
	IF ( cond ) stat
and the two grammar rules shown are active at this time. The parser can do two possible things. If the input
symbol is ELSE, it is possible to shift into state 45. State 45 will have, as part of its description, the line
	stat : IF ( cond ) stat ELSE_stat
because the ELSE will have been shifted in this state. In state 23, the alternative action (specified by .) is to be
done if the input symbol is not mentioned explicitly in the actions. In this case, if the input symbol is not
ELSE, the parser reduces to
	stat : IF '(' cond ')' stat
Precedence
There is one common situation where the rules given above for resolving conflicts are not sufficient. This is in
the parsing of arithmetic expressions. Most of the commonly used constructions for arithmetic expressions
can be naturally described by the notion of precedence levels for operators, together with information about
left or right associativity. It turns out that ambiguous grammars with appropriate disambiguating rules can be
used to create parsers that are faster and easier to write than parsers constructed from unambiguous grammars.
The basic notion is to write grammar rules of the form
	expr : expr OP expr

and

	expr : UNARY expr
for all binary and unary operators desired. This creates a very ambiguous grammar with many parsing
conflicts. You specify as disambiguating rules the precedence or binding strength of all the operators and the
associativity of the binary operators. This information is sufficient to allow yacc to resolve the parsing
conflicts in accordance with these rules and construct a parser that realizes the desired precedences and
associativities.
The precedences and associativities are attached to tokens in the declarations section. This is done by a series
of lines beginning with the yacc keywords %left, %right, or %nonassoc, followed by a list of tokens. All of
the tokens on the same line are assumed to have the same precedence level and associativity; the lines are
listed in order of increasing precedence or binding strength. Thus
	%left	'+'  '-'
	%left	'*'  '/'
describes the precedence and associativity of the four arithmetic operators. + and - are left associative and
have lower precedence than * and /, which are also left associative. The keyword %right is used to describe
right associative operators. The keyword %nonassoc is used to describe operators, like the operator .LT. in
FORTRAN, that may not associate with themselves. That is, because
A .LT. B .LT. C
is invalid in FORTRAN, .LT. would be described with the keyword %nonassoc in yacc.
As an example of the behavior of these declarations, the description
	%right	'='
	%left	'+'  '-'
	%left	'*'  '/'
%%
	expr	: expr '=' expr
		| expr '+' expr
		| expr '-' expr
		| expr '*' expr
		| expr '/' expr
		| NAME
		;

might be used to structure the input

	a = b = c*d - e - f*g

as follows

	a = ( b = ( ((c*d)-e) - (f*g) ) )
in order to achieve the correct precedence of operators. When this mechanism is used, unary operators must,
in general, be given a precedence. Sometimes a unary operator and a binary operator have the same symbolic
representation but different precedences. An example is unary and binary minus.
Unary minus may be given the same strength as multiplication, or even higher, while binary minus has a
lower strength than multiplication. The keyword %prec changes the precedence level associated with a
particular grammar rule. %prec appears immediately after the body of the grammar rule, before the action or
closing semicolon, and is followed by a token name or literal. It causes the precedence of the grammar rule to
become that of the following token name or literal. For example, the rules
	%left	'+'  '-'
	%left	'*'  '/'
	%%
	expr	: expr '+' expr
		| expr '-' expr
		| expr '*' expr
		| expr '/' expr
		| '-' expr  %prec '*'
		| NAME
		;

might be used to give unary minus the same precedence as multiplication.
A token declared by %left, %right, and %nonassoc need not, but may, be declared by %token as well.
Precedences and associativities are used by yacc to resolve parsing conflicts. They give rise to the following
disambiguating rules:
1. Precedences and associativities are recorded for those tokens and literals that have them.
2. A precedence and associativity is associated with each grammar rule. It is the precedence and
associativity of the last token or literal in the body of the rule. If the %prec construction is used, it
overrides this default. Some grammar rules may have no precedence and associativity associated with
them.
3. When there is a reduce-reduce or shift-reduce conflict, and either the input symbol or the grammar
rule has no precedence and associativity, then the two default disambiguating rules given in the
preceding section are used, and the conflicts are reported.
4. If there is a shift-reduce conflict and both the grammar rule and the input character have precedence
and associativity associated with them, then the conflict is resolved in favor of the action shift or
reduce associated with the higher precedence. If precedences are equal, then associativity is used.
Left associative implies reduce; right associative implies shift; nonassociating implies error.
Conflicts resolved by precedence are not counted in the number of shift-reduce and reduce-reduce conflicts
reported by yacc. This means that mistakes in the specification of precedences may disguise errors in the input
grammar. It is a good idea to be sparing with precedences and use them in a cookbook fashion until some
experience has been gained. The y.output file is useful in deciding whether the parser is actually doing what
was intended.
To illustrate further how you might use the precedence keywords to resolve a shift-reduce conflict, we will
look at an example similar to the one described in the previous section. Consider the following C statement:
if (flag) if (anotherflag) x = 1;
else x = 2;
The problem for the parser is whether the else goes with the first or the second if. C programmers will
recognize that the else goes with the second if, contrary to what the misleading indentation suggests.
When the specification is passed to yacc, however, we get the following message:
conflicts: 1 shift/reduce
The problem is that when yacc has read IF IF stmnt (in trying to match IF IF stmnt ELSE stmnt), it has two
choices: recognize IF stmnt as a statement (reduce), or read some more input (shift) and eventually
recognize IF stmnt ELSE stmnt as a statement.
One way to resolve the problem is to invent a new token REDUCE whose sole purpose is to give the correct
precedence for the rules:
%{
#include <stdio.h>
%}
%token SIMPLE IF
%nonassoc REDUCE
%nonassoc ELSE
%%
S
: stmnt '\n'
;
stmnt
: SIMPLE
| if_stmnt
;
if_stmnt
: IF stmnt %prec REDUCE
{ printf("simple if");}
| IF stmnt ELSE stmnt
{ printf("if_then_else");}
;
Since the precedence associated with the second form of if_stmnt is higher now, yacc will try to match that
rule first, and no conflict will be reported.
Actually, in this simple case, the new token is not needed:
%nonassoc IF
%nonassoc ELSE
would also work. Moreover, it is not really necessary to resolve the conflict in this way, because, as we have
seen, yacc will shift by default in a shift-reduce conflict. Resolving conflicts is a good idea, though, in the
sense that you should not see diagnostic messages for correct specifications.
Error handling
Error handling is an extremely difficult area, and many of the problems are semantic ones. When an error is
found, for example, it may be necessary to reclaim parse tree storage, delete or alter symbol table entries,
and/or, typically, set switches to avoid generating any further output.
It is seldom acceptable to stop all processing when an error is found. It is more useful to continue scanning the
input to find further syntax errors. This leads to the problem of getting the parser restarted after an error. A
general class of algorithms to do this involves discarding a number of tokens from the input string and
attempting to adjust the parser so that input can continue.
To allow the user some control over this process, yacc provides the token name error. This name can be used
in grammar rules. In effect, it suggests where errors are expected and recovery might take place. The parser
pops its stack until it enters a state where the token error is valid. It then behaves as if the token error were
the current lookahead token and performs the action encountered. The lookahead token is then reset to the
token that caused the error. If no special error rules have been specified, the processing halts when an error is
detected.
In order to prevent a cascade of error messages, the parser, after detecting an error, remains in error state until
three tokens have been successfully read and shifted. If an error is detected when the parser is already in error
state, no message is given, and the input token is quietly deleted.
As an example, a rule of the form
	stat : error
means that on a syntax error the parser attempts to skip over the statement in which the error is seen. More
precisely, the parser scans ahead, looking for three tokens that might validly follow a statement, and starts
processing at the first of these. If the beginnings of statements are not sufficiently distinctive, it may make a
false start in the middle of a statement and end up reporting a second error where there is in fact no error.
Actions may be used with these special error rules. These actions might attempt to reinitialize tables, reclaim
symbol table space, and so forth.
Error rules such as the above are very general but difficult to control. Rules such as

	stat : error ';'
are somewhat easier. Here, when there is an error, the parser attempts to skip over the statement but does so
by skipping to the next semicolon. All tokens after the error and before the next semicolon cannot be shifted
and are discarded. When the semicolon is seen, this rule will be reduced and any cleanup action associated
with it performed.
Another form of error rule arises in interactive applications where it may be desirable to permit a line to be
reentered after an error. The following example
input : error
'\n'
{
(void) printf("Reenter last line: " );
}
input
{
$$ = $4;
}
;
is one way to do this. There is one potential difficulty with this approach. The parser must correctly process
three input tokens before it admits that it has correctly resynchronized after the error. If the reentered line
contains an error in the first two tokens, the parser deletes the offending tokens and gives no message. This is
clearly unacceptable. For this reason, there is a mechanism that can force the parser to believe that error
recovery has been accomplished. The statement
yyerrok ;
in an action resets the parser to its normal mode. The last example can be rewritten as
input : error
'\n'
{
yyerrok;
(void) printf("Reenter last line: " );
}
input
{
$$ = $4;
}
;
As previously mentioned, the token seen immediately after the error symbol is the input token at which the
error was discovered. Sometimes this is inappropriate; for example, an error recovery action might take upon
itself the job of finding
the correct place to resume input. In this case, the previous lookahead token must be cleared. The statement
yyclearin ;
in an action will have this effect. For example, suppose the action after error were to call some sophisticated
resynchronization routine (supplied by the user) that attempted to advance the input to the beginning of the
next valid statement. After this routine is called, the next token returned by yylex() is presumably the first
token in a valid statement. The old invalid token must be discarded and the error state reset. A rule similar to
	stat	: error
			{
				resynch();
				yyerrok;
				yyclearin;
			}
		;

is one way to do this.

The yacc environment

When you use yacc, you invoke it with a command line such as

	$ yacc grammar.y
where grammar.y is the file containing your yacc specification. (The .y suffix is a convention recognized by
other SCO OpenServer system commands. It is not strictly necessary.) The output is a file of C language
subroutines called y.tab.c. The function produced by yacc is called yyparse(), and is integer-valued. When it
is called, it in turn repeatedly calls yylex(), the lexical analyzer supplied by the user (see ``Lexical analysis''),
to obtain input tokens. Eventually, either an error is detected and, if no error recovery is possible,
yyparse() returns the value 1; or the lexical analyzer returns the endmarker token and the parser accepts,
in which case yyparse() returns the value 0.
You must provide a certain amount of environment for this parser in order to obtain a working program. For
example, as with every C language program, a routine called main() must be defined that eventually calls
yyparse(). In addition, a routine called yyerror() is needed to print a message when a syntax error is detected.
These two routines must be supplied in one form or another by the user. To ease the initial effort of using
yacc, a library has been provided with default versions of main() and yyerror(). The library, liby, is accessed
by a -ly argument to the cc command. The source codes
main()
{
return (yyparse());
}
and
	# include <stdio.h>

	yyerror(s)
	char *s;
	{
		(void) fprintf(stderr, "%s\n", s);
	}

show the triviality of these default programs. The argument to yyerror() is a string containing an error
message, usually the string syntax error. The average
application wants to do better than this. Ordinarily, the program should keep track of the input line number
and print it along with the message when a syntax error is detected. The external integer variable yychar
contains the lookahead token number at the time the error was detected. This may be of some interest in
giving better diagnostics. Since the main() routine is probably supplied by the user (to read arguments, for
instance), the yacc library is useful only in small projects or in the earliest stages of larger ones.
Input style
It is difficult to provide rules with substantial actions and still have a readable specification file. The following
are a few style hints.
1. Use all uppercase letters for token names and all lowercase letters for nonterminal names. This is
useful in debugging.
2. Put grammar rules and actions on separate lines. It makes editing easier.
3. Put all rules with the same left-hand side together. Put the left-hand side in only once and let all
following rules begin with a vertical bar.
4. Put a semicolon only after the last rule with a given left-hand side and put the semicolon on a
separate line. This allows new rules to be easily added.
5. Indent rule bodies by one tab stop and action bodies by two tab stops.
6. Put complicated actions into subroutines defined in separate files.
``A simple example'' is written following this style, as are the examples in this section (where space permits).
The central problem is to make the rules visible through the morass of action code.
Left recursion
The algorithm used by the yacc parser encourages so-called left recursive grammar rules. Rules of the form

	name : name rest_of_rule ;

frequently arise when writing specifications of sequences and lists:

	list	: item
		| list ',' item
		;

and

	seq	: item
		| seq item
		;

In each of these cases, the first rule will be reduced for the first item only; and the second rule will be
reduced for the second and all succeeding items.
With right recursive rules, such as

	seq	: item
		| item seq
		;
the parser is a bit bigger; and the items are seen and reduced from right to left. More seriously, an internal
stack in the parser is in danger of overflowing if an extremely long sequence is read (although yacc can
process very large stacks). Thus, you should use left recursion wherever reasonable.
It is worth considering if a sequence with zero elements has any meaning, and if so, consider writing the
sequence specification as
	seq	: /* empty */
		| seq item
		;
using an empty rule. Once again, the first rule would always be reduced exactly once before the first item was
read, and then the second rule would be reduced once for each item read. Permitting empty sequences often
leads to increased generality. However, conflicts might arise if yacc is asked to decide which empty sequence
it has seen when it has not seen enough to know!
Lexical tie-ins
Some lexical decisions depend on context. For example, the lexical analyzer might want to delete blanks
normally, but not within quoted strings, or names might be entered into a symbol table in declarations but not
in expressions. One way of handling these situations is to create a global flag that is examined by the lexical
analyzer and set by actions. For example,
%{
int dflag;
%}
...
%%
	prog	: decls stats ;

	decls	: /* empty */  { dflag = 1; }
		| decls declaration
		;

	stats	: /* empty */  { dflag = 0; }
		| stats statement
		;

	... other rules ...

specifies a program that consists of zero or more declarations followed by zero or more
statements. The flag dflag is now 0 when reading statements and 1 when reading declarations, except for the
first token in the first statement. This token must be seen by the parser before it can tell that the declaration
section has ended and the statements have begun. In many cases, this single token exception does not affect
the lexical scan.
This kind of backdoor approach can be elaborated to a noxious degree. Nevertheless, it represents a way of
doing some things that are difficult, if not impossible, to do otherwise.
Reserved words
Some programming languages permit you to use words like if, which are normally reserved, as label or
variable names, provided that such use does not conflict with the valid use of these names in the programming
language. This is extremely hard to do in the framework of yacc. It is difficult to pass information to the
lexical analyzer telling it this instance of if is a keyword and that instance is a variable. You can make a stab
at it using the mechanism described in the last subsection, but it is difficult.
Advanced topics
The following sections discuss a number of advanced features of yacc.
For example, in the following fragment the action for CRONE examines $0, the value of the symbol that
precedes noun in the enclosing context:

	sent	: adj noun verb adj noun
			{
				/* look at the sentence . . . */
			}
		;

	adj	: THE
			{
				$$ = THE;
			}
		| YOUNG
			{
				$$ = YOUNG;
			}
		...
		;

	noun	: DOG
			{
				$$ = DOG;
			}
		| CRONE
			{
				if( $0 == YOUNG )
				{
					(void) printf( "what?\n" );
				}
				$$ = CRONE;
			}
		;
		...
in the declaration section. This declares the yacc value stack and the external variables yylval and yyval to
have type equal to this union. If yacc was invoked with the -d option, the union declaration is copied into the
y.tab.h file as YYSTYPE.
Once YYSTYPE is defined, the union member names must be associated with the various terminal and
nonterminal names. The construction
<name>
is used to indicate a union member name. If this follows one of the keywords %token, %left, %right, and
%nonassoc, the union member name is associated with the tokens listed. Thus, saying
	%left  <optype>  '+'  '-'
causes any reference to values returned by these two tokens to be tagged with the union member name
optype. Another keyword, %type, is used to associate union member names with nonterminals. Thus, one
might say
	%type  <nodetype>  expr  stat
to associate the union member nodetype with the nonterminal symbols expr and stat.
	rule	: aaa
			{
				$<intval>$ = 3;
			}
		bbb
			{
				fun( $<intval>2, $<other>0 );
			}
		;
shows this usage. This syntax has little to recommend it, but the situation arises rarely.
A sample specification is given in ``An advanced example''. The facilities in this subsection are not triggered
until they are used. In particular, the use of %type will turn on these mechanisms. When they are used, there
is a fairly strict level of checking. For example, use of $n or $$ to refer to something with no defined type is
diagnosed. If these facilities are not triggered, the yacc value stack is used to hold ints.
	/* basic entries */
	%token IDENTIFIER	/* includes identifiers and literals */
	%token C_IDENTIFIER	/* identifier (but not literal) followed by a : */
	%token NUMBER		/* [0-9]+ */

	/* reserved words: %type=>TYPE, %left=>LEFT, etc. */
	%token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION

	%token MARK		/* the %% mark */
	%token LCURL		/* the %{ mark */
	%token RCURL		/* the %} mark */

	/* ASCII character literals stand for themselves */

	%start spec

	%%
A simple example
This example gives the complete yacc specification for a small desk calculator; the calculator has 26 registers
labeled a through z and accepts arithmetic expressions made up of the operators +, -, *, /, %, &, |, and the
assignment operators.
If an expression at the top level is an assignment, only the assignment is done; otherwise, the expression is
printed. As in the C language, an integer that begins with 0 is assumed to be octal; otherwise, it is assumed to
be decimal.
As an example of a yacc specification, the desk calculator does a reasonable job of showing how precedence
and ambiguities are used and demonstrates simple recovery. The major oversimplifications are that the lexical
analyzer is much simpler than for most applications, and the output is produced immediately line by line. Note
the way that decimal and octal integers are read in by grammar rules. This job is probably better done by the
lexical analyzer.
%{
# include <stdio.h>
# include <ctype.h>
An advanced example
This section gives an example of a grammar using some of the advanced features. The desk calculator in ``A
simple example'' is modified to provide a desk calculator that does floating point interval arithmetic. The
calculator understands floating point constants, and the arithmetic operations +, -, *, /, and unary -. It uses the
registers a through z. Moreover, it understands intervals written
(X,Y)
where X is less than or equal to Y. There are 26 interval valued variables A through Z that may also be used.
The usage is similar to that in ``A simple example''; assignments return no value and print nothing while
expressions print the (floating or interval) value.
This example explores a number of interesting features of yacc and C. Intervals are represented by a structure
consisting of the left and right endpoint values stored as doubles. This structure is given a type name,
INTERVAL, by using typedef. The yacc value stack can also contain floating point scalars and integers
(used to index into the arrays holding the variable values). Notice that the entire strategy depends strongly on
being able to assign structures and unions in C language. In fact, many of the actions call functions that return
structures as well.
It is also worth noting the use of YYERROR to handle error conditions: division by an interval containing
0, and an interval presented in the wrong order. The error recovery mechanism of yacc is used to throw away
the rest of the offending line.
and
2.5 + (3.5, 4)
Notice that the 2.5 is to be used in an interval value expression in the second example, but this fact is not
known until the comma is read. By this time, 2.5 is finished, and the parser cannot go back and change its
mind. More generally, it might be necessary to look ahead an arbitrary number of tokens to decide whether to
convert a scalar to an interval. This problem is evaded by having two rules for each binary interval valued
operator: one when the left operand is a scalar and one when the left operand is an interval. In the second
case, the right operand must be an interval, so the conversion will be applied automatically. Despite this
evasion, there are still many cases where the conversion may be applied or not, leading to the above conflicts.
They are resolved by listing the rules that yield scalars first in the specification file; in this way, the conflict
will be resolved in the direction of keeping scalar valued expressions scalar valued until they are forced to
become intervals.
This way of handling multiple types is instructive. If there were many kinds of expression types instead of just
two, the number of rules needed would increase dramatically and the conflicts even more dramatically. Thus,
it is better practice in a more normal programming language environment to keep the type information as part
of the value and not as part of the grammar.
Finally, a word about the lexical analysis. The only unusual feature is the treatment of floating point
constants. The C language library routine atof() is used to do the actual conversion from a character string to a
double-precision value. If the lexical analyzer detects an error, it responds by returning a token that is invalid
in the grammar, provoking a syntax error in the parser and thence error recovery.
Managing file interactions with make

Basic features
The basic operation of make is to update a target file by ensuring that all of the files on which the target file
depends exist and are up to date. The target file is regenerated if it has not been modified since the dependents
were modified. The make program builds and searches a graph of these dependencies. The operation of make
depends on its ability to find the date and time that a file was last modified.
The make program operates using three sources of information:
a user-supplied description file
file names and last-modified times from the file system
built-in rules that supply default dependency information and implied commands
To illustrate, consider a simple example in which a program named prog is made by compiling and loading
three C language files x.c, y.c, and z.c with the math library, libm. By convention, the output of the C
compilation is found in files named x.o, y.o, and z.o. Assume that the files x.c and y.c share some
declarations in a file named defs.h, but that z.c does not. The following description file describes the
relationships:

	prog : x.o y.o z.o
		cc x.o y.o z.o -lm -o prog

	x.o y.o : defs.h

If this information were stored in a file named makefile, the command
$ make
would perform the operations needed to regenerate prog after any changes had been made to any of the four
source files x.c, y.c, z.c, or defs.h. In the example above, the first line states that prog depends on three .o
files. Once these object files are current, the second line describes how to combine them to create prog. The
third line states that x.o and y.o depend on the file defs.h. From the file system, make discovers that there are
three .c files corresponding to the needed .o files and uses built-in rules on how to generate an object from a
C source file (that is, issue a cc -c command).
If make did not have the ability to determine automatically what needs to be done, the following longer
description file would be necessary:
	prog : x.o y.o z.o
		cc x.o y.o z.o -lm -o prog

	x.o : x.c defs.h
		cc -c x.c

	y.o : y.c defs.h
		cc -c y.c

	z.o : z.c
		cc -c z.c
If none of the source or object files have changed since the last time prog was made, and all of the files are
current, the command make announces this fact and stops. If, however, the defs.h file has been edited, x.c and
y.c (but not z.c) are recompiled; and then prog is created from the new x.o and y.o files, and the existing z.o
file. If only the file y.c had changed, only it is recompiled; but it is still necessary to relink prog. If no target
name is given on the make command line, the first target mentioned in the description is created; otherwise,
the specified targets are made. The command
	$ make x.o

The command

	$ make LIBES="-ll -lm"

loads the three objects with both the lex (-ll) and the math (-lm) libraries, because macro definitions on the
command line override definitions in the description file. (In SCO OpenServer system commands, arguments
with embedded blanks must somehow be quoted.)
As an example of the use of make, a description file that might be used to maintain the make command itself
is given. The code for make is spread over a number of C language source files and has a yacc grammar. The
description file contains the following:
# Description file for the make command
	FILES = Makefile defs.h main.c doname.c misc.c \
		files.c dosys.c gram.y
	OBJECTS = main.o doname.o misc.o files.o \
		dosys.o gram.o
	LIBES =
	LINT = lint -p
	CFLAGS = -O
	LP = lp

	make: $(OBJECTS)
		$(CC) $(CFLAGS) -o make $(OBJECTS) $(LIBES)
		@size make

	$(OBJECTS): defs.h

	cleanup:
		-rm *.o gram.c
		-du

	install: make
		@size make /usr/ccs/bin/make
		cp make /usr/ccs/bin/make && rm make
The last line results from the size make command. The printing of the command line itself was suppressed by
the symbol ``@'' in the description file.
Parallel make
If make is invoked with the -P option, it tries to build more than one target at a time, in parallel. (This is done
by using the standard SCO OpenServer system process mechanism which enables multiple processes to run
simultaneously.)
	prog : x.o y.o z.o
		cc x.o y.o z.o -lm -o prog

	x.o : x.c defs.h
		cc -c x.c

	y.o : y.c defs.h
		cc -c y.c

	z.o : z.c
		cc -c z.c
For the makefile shown above, it would create processes to build x.o, y.o and z.o in parallel. After these
processes were complete, it would build prog.
The number of targets make will try to build in parallel is determined by the value of the environment
variable PARALLEL. If -P is invoked, but PARALLEL is not set, then make will try to build no more than
two targets in parallel.
You can use the .MUTEX directive to serialize the updating of some specified targets. This is useful when
two or more targets modify a common output file, such as when inserting modules into an archive or when
creating an intermediate file with the same name, as is done by lex and yacc.
If the makefile above contained a .MUTEX directive of the form

	.MUTEX: x.o y.o

make would update x.o and y.o serially rather than in parallel.
Description files and substitutions
Comments
The comment convention is that the symbol ``#'' and all characters on the same line after it are ignored. Blank
lines and lines beginning with ``#'' are totally ignored.
Continuation lines
If a non-comment line is too long, the line can be continued by using the symbol ``\'', which must be the last
character on the line. If the last character of a line is ``\'', then it, the newline, and all following blanks and
tabs are replaced by a single blank. Comments can be continued on to the next line as well.
Macro definitions
A macro definition is an identifier followed by the symbol ``=''. The identifier must not be preceded by a
colon (``:'') or a tab. The name (string of letters and digits) to the left of the = (trailing blanks and tabs are
stripped) is assigned the string of characters following the = (leading blanks and tabs are stripped). The
following are valid macro definitions:
2 = xyz
abc = -ll -ly -lm
LIBES =
The last definition assigns LIBES the null string. A macro that is never explicitly defined has the null string
as its value. Remember, however, that some macros are explicitly defined in make's own rules.
General form
The general form of an entry in a description file is
target1 [target2 ...] :[:] [dependent1 ...] [; commands] [# ...]
[ \t commands] [# ...]
. . .
Items inside brackets may be omitted and targets and dependents are strings of letters, digits, periods, and
slashes. Shell metacharacters such as ``*'' and ``?'' are expanded when the commands are evaluated.
Commands may appear either after a semicolon on a dependency line or on lines beginning with a tab
(denoted above as ``\t'') immediately following a dependency line. A command is any string of characters not
including #, except when # is in quotes.
Dependency information
A dependency line may have either a single or a double colon. A target name may appear on more than one
dependency line, but all of those lines must be of the same (single or double colon) type. For the more
common single colon case, a command sequence may be associated with at most one dependency line. If the
target is out of date with any of the dependents on any of the lines and a command sequence is specified (even
a null one following a semicolon or tab), it is executed; otherwise, a default rule may be invoked. In the
double colon case, a command sequence may be associated with each dependency line; if the target is out of
date with any of the files on a particular line, the associated commands are executed.
Executable commands
If a target must be created, the sequence of commands is executed. Normally, each command line is printed
and then passed to a separate invocation of the shell after substituting for macros. The printing is suppressed
in the silent mode (the -s option of the make command) or if the command line in the description file begins with
an ``@'' sign. make normally stops if any command signals an error by returning a nonzero error code. Errors
are ignored if the -i flag has been specified on the make command line, if the fake target name .IGNORE
appears in the description file, or if the command string in the description file begins with a hyphen (-). If a
program is known to return a meaningless status, a hyphen in front of the command that invokes it is
appropriate. Because each command line is passed to a separate invocation of the shell, care must be taken
with certain commands (cd and shell control commands, for instance) that have meaning only within a single
shell process. These results are forgotten before the next line is executed.
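The cd caveat above can be demonstrated with a short rule; the file name cd.mk, the target demo, and the directory /tmp/mkdemo are illustrative:

```shell
# Each command line runs in its own shell, so a bare cd is forgotten
# before the next line; joining the commands with ";" keeps them in
# one shell.
mkdir -p /tmp/mkdemo
printf 'demo:\n\t@cd /tmp/mkdemo\n\t@pwd\n\t@cd /tmp/mkdemo; pwd\n' > cd.mk
make -f cd.mk demo
```

The lone pwd prints the directory make was started from, because the cd on the previous line ran in a shell that has already exited; only the joined line prints /tmp/mkdemo.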
Before issuing any command, certain internally maintained macros are set. The $@ macro is set to the full
target name of the current target. The $@ macro is evaluated only for explicitly named dependencies. The $?
macro is set to the string of names that were found to be younger than the target. The $? macro is evaluated
when explicit rules from the makefile are evaluated. If the command was generated by an implicit rule, the $<
macro is the name of the related file that caused the action; and the $* macro is the prefix shared by the
current and the dependent file names. If a file must be made but there are no explicit commands or relevant
built-in rules, the commands associated with the name .DEFAULT are used. If there is no such name, make
prints a message and stops.
In addition, a description file may also use the following related macros: $(@D), $(@F), $(*D), $(*F), $(<D),
and $(<F) (see below).
Output translations
The values of macros are replaced when evaluated. The general form, where brackets indicate that the
enclosed sequence is optional, is as follows:
$(macro[:string1=[string2]])
The parentheses are optional if there is no substitution specification and the macro name is a single character.
If a substitution sequence is present, the value of the macro is considered to be a sequence of ``words''
separated by sequences of blanks, tabs, and newline characters. Then, for each such word that ends with
string1, the trailing string1 is replaced by string2 (or simply removed, if string2 is omitted).
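The substitution form can be sketched as follows; the file name sub.mk and the macro names SRCS and OBJS are illustrative:

```shell
# $(SRCS:.c=.o) rewrites each word of SRCS that ends in .c to end
# in .o instead.
printf 'SRCS = main.c util.c io.c\nOBJS = $(SRCS:.c=.o)\nshow:\n\t@echo $(OBJS)\n' > sub.mk
make -f sub.mk show
```

Each word ending in .c has that suffix replaced by .o, yielding main.o util.o io.o.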
A dependency of the preceding form is necessary for each of the different types of source files (suffixes) that
define the archive library. These translations are added in an effort to make more general use of the wealth of
information that make generates.
Recursive makefiles
Another feature of make concerns the environment and recursive invocations. If the sequence $(MAKE)
appears anywhere in a shell command line, the line is executed even if the -n flag is set. Since the -n flag is
exported across invocations of make (through the MAKEFLAGS variable), the only thing that is executed is
the make command itself. This feature is useful when a hierarchy of makefiles describes a set of software
subsystems. For testing purposes, make -n can be executed and everything that would have been done will be
printed, including output from lower-level invocations of make.
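A minimal sketch of this behavior, with illustrative file names (sub/Makefile, top.mk, witness):

```shell
# Under -n, the line containing $(MAKE) is still executed, and the
# sub-make inherits -n through MAKEFLAGS; other lines are only printed.
mkdir -p sub
printf 'all:\n\t@echo building in sub\n' > sub/Makefile
printf 'all:\n\tcd sub && $(MAKE)\n\ttouch witness\n' > top.mk
make -n -f top.mk
```

The recursive line runs, so the sub-make (itself in -n mode) prints its commands, while touch witness is only printed: no witness file is created.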
Implicit rules
make uses a table of suffixes and a set of transformation rules to supply default dependency information and
implied commands. The default suffix list (in order) is as follows:
.o .c .c~ .y .y~ .l .l~ .s .s~ .sh .sh~ .h .h~ .f .f~ .C .C~ .Y .Y~ .L .L~
A macro assignment on the command line, such as
$ make CC=newcc
will cause the newcc command to be used instead of the usual C language compiler. The macros CFLAGS,
YFLAGS, LFLAGS, ASFLAGS, FFLAGS, and C++FLAGS may be set to cause these commands to be
issued with optional flags. Thus
$ make CFLAGS=-g
Archive libraries
The make program has an interface to archive libraries. A user may name a member of a library in the
following manner:
projlib(object.o)
or
projlib((entry_pt))
where the second method actually refers to an entry point of an object file within the library. (make looks
through the library, locates the entry point, and translates it to the correct object file name.)
To use this procedure to maintain an archive library, the following type of makefile is required:
projlib:: projlib(pfile1.o)
	$(CC) -c $(CFLAGS) pfile1.c
	$(AR) $(ARFLAGS) projlib pfile1.o
	rm pfile1.o
projlib:: projlib(pfile2.o)
	$(CC) -c $(CFLAGS) pfile2.c
	$(AR) $(ARFLAGS) projlib pfile2.o
	rm pfile2.o
and so on for each object. This is tedious and error-prone. Obviously, the command sequences for adding a C
language file to a library are the same for each invocation; the file name being the only difference each time.
(This is true in most cases.)
The make command also gives the user access to a rule for building libraries. The handle for the rule is the .a
suffix. Thus, a .c.a rule is the rule for compiling a C language source file, adding it to the library, and
removing the .o file. Similarly, the .y.a, the .s.a, and the .l.a rules rebuild yacc, assembler, and lex files,
respectively. The archive rules defined internally are .c.a, .c~.a, .f.a, .f~.a, and .s~.a. (The tilde (``~'') syntax
will be described shortly.) The user may define other needed rules in the description file.
The above two-member library is then maintained with the following shorter makefile:
projlib: projlib(pfile1.o) projlib(pfile2.o)
	@echo projlib up-to-date.
The internal rules are already defined to complete the preceding library maintenance. The actual .c.a rule is as
follows:
.c.a:
	$(CC) -c $(CFLAGS) $<
	$(AR) $(ARFLAGS) $@ $(<F:.c=.o)
	rm -f $(<F:.c=.o)
Thus, the $@ macro is the .a target (projlib); the $< and $* macros are set to the out-of-date C language file,
and the file name minus the suffix, respectively (pfile1.c and pfile1). The $< macro (in the preceding rule)
could have been changed to $*.c.
It is useful to go into some detail about exactly what make does when it sees the construction
projlib: projlib(pfile1.o)
	@echo projlib up-to-date.
Assume the object in the library is out of date with respect to pfile1.c. Also, there is no pfile1.o file.
1. make projlib.
2. Before making projlib, check each dependent of projlib.
3. projlib(pfile1.o) is a dependent of projlib and needs to be generated.
4. Before generating projlib(pfile1.o), check each dependent of projlib(pfile1.o). (There are none.)
5. Use internal rules to try to create projlib(pfile1.o). (There is no explicit rule.) Note that
projlib(pfile1.o) has a parenthesis in the name to identify the target suffix as .a. This is the key. There
is no explicit .a at the end of the projlib library name. The parenthesis implies the .a suffix. In this
sense, the .a is hardwired into make.
6. Break the name projlib(pfile1.o) up into projlib and pfile1.o. Define two macros, $@ (projlib) and
$* (pfile1).
7. Look for a rule .X.a and a file $*.X. The first .X (in the
.SUFFIXES list) which fulfills these conditions is .c so the rule is .c.a, and the file is pfile1.c. Set $<
to be pfile1.c and execute the rule. In fact, make must then compile pfile1.c.
8. The library has been updated. Execute the command associated with the projlib: dependency, namely
@echo projlib up-to-date.
It should be noted that to let pfile1.o have dependencies, the following syntax is required:
projlib(pfile1.o): $(INCDIR)/stdio.h pfile1.c
There is also a macro for referencing the archive member name when this form is used. The $% macro is
evaluated each time $@ is evaluated. If there is no current archive member, $% is null. If an archive member
exists, then $% evaluates to the expression between the parentheses.
Thus, ~ appended to any suffix transforms the file search into an SCCS file name search with the actual suffix
named by the dot and all characters up to (but not including) ~.
The following SCCS suffixes are internally defined:
.c~:
.c~.c:
.c~.a:
.c~.o:
.y~.c:
.y~.o:
.y~.y:
.l~.c:
.l~.o:
.l~.l:
.s~:
.s~.s:
.s~.a:
.s~.o:
.sh~:
.sh~.sh:
.h~.h:
.f~:
.f~.f:
.f~.a:
.f~.o:
.C~:
.C~.C:
.C~.a:
.C~.o:
.Y~.C:
.Y~.o:
.Y~.Y:
.L~.C:
.L~.o:
.L~.L:
Obviously, the user can define other rules and suffixes that may prove useful. The ~ provides a handle on the
SCCS file name format so that this is possible.
In fact, this .c: rule is internally defined so no makefile is necessary at all. The user only needs to enter
$ make cat dd echo date
(these are all SCO OpenServer system single-file programs) and all four C language source files are passed
through the above shell command line associated with the .c: rule. The internally defined single suffix rules
are
Included files
The make program has a capability similar to the #include directive of the C preprocessor. If the string
include appears as the first seven letters of a line in a makefile and is followed by a blank or a tab, the rest of
the line is assumed to be a file name, which the current invocation of make will read. Macros may be used in
file names. The file descriptors are stacked for reading include files so that no more than 16 levels of nested
includes are supported.
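A sketch of the include directive, with illustrative file and macro names (common.mk, top.mk, COMMON_FLAGS):

```shell
# A makefile beginning a line with "include" followed by a blank
# reads the named file as if its contents appeared in place.
printf 'CC = cc\nCOMMON_FLAGS = -O\n' > common.mk
printf 'include common.mk\nshow:\n\t@echo $(CC) $(COMMON_FLAGS)\n' > top.mk
make -f top.mk show
```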
SCCS makefiles
Makefiles under SCCS control are accessible to make. That is, if make is typed and only a file named
s.makefile or s.Makefile exists, make will do a get on the file, then read and remove the file.
Dynamic dependency parameters
The dependency parameter $$@ takes the value of the current target; it has meaning only on the dependency
line. Thus, if the target is cat and the dependency is written
$$@.c
the dependency is translated at execution time to the string cat.c. This is useful for building a large number of
executable files, each of which has only one source file. For instance, the SCO OpenServer system software
command directory could have a makefile like:
CMDS = cat dd echo date cmp comm chown
$(CMDS): $$@.c
	$(CC) $(CFLAGS) $? -o $@
Obviously, this is a subset of all the single-file programs. For
multiple-file programs, a directory is usually allocated and a separate makefile is made. For any particular file
that has a peculiar compilation procedure, a specific entry must be made in the makefile.
The second useful form of the dependency parameter is $$(@F). It represents the file name part of $$@.
Again, it is evaluated at execution time. Its usefulness becomes evident when trying to maintain the
/usr/include directory from the makefile in the /usr/src/head directory. Thus, the /usr/src/head/makefile
would look like
INCDIR = /usr/include
-i
Ignore error codes returned by invoked commands. This mode is entered if the fake target name
.IGNORE appears in the description file.
-s
Silent mode. Do not print command lines before executing. This mode is also entered if the fake
target name .SILENT appears in the description file.
-r
Do not use the built-in rules.
-n
No-execute mode. Print commands, but do not execute them. Even lines beginning with an ``@'' sign
are printed.
-t
Touch the target files (causing them to be up to date) rather than issue the usual commands.
-q
Question. The make command returns a zero or nonzero status code depending on whether the target
file is or is not up to date.
-p
Print out the complete set of macro definitions and target descriptions.
-k
Abandon work on the current entry if something goes wrong, but continue on other branches that do
not depend on the current entry.
-e
Environment variables override assignments within makefiles.
-f
Description file name. The next argument is assumed to be the name of a description file. A file name
of - denotes the standard input. If there are no -f arguments, the file named makefile, Makefile,
s.makefile, or s.Makefile in the current directory is read. The contents of the description files
override the built-in rules if they are present.
-P
Update, in parallel, more than one target at a time. The number of targets updated concurrently is
determined by the environment variable PARALLEL and the presence of .MUTEX directives in
makefiles.
The following two fake target names are evaluated in the same manner as flags:
Environment variables
Environment variables are read and added to the macro definitions each time make executes. Precedence is a
prime consideration in doing this properly. The following describes make's interaction with the environment.
A macro, MAKEFLAGS, is maintained by make. The macro is defined as the collection of all input flag
arguments into a string (without minus signs). The macro is exported and thus accessible to recursive
invocations of make. Command line flags and assignments in the makefile update MAKEFLAGS. Thus, to
describe how the environment interacts with make, the MAKEFLAGS macro (environment variable) must
be considered.
When executed, make assigns macro definitions in the following order:
1. Read the MAKEFLAGS environment variable. If it is not present or null, the internal make variable
MAKEFLAGS is set to the null string. Otherwise, each letter in MAKEFLAGS is assumed to be an
input flag argument and is processed as such. (The only exceptions are the -f, -p, and -r flags.)
2. Read the internal list of macro definitions.
3. Read the environment. The environment variables are treated as macro definitions and marked as
exported (in the shell sense).
4. Read the makefile(s). The assignments in the makefile(s) override the environment. This order is
chosen so that when a makefile is read and executed, you know what to expect. That is, you get what
is seen unless the -e flag is used. The -e is the input flag argument, which tells make to have the
environment override the makefile assignments. Thus, if make -e is entered, the variables in the
environment override the definitions in the makefile. Also MAKEFLAGS overrides the environment
if assigned. This is useful for further invocations of make from the current makefile.
It may be clearer to list the precedence of assignments. Thus, in order from least binding to most binding, the
precedence of assignments is as follows:
1. internal definitions
2. environment
3. makefile(s)
4. command line
The -e flag has the effect of rearranging the order to:
1. internal definitions
2. makefile(s)
3. environment
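The precedence rules above can be sketched with one makefile; the file name p.mk and the macro name VAR are illustrative:

```shell
# One macro assigned in the makefile, in the environment, and on the
# command line, to show which assignment wins in each mode.
printf 'VAR = makefile\nshow:\n\t@echo $(VAR)\n' > p.mk
VAR=environment make -f p.mk show              # the makefile wins over the environment
VAR=environment make -e -f p.mk show           # -e lets the environment win
VAR=environment make -f p.mk show VAR=cmdline  # the command line wins over both
```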
line, then the object file x.o depends on defs.h; the source file x.c does not. If defs.h is changed, nothing is
done to the file x.c while file x.o must be recreated.
To discover what make would do, the -n option is very useful. The command
$ make -n
orders make to print out the commands that make would issue without actually taking the time to execute
them. If a change to a file is absolutely certain to be mild in character (adding a comment to an include file,
for example), the -t (touch) option can save a lot of time. Instead of issuing a large number of superfluous
recompilations, make updates the modification times on the affected file. Thus, the command
$ make -ts
(touch silently) causes the relevant files to appear up to date. Obvious care is necessary because this mode of
operation subverts the intention of make and destroys all memory of the previous relationships.
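The -t (and -q) behavior can be sketched as follows; the names t.mk, out, and in are illustrative:

```shell
# -t updates the target's timestamp instead of running its commands;
# -q then reports (by exit status) that the target is up to date.
printf 'out: in\n\tcp in out\n' > t.mk
echo old > out
sleep 1
echo new > in          # in is now newer than out
make -t -f t.mk out    # prints "touch out"; cp is never run
cat out                # still contains "old"
make -q -f t.mk out && echo up to date
```

The target's contents are untouched; only its modification time changed, which is exactly the memory-destroying shortcut the text warns about.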
Internal rules
The standard set of internal rules used by make are reproduced below.
#
#
#
.SUFFIXES: .o .c .c~ .y .y~ .l .l~ .s .s~ .sh .sh~ .h .h~ .f .f~ .C .C~ \
	.Y .Y~ .L .L~
#
# PREDEFINED MACROS
#
AR=ar
ARFLAGS=rv
AS=as
ASFLAGS=
BUILD=build
CC=cc
CFLAGS=-O
C++C=CC
C++FLAGS=-O
F77=f77
FFLAGS=-O
GET=get
GFLAGS=
LEX=lex
LFLAGS=
LD=ld
LDFLAGS=
MAKE=make
YACC=yacc
YFLAGS=
#
# SPECIAL RULES
#
markfile.o : markfile
	A=@; echo "static char _sccsid[]=\042`grep $$A'(#)' markfile`\042;" \
	> markfile.c
	$(CC) -c markfile.c
	rm -f markfile.c
#
# SINGLE SUFFIX RULES
#
.c:
	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
.c~:
	$(GET) $(GFLAGS) $<
	$(CC) $(CFLAGS) -o $@ $*.c $(LDFLAGS)
	rm -f $*.c
Basic usage
Several terminal session fragments are presented in this section. Try them all. The best way to learn SCCS is
to use it.
Terminology
A delta is a set of changes made to a file under SCCS custody. To identify and keep track of a delta, it is
assigned an SID (SCCS IDentification) number. The SID for any original file turned over to SCCS is
composed of release number 1 and level number 1, stated as 1.1. The SID for the first set of changes made to
that file, that is, its first delta, is release 1 version 2, or 1.2. The next delta would be 1.3, the next 1.4, and so
on. (For more on delta numbering, see ``Delta numbering''.) At this point, it is enough to know that by default
SCCS assigns SIDs automatically.
Custody of your lang file can be given to SCCS using the admin (for administer) command. The following
creates an SCCS file from the lang file:
$ admin -ilang s.lang
All SCCS files must have names that begin with s., hence s.lang. The -i keyletter, together with its value
lang, means admin is to create an SCCS file and initialize it with the contents of the file lang.
No id keywords (cm7)
This is a warning message that may also be issued by other SCCS commands. Ignore it for now. Its
significance is described in ``get''. In the following examples, this warning message is not shown although it
may be issued.
Remove the lang file. It is no longer needed because it exists now under SCCS as s.lang.
$ rm lang
To retrieve the file, use the get command:
$ get s.lang
1.1
5 lines
This tells you that get retrieved version 1.1 of the file, which is made up of five lines of text.
The retrieved text is placed in a new file called lang. That is, if you list the contents of your directory, you will
see both lang and s.lang.
The get s.lang command creates lang, a file meant for viewing (read-only), not for making changes to. If you
want to make changes to it, the -e (edit) option must be used. This is done as follows:
$ get -e s.lang
get -e causes SCCS to create lang for both reading and writing (editing). It also places certain information
about lang in another new file, called p.lang, which is needed later by the delta command. Now if you list the
contents of your directory, you will see s.lang, lang, and p.lang.
get -e prints the same messages as get, except that the SID for the first delta you will create also is issued:
1.1
new delta 1.2
5 lines
Your response should be an explanation of why the changes were made. For example,
added more languages
delta now reads the file p.lang and determines what changes you made to lang. It does this by doing its own
get to retrieve the original version and applying the diff(C) command to the original version and the edited
version. Next, delta stores the changes in s.lang and destroys the no longer needed p.lang and lang files.
When this process is complete, delta outputs
1.2
2 inserted
0 deleted
5 unchanged
The number 1.2 is the SID of the delta you just created, and the next three lines summarize what was done to
s.lang.
More on get
The command
$ get s.lang
retrieves the latest version of the file s.lang, now 1.2. SCCS does this by starting with the original version of
the file and applying the delta you made. If you use the get command now, any of the following will retrieve
version 1.2:
$ get s.lang
$ get -r1 s.lang
$ get -r1.2 s.lang
The numbers following -r are SIDs. When you omit the level number of the SID (as in get -r1 s.lang), the
default is the highest level number that exists within the specified release. Thus, the second command requests
the retrieval of the latest version in release 1, namely 1.2. The third command requests the retrieval of a
particular version, in this case also 1.2.
Whenever a major change is made to a file, you may want to signify it by changing the release number, the
first number of the SID. This, too, is done with the get command:
$ get -e -r2 s.lang
1.2
new delta 2.1
7 lines
which means version 1.2 has been retrieved, and 2.1 is the version the delta command will create. If the file is
now edited (for example, by deleting COBOL from the list of languages) and delta is executed
$ delta s.lang
comments? deleted cobol from list of languages
you will see by delta's output that version 2.1 is indeed created:
2.1
0 inserted
1 deleted
6 unchanged
Deltas can now be created in release 2 (deltas 2.2, 2.3, etc.), or another new release can be created in a similar
manner. A delta can still be made to the ``old'' release 1. This is explained in ``Delta numbering''.
If you try to use get on a file not under SCCS control, such as
$ get lang
ERROR [lang]: not an SCCS file (co1)
the code co1 can be used with help to print a fuller explanation of the message:
$ help co1
This gives the following explanation of why get lang produced an error message:
co1:
"not an SCCS file"
A file that you think is an SCCS file
does not begin with the characters "s.".
help is useful whenever there is doubt about the meaning of almost any SCCS message.
Delta numbering
Think of deltas as the nodes of a tree in which the root node is the original version of the file. The root node
is normally named 1.1 and deltas (nodes) are named 1.2, 1.3, etc. The components of these SIDs are called
release and level numbers, respectively. Thus, normal naming of new deltas proceeds by incrementing the
level number. This is done automatically by SCCS whenever a delta is made.
The branch number of the first delta branching off any trunk delta is always 1, and its sequence number is also
1. For example, the full SID for a delta branching off trunk delta 1.3 will be 1.3.1.1. As other deltas on that
same branch are created, only the sequence number changes: 1.3.1.2, 1.3.1.3, etc. This is shown in ``Tree
structure with branch deltas''.
Error messages
SCCS commands produce error messages on the diagnostic output in this format:
ERROR [file]: message text (code)
The code in parentheses can be used as an argument to the help command to obtain a further explanation of
the message. Detection of a fatal error during the processing of a file causes the SCCS command to stop
processing that file and proceed with the next file specified.
SCCS commands
This section describes the major features of the fourteen SCCS commands and their most common arguments.
Here is a quick-reference overview of the commands:
get(1)
retrieves versions of SCCS files.
unget(1)
undoes the effect of a get -e prior to the file being deltaed.
delta(1)
applies deltas (changes) to SCCS files and creates new versions.
admin(1)
initializes SCCS files, manipulates their descriptive text, and controls delta creation rights.
prs(1)
prints portions of an SCCS file in userspecified format.
sact(1)
prints information about files that are currently out for editing.
help(1)
gives explanations of error messages.
rmdel(1)
removes a delta from an SCCS file; allows removal of deltas created by mistake.
cdc(1)
changes the commentary associated with a delta.
get
The get command creates a file that contains a specified version of an SCCS file. The version is retrieved by
beginning with the initial version and then applying deltas, in order, until the desired version is obtained. The
resulting file, called a gfile (for gotten), is created in the current directory and is owned by the real user. The
mode assigned to the gfile depends on how the get command is used.
The most common use of get is
$ get s.abc
which normally retrieves the latest version of s.abc from the SCCS file tree trunk and produces (for example)
on the standard output
1.3
67 lines
No id keywords (cm7)
meaning version 1.3 of s.abc was retrieved (assuming 1.3 is the latest trunk delta), it has 67 lines of text, and
no ID keywords were substituted in the file.
The gfile, namely, file abc, is given access permission mode 444 (read-only for owner, group, and other).
This particular way of using get is intended to produce gfiles only for inspection, compilation, or copying,
for example. It is not intended for editing (making deltas).
When several files are specified, the same information is output for each one. For example,
$ get s.abc s.xyz
produces
s.abc:
1.3
67 lines
No id keywords (cm7)
%I% is the ID keyword replaced by the SID of the retrieved version of a file. Similarly, %H% and %M% are the
date and name of the gfile, respectively. Thus, executing get on an SCCS file that contains the PL/I
declaration
DCL ID CHAR(100) VAR INIT('%M% %I% %H%');
This message is normally treated as a warning by get although the presence of the i flag in the SCCS file
causes it to be treated as an error. For a complete list of the keywords provided, see get(CP).
Retrieval of different versions
The version of an SCCS file that get retrieves by default is the most recently created delta of the highest
numbered trunk release. However, any other version can be retrieved with get -r by specifying the version's
SID. Thus,
$ get -r1.3 s.abc
retrieves version 1.3 of s.abc and produces (for example) on the standard output
1.3
64 lines
When a SID is specified and the particular version does not exist in the SCCS file, an error message results.
Omitting the level number, as in
causes retrieval of the trunk delta with the highest level number within the given release. Thus, the above
command might output
3.7
213 lines
If the given release does not exist, get retrieves the trunk delta with the highest level number within the
highest-numbered existing release that is lower than the given release. For example, assume release 9 does
not exist in file s.abc and release 7 is the highest-numbered release below 9. Executing
$ get -r9 s.abc
would produce
7.6
420 lines
which indicates that trunk delta 7.6 is the latest version of file s.abc below release 9. Similarly, omitting the
sequence number, as in
$ get -r4.3.2 s.abc
results in the retrieval of the branch delta with the highest sequence number on the given branch. This might
result in the following output:
4.3.2.8
89 lines
(If the given branch does not exist, an error message results.)
get -t will retrieve the latest (top) version of a particular release when no -r is used or when its value is
simply a release number. The latest version is the delta produced most recently, independent of its location on
the SCCS file tree. Thus, if the most recent delta in release 3 is 3.5,
$ get -r3 -t s.abc
would produce
3.5
59 lines
However, if branch delta 3.2.1.5 were the latest delta (created after delta 3.5), the same command might
produce
3.2.1.5
46 lines
Updating source
get -e indicates an intent to make a delta. First, get checks the following:
Undoing a get -e
There may be times when a file is retrieved accidentally for editing; there is really no editing that needs to be
done at this time. In such cases, the unget command can be used to cancel the delta reservation that was set
up.
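A sketch of such a cancellation, assuming s.lang is the file from the earlier examples (this session fragment is illustrative, not from the text):

```
$ get -e s.lang
2.1
new delta 2.2
6 lines
$ unget s.lang
2.2
```

unget prints the SID of the delta being abandoned, removes the corresponding p.lang entry, and removes the writable gfile lang, so delta 2.2 is no longer reserved.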
Additional get options
If get -r and/or -t are used together with -e, the version retrieved for editing is the one specified with -r
and/or -t.
get -i and -x are used to specify a list of deltas to be included and excluded, respectively (see get(CP) for the
syntax of such a list). Including a delta means forcing its changes to be included in the retrieved version. This
is useful in applying the same changes to more than one version of the SCCS file. Excluding a delta means
forcing it not to be applied. This may be used to undo the effects of a previous delta in the version to be
retrieved.
get -k is used either to regenerate a gfile that may have been accidentally removed or ruined after get -e, or
simply to generate a gfile in which the replacement of ID keywords has been suppressed. A gfile generated
by get -k is identical to one produced by get -e, but no processing related to p.file takes place.
Concurrent edits of different SID
The ability to retrieve different versions of an SCCS file allows several deltas to be in progress at any given
time. This means that several get -e commands may be executed on the same file as long as no two
executions retrieve the same version (unless multiple concurrent edits are allowed).
The p.file created by get -e is created in the same directory as the SCCS file, given mode 644 (readable by
everyone, writable only by the owner), and owned by the effective user. It contains the following information
for each delta that is still in progress:
the SID of the retrieved version
the SID given to the new delta when it is created
the login name of the real user executing get
The first execution of get -e causes the creation of p.file for the corresponding SCCS file. Subsequent
executions only update p.file with a line containing the above information. Before updating, however, get
checks to assure that no entry already in p.file specifies that the SID of the version to be retrieved is already
retrieved (unless multiple concurrent edits are allowed). If the check succeeds, the user is informed that other
deltas are in progress and processing continues. If the check fails, an error message results.
It should be noted that concurrent executions of get must be carried out from different directories. Subsequent
executions from the same directory will attempt to overwrite the gfile, which is an SCCS error condition. In
practice, this problem does not arise because each user normally has a different working directory. See
``Protection'' for a discussion of how different users are permitted to use SCCS commands on the same files.
``Determination of new SID'' shows the possible SID components a user can specify with get (leftmost
column), the version that will then be retrieved by get, and the resulting SID of the delta that the delta
command will create (rightmost column). In the table
R, L, B, and S mean release, level, branch, and sequence numbers in the SID, and m means
maximum. Thus, for example, R.mL means the maximum level number within release R.
R.L.(mB+1).1 means the first sequence number on the new branch (maximum branch number plus 1)
of level L within release R. Note that if the SID specified is R.L, R.L.B, or R.L.B.S, each of these
specified SID numbers must exist.
The -b keyletter is effective only if the b flag (see admin(CP)) is present in the file. An entry of -
means irrelevant.
SID        -b      Other                                        SID        SID of delta
specified  used    conditions                                   retrieved  to be created

none       no      R defaults to mR                             mR.mL      mR.(mL+1)
none       yes     R defaults to mR                             mR.mL      mR.mL.(mB+1).1
R          no      R > mR                                       mR.mL      R.1
R          no      R = mR                                       mR.mL      mR.(mL+1)
R          yes     R > mR                                       mR.mL      mR.mL.(mB+1).1
R          yes     R = mR                                       mR.mL      mR.mL.(mB+1).1
R          -       R < mR and R does not exist                  hR.mL      hR.mL.(mB+1).1
R          -       Trunk successor in release > R and R exists  R.mL       R.mL.(mB+1).1
R.L        no      No trunk successor                           R.L        R.(L+1)
R.L        yes     No trunk successor                           R.L        R.L.(mB+1).1
R.L        -       Trunk successor in release R                 R.L        R.L.(mB+1).1
R.L.B      no      No branch successor                          R.L.B.mS   R.L.B.(mS+1)
R.L.B      yes     No branch successor                          R.L.B.mS   R.L.(mB+1).1
R.L.B.S    no      No branch successor                          R.L.B.S    R.L.B.(S+1)
R.L.B.S    yes     No branch successor                          R.L.B.S    R.L.(mB+1).1
R.L.B.S    -       Branch successor                             R.L.B.S    R.L.(mB+1).1

hR is the highest existing release that is lower than the specified, nonexistent, release R.
without an intervening delta. In this case, a delta after the first get will produce delta 1.2 (assuming 1.1 is the
most recent trunk delta), and a delta after the second get will produce delta 1.1.1.1.
Keyletters that affect output
get -p causes the retrieved text to be written to the standard output rather than to a gfile. In addition, all
output normally directed to the standard output (such as the SID of the version retrieved and the number of
lines retrieved) is directed instead to the standard error. get -p is used, for example, to create a gfile with an
arbitrary name, as in
$ get -p s.abc > arbitrary-file-name
get -s suppresses output normally directed to the standard output, such as the SID of the retrieved version and
the number of lines retrieved, but it does not affect messages normally directed to the standard error. get -s is
used to prevent nondiagnostic messages from appearing on the user's terminal and is often used with -p to
pipe the output, as in
$ get -p -s s.abc | pg
get -g prints the SID on standard output and there is no retrieval of the SCCS file. This is useful in several
ways. For example, to verify a particular SID in an SCCS file
$ get -g -r4.3 s.abc
outputs the SID 4.3 if it exists in the SCCS file s.abc or an error message if it does not. Another use of get -g
is in regenerating a p.file that may have been accidentally destroyed, as in
$ get -e -g s.abc
get -l causes SCCS to create l.file in the current directory with mode 444 (read-only for owner, group, and
other) and owned by the real user. The l.file contains a table (whose format is described in get(CP)) showing
the deltas used in constructing a particular version of the SCCS file. For example
$ get -r2.3 -l s.abc
generates an l.file showing the deltas applied to retrieve version 2.3 of s.abc. Specifying -p with -l, as in
$ get -lp -r2.3 s.abc
causes the output to be written to the standard output rather than to l.file. get -g can be used with -l to
suppress the retrieval of the text.
get -m identifies the changes applied to an SCCS file. Each line of the gfile is preceded by the SID of the
delta that caused the line to be inserted. The SID is separated from the text of the line by a tab character.
get -n causes each line of a gfile to be preceded by the value of the %M% ID keyword and a tab character.
This is most often used in a pipeline with grep(C). For example, to find all lines that match a given pattern in
the latest version of each SCCS file in a directory, the following may be executed:
$ get -p -n -s directory | grep pattern
If both -m and -n are specified, each line of the gfile is preceded by the value of the %M% ID keyword
and a tab (this is the effect of -n) and is followed by the line in the format produced by -m.
Because use of -m and/or -n causes the contents of the gfile to be modified, such a gfile must not be used
for creating a delta. Therefore, neither -m nor -n may be specified together with get -e. See the get(CP)
page.
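SCCS itself may not be installed on a modern system, but the effect of -n-style annotation can be sketched with awk; the module name abc and the file contents here are made up for illustration:

```shell
# Hypothetical module name (the value get would substitute for %M%).
module=abc

# A stand-in for a retrieved gfile.
printf 'first line\nsecond line\n' > g.txt

# Prefix each line with the module name and a tab, as get -n would.
awk -v m="$module" '{ printf "%s\t%s\n", m, $0 }' g.txt
```

Piping this output through grep gives the pipeline pattern described above, with each match labeled by its module.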
delta
The delta command is used to incorporate changes made to a gfile into the corresponding SCCS file; that
is, to create a delta and, therefore, a new version of the file.
The delta command requires the existence of p.file (created by get -e). It examines p.file to verify the
presence of an entry containing the user's login name. If none is found, an error message results.
The delta command performs the same permission checks that get -e performs. If all checks are successful,
delta determines what has been changed in the gfile by comparing it with its own temporary copy of the
gfile as it was before editing. This temporary copy is called d.file and is obtained by performing an internal
get on the SID specified in the p.file entry.
The required p.file entry is the one containing the login name of the user executing delta, because the user
who retrieved the gfile must be the one who creates the delta. However, if the login name of the user appears
in more than one entry, the same user has executed get -e more than once on the same SCCS file. Then, delta
-r must be used to specify the SID that uniquely identifies the p.file entry. This entry is then the one used to
obtain the SID of the delta to be created.
In practice, the most common use of delta is
$ delta s.abc
which prompts
comments?
to which the user replies with a description of why the delta is being made, ending the reply with a newline
character. The user's response may be up to 512 characters long with newlines (not intended to terminate the
response) escaped by backslashes (\).
If the SCCS file has a v flag, delta first prompts with
MRs?
(Modification Requests) on the standard output. The standard input is then read for MR numbers, separated by
blanks and/or tabs, ended with a newline character. A Modification Request is a formal way of asking for a
correction or enhancement to the file. In some controlled environments where changes to source files are
tracked, deltas are permitted only when initiated by a trouble report, change request, trouble ticket, and so on,
collectively called MRs. Recording MR numbers within deltas is a way of enforcing the rules of the change
management process.
The delta -y keyletter can be used to supply the commentary (and -m the MR numbers) on the command line.
In this case, the prompts for comments and MRs are not printed, and the standard input is not read. These two
keyletters are useful when delta is executed from within a shell procedure. Note that delta -m is allowed only
if the SCCS file has a v flag.
No matter how comments and MR numbers are entered with delta, they are recorded as part of the entry for
the delta being created. Also, they apply to all SCCS files specified with the delta.
If delta is used with more than one file argument and the first file named has a v flag, all files named must
have this flag. Similarly, if the first file named does not have the flag, none of the files named may have it.
When delta processing is complete, the standard output displays the SID of the new delta (from p.file) and the
number of lines inserted, deleted, and left unchanged. For example:
1.4
14 inserted
7 deleted
345 unchanged
If line counts do not agree with the user's perception of the changes made to a gfile, it may be because there
are various ways to describe a set of changes, especially if lines are moved around in the gfile. However, the
total number of lines of the new delta (the number inserted plus the number left unchanged) should always
agree with the number of lines in the edited gfile.
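The same bookkeeping can be illustrated with the standard diff command; this is a sketch of the principle, not SCCS's own comparison, and the file names are invented:

```shell
# An "old" version and an "edited" version of a file.
printf 'one\ntwo\nthree\nfour\n' > old.txt
printf 'one\ntwo-changed\nthree\nfive\nsix\n' > edited.txt

# Count lines diff reports as added (>) and removed (<).
inserted=$(diff old.txt edited.txt | grep -c '^>')
deleted=$(diff old.txt edited.txt | grep -c '^<')
total=$(wc -l < edited.txt)
unchanged=$((total - inserted))

# inserted + unchanged always equals the line count of the edited file.
echo "$inserted inserted"
echo "$deleted deleted"
echo "$unchanged unchanged"
```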
If you are in the process of making a delta and the delta command finds no ID keywords in the edited gfile,
the message
No id keywords (cm7)
is issued after the prompts for commentary but before any other output. This means that any ID keywords that
may have existed in the SCCS file have been replaced by their values or deleted during the editing process.
This could be caused by making a delta from a gfile that was created by a get without e (ID keywords are
replaced by get in such a case). It could also be caused by accidentally deleting or changing ID keywords
while editing the gfile. Or, it is possible that the file had no ID keywords. In any case, the delta will be
created unless there is an i flag in the SCCS file (meaning the error should be treated as fatal), in which case
the delta will not be created.
After the processing of an SCCS file is complete, the corresponding p.file entry is removed from p.file. All
updates to p.file are made to a temporary copy, q.file, whose use is similar to that of x.file described under
``SCCS command conventions''. If there is only one entry in p.file, then p.file itself is removed.
In addition, delta removes the edited gfile unless -n is specified. For example
$ delta -n s.abc
keeps the gfile after processing.
admin
The admin command is used to administer SCCS files; that is, to create new SCCS files and change the
parameters of existing ones. When an SCCS file is created, its parameters are initialized by use of keyletters
with admin or are assigned default values if no keyletters are supplied. The same keyletters are used to
change the parameters of existing SCCS files.
Two keyletters are used in detecting and correcting corrupted SCCS files (see ``Auditing'').
Newly created SCCS files are given access permission mode 444 (read-only for owner, group and other) and
are owned by the effective user. Only a user with write permission in the directory containing the SCCS file
may use the admin(CP) command on that file.
Creation of SCCS files
An SCCS file can be created by executing the command
$ admin -ifirst s.abc
in which the value first with -i is the name of a file from which the text of the initial delta of the SCCS file
s.abc is to be taken. Omission of a value with -i means admin is to read the standard input for the text of the
initial delta.
The command
$ admin -i s.abc < first
is equivalent. If the text of the initial delta contains no ID keywords, the message
No id keywords (cm7)
is issued by admin as a warning. However, if the command also sets the i flag (not to be confused with the -i
keyletter), the message is treated as an error and the SCCS file is not created. Only one SCCS file may be
created at a time using admin -i.
admin -r is used to specify a release number for the first delta. Thus:
$ admin -ifirst -r3 s.abc
means the first delta should be named 3.1 rather than the normal 1.1. Because -r has meaning only when
creating the first delta, its use is permitted only with -i.
The admin -y keyletter is used to supply commentary for the first delta; if -y is omitted, a comment line of
the form
date and time created YY/MM/DD hh:mm:ss by logname
is automatically generated.
If it is desired to supply MR numbers (admin -m), the v flag must be set with -f. The v flag simply
determines whether MR numbers must be supplied when using any SCCS command that modifies a delta
commentary in the SCCS file (see sccsfile(4)). An example would be
$ admin -ifirst -mmrnum1 -fv s.abc
Note that -y and -m are effective only if a new SCCS file is being created.
Initialization and modification of SCCS file parameters
Part of an SCCS file is reserved for descriptive text, usually a summary of the file's contents and purpose. It
can be initialized or changed by using admin -t.
When an SCCS file is first being created and -t is used, it must be followed by the name of a file from which
the descriptive text is to be taken. For example, the command
$ admin -ifirst -tdesc s.abc
specifies that the descriptive text of the SCCS file is to be taken from the file desc. Omission of the
filename after the -t keyletter, as in
$ admin -t s.abc
causes the removal of the descriptive text from the SCCS file.
The flags of an SCCS file may be initialized or changed by admin -f, or deleted by admin -d.
SCCS file flags are used to direct certain actions of the various commands. (See the admin(CP) page for a
description of all the flags.) For example, the i flag specifies that a warning message (stating that there are no
ID keywords contained in the SCCS file) should be treated as an error. The d (default SID) flag specifies the
default version of the SCCS file to be retrieved by the get command.
admin -f is used to set flags and, if desired, their values. For example
$ admin -fi -fmmodname s.abc
sets the i and m (module name) flags. The value modname specified for the m flag is the value that the get
command will use to replace the %M% ID keyword. (In the absence of the m flag, the name of the gfile is
used as the replacement for the %M% ID keyword.) Several -f keyletters may be supplied on a single
admin, and they may be used whether the command is creating a new SCCS file or processing an existing
one.
admin -d is used to delete a flag from an existing SCCS file. As an example, the command
$ admin -dm s.abc
removes the m flag from the SCCS file. Several -d keyletters may be used with one admin and may be
intermixed with -f.
SCCS files contain a list of login names and/or group IDs of users who are allowed to create deltas. This list is
empty by default, allowing anyone to create deltas. To create a user list (or add to an existing one), admin -a
is used. For example,
$ admin -axyz -awql -a1234 s.abc
adds the login names xyz and wql and the group ID 1234 to the list. admin -a may be used whether creating a
new SCCS file or processing an existing one.
admin -e (erase) is used to remove login names or group IDs from the list.
prs
The prs command is used to print all or part of an SCCS file on the standard output. If prs -d is used, the
output will be in a format called data specification. Data specification is a string of SCCS file data keywords
(not to be confused with get ID keywords) interspersed with optional user text.
Data keywords are replaced by appropriate values according to their definitions. For example,
:I:
is defined as the data keyword replaced by the SID of a specified delta. Similarly, :F: is the data keyword for
the SCCS filename currently being processed, and :C: is the comment line associated with a specified delta.
All parts of an SCCS file have an associated data keyword. For a complete list, see the prs(CP) page.
There is no limit to the number of times a data keyword may appear in a data specification. Thus, for example,
$ prs -d":I: this is the top delta for :F: :I:" s.abc
Information may be obtained from a single delta by specifying its SID using prs -r. For example,
$ prs -d":F:: :I: comment line is: :C:" -r1.4 s.abc
If -r is not specified, the value of the SID defaults to the most recently created delta.
In addition, information from a range of deltas may be obtained with -l or -e. The use of prs -e substitutes
data keywords for the SID designated with -r and all deltas created earlier, while prs -l substitutes data
keywords for the SID designated with -r and all deltas created later. Thus, the command
$ prs -d:I: -r1.4 -e s.abc
may output
1.4
1.3
1.2.1.1
1.2
1.1
while the command
$ prs -d:I: -r1.4 -l s.abc
may produce
3.3
3.2
3.1
2.2.1.1
2.2
2.1
1.4
Substitution of data keywords for all deltas of the SCCS file may be obtained by specifying both -e and -l.
sact
sact is a special form of the prs command that produces a report about files that are out for edit. The
command takes only one type of argument: a list of file or directory names. The report shows the SID of any
file in the list that is out for edit, the SID of the impending delta, the login of the user who executed the get -e
command, and the date and time the get -e was executed. It is a useful command for an administrator.
help
The help command prints information about messages that may appear on the user's terminal. Arguments to
help are the code numbers that appear in parentheses at the end of SCCS messages. (If no argument is given,
help prompts for one.) Explanatory information is printed on the standard output. If no information is found,
an error message is printed. When more than one argument is used, each is processed independently, and an
error resulting from one will not stop the processing of the others. For more information, see the help(CP)
page.
rmdel
The rmdel command allows removal of a delta from an SCCS file. Its use should be reserved for deltas in
which incorrect global changes were made. The delta to be removed must be a leaf delta. That is, it must be
the most recently created delta on its branch or on the trunk of the SCCS file tree. In ``Extended branching
concept'', only deltas 1.3.1.2, 1.3.2.2, and 2.2 can be removed. Only after they are removed can deltas 1.3.2.1
and 2.1 be removed.
To be allowed to remove a delta, the effective user must have write permission in the directory containing the
SCCS file. In addition, the real user must be either the one who created the delta being removed or the owner
of the SCCS file and its directory.
The -r keyletter is mandatory with rmdel. It is used to specify the complete SID of the delta to be removed.
Thus
$ rmdel -r2.3 s.abc
removes delta 2.3 from s.abc.
cdc
The cdc command is used to change the commentary made when the delta was created. It is similar to the
rmdel command (for example, -r and the full SID are necessary), although the delta need not be a leaf delta.
For example,
$ cdc -r3.4 s.abc
specifies that the commentary of delta 3.4 is to be changed. New commentary is then prompted for as with
delta.
The old commentary is kept, but it is preceded by a comment line indicating that it has been superseded, and
the new commentary is entered ahead of the comment line. The inserted comment line records the login name
of the user executing cdc and the time of its execution.
what
The what command is used to find identifying information in any UNIX system file whose name is given as
an argument. No keyletters are accepted. The what command searches the given file(s) for all occurrences of
the string @(#), which is the replacement for the %Z% ID keyword (see the get(CP) page). It prints on the
standard output whatever follows the string until the first double quote (``"''), greater than symbol (>),
backslash (\), newline, null, or nonprinting character.
For example, suppose an SCCS file called s.prog.c (a C language source file) contains the following line:
char id[] = "%W%";
When the command
$ get -r3.4 s.prog.c
is used, the resulting gfile is compiled to produce prog.o and a.out. Then, the command
$ what prog.c prog.o a.out
produces
prog.c:
        prog.c: 3.4
prog.o:
        prog.c: 3.4
a.out:
        prog.c: 3.4
The string searched for by what need not be inserted with an ID keyword of get; it may be inserted in any
convenient manner.
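Where the what command itself is unavailable, a rough approximation can be built with sed; this sketch handles one identification string per line and only the ", >, and backslash terminators, and the sample file contents are invented:

```shell
# A source file containing an expanded identification string
# (the kind of text %W% turns into after a get).
printf 'junk\nchar id[] = "@(#)prog.c 3.4";\nmore junk\n' > sample.c

# Print what follows each @(#) up to the first ", >, or backslash.
sed -n 's/.*@(#)\([^">\\]*\).*/\1/p' sample.c
```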
sccsdiff
The sccsdiff command determines (and prints on the standard output) the differences between any two
versions of an SCCS file. The versions to be compared are specified with sccsdiff -r in the same way as with
get -r. SID numbers must be specified as the first two arguments. The SCCS file or files to be processed are
named last. Directory names and a lone hyphen are not acceptable to sccsdiff.
The following is an example of the format of sccsdiff:
$ sccsdiff -r3.4 -r5.6 s.abc
comb
The comb command lets the user reduce the size of an SCCS file. It generates a shell procedure on the
standard output, which reconstructs the file by discarding unwanted deltas and combining other specified
deltas. (It is not recommended that comb be used as a matter of routine.)
In the absence of any keyletters, comb preserves only leaf deltas and the minimum number of ancestor deltas
necessary to preserve the shape of an SCCS tree. The effect of this is to eliminate middle deltas on the trunk
and on all branches of the tree. Thus, in ``Extended branching concept'', deltas 1.2, 1.3.2.1, 1.4, and 2.1 would
be eliminated.
Some of the keyletters used with this command are:
comb -s
This option generates a shell procedure that produces a report of the percentage space (if any) the user
will save. This is often useful as a preliminary check.
comb -p
This option is used to specify the oldest delta the user wants preserved.
comb -c
This option is used to specify a list (see the get(CP) page for its syntax) of deltas the user wants
preserved. All other deltas will be discarded.
The shell procedure generated by comb is not guaranteed to save space. A reconstructed file may even be
larger than the original. Note, too, that the shape of an SCCS file tree may be altered by the reconstruction
process.
val
The val command is used to determine whether a file is an SCCS file meeting the characteristics specified by
certain keyletters. It checks for the existence of a particular delta when the SID for that delta is specified with
-r.
The string following -y or -m is used to check the value set by the t or m flag, respectively. See admin(CP)
for descriptions of these flags.
The val command treats the special argument - (hyphen) differently from other SCCS commands. It allows val
to read the argument list from the standard input instead of from the command line, and the standard input is
read until an end-of-file (<Ctrl>d) is entered. This permits one val command with different values for
keyletters and file arguments. For example,
$ val -
-yc -mabc s.abc
-mxyz -ypl1 s.xyz
<Ctrl>d
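The read-arguments-from-standard-input pattern can be mimicked in plain shell; val is replaced here by an echo so the sketch runs anywhere, and the argument sets are the illustrative ones from the example above:

```shell
# Feed one set of keyletters and file arguments per line, as val does
# when given the hyphen argument; a stand-in command just echoes them.
printf -- '-yc -mabc s.abc\n-mxyz -ypl1 s.xyz\n' |
while read -r line; do
    echo "would run: val $line"
done
```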
SCCS files
This section covers protection mechanisms used by SCCS, the format of SCCS files, and the recommended
procedures for auditing SCCS files.
Protection
SCCS relies on the capabilities of the UNIX system for most of the protection mechanisms required to
prevent unauthorized changes to SCCS files; that is, changes by non-SCCS commands. Protection features
provided directly by SCCS are the release lock flag, the release floor and ceiling flags, and the user list.
Files created by the admin command are given access permission mode 444 (read-only for owner, group, and
other). This mode should remain unchanged because it (generally) prevents modification of SCCS files by
non-SCCS commands. Directories containing SCCS files should be given mode 755, which allows only the
owner of the directory to modify it.
SCCS files should be kept in directories that contain only SCCS files and any temporary files created by
SCCS commands. This simplifies their protection and auditing. The contents of directories should be logical
groupings (subsystems of the same large project, for example).
SCCS files should have only one link (name) because commands that modify them do so by creating and
modifying a copy of the file. When processing is done, the contents of the old file are automatically replaced
by the contents of the copy, whereupon the copy is destroyed. If the old file had additional links, this would
break them. Then, rather than process such files, SCCS commands would produce an error message.
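Why an extra link breaks can be demonstrated with ordinary shell commands; the file names are invented, and mv stands in for the copy-and-replace that SCCS commands perform:

```shell
# A file and a second hard link to it.
echo "version 1" > s.demo
ln s.demo other.name

# Update the file the way SCCS commands do: build a new copy,
# then rename it over the original.
echo "version 2" > x.demo
mv x.demo s.demo

cat s.demo        # the updated contents: version 2
cat other.name    # the old link still sees: version 1
```

The rename replaces the directory entry for s.demo, so other.name is left pointing at the old, now-orphaned data.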
When only one person uses SCCS, the real and effective user IDs are the same; and the user ID owns the
directories containing SCCS files. Therefore, SCCS may be used directly without any preliminary
preparation.
When several users with unique user IDs are assigned SCCS responsibilities (on large development projects,
for example), one user (that is, one user ID) must be chosen as the owner of the SCCS files. This person
will administer the files (use the admin command) and will be SCCS administrator for the project. Because
other users do not have the same privileges and permissions as the SCCS administrator, they are not able to
execute directly those commands that require write permission in the directory containing the SCCS files.
Therefore, a projectdependent program is required to provide an interface to the get, delta, and, if desired,
rmdel and cdc commands.
The interface program must be owned by the SCCS administrator and must have the
set-user-ID-on-execution bit on (see chmod(C)). This assures that the effective user ID is the user ID of the
SCCS administrator. With the privileges of the interface program during command execution, the owner of an
SCCS file can modify it at will. Other users whose login names or group IDs are in the user list for that file
(but are not the owner) are given the necessary permissions only for the duration of the execution of the
interface program. Thus, they may modify SCCS files only with delta and, possibly, rmdel and cdc.
Formatting
SCCS files are composed of lines of ASCII text arranged in six parts as follows:
Checksum
a line containing the logical sum of all the characters of the file (not including the checksum line
itself)
DeltaTable
information about each delta, such as type, SID, date and time of creation, and commentary
UserNames
list of login names and/or group IDs of users who are allowed to modify the file by adding or
removing deltas
Flags
indicators that control certain actions of SCCS commands
Descriptive Text
usually a summary of the contents and purpose of the file
Body
the text administered by SCCS, intermixed with internal SCCS control lines
Details on these file sections may be found in sccsfile(4). The checksum line is discussed in ``Auditing''.
Because SCCS files are ASCII files, they can be processed by non-SCCS commands like ed, grep, and cat.
This is convenient when an SCCS file must be modified manually (a delta's time and date were recorded
incorrectly, for example, because the system clock was set incorrectly), or when a user wants simply to look at
the file.
CAUTION:
Extreme care should be exercised when modifying SCCS files with non-SCCS commands.
Auditing
When a system or hardware malfunction destroys an SCCS file, any command that attempts to process the file
will issue an error message.
Commands also use the checksum stored in an SCCS file to determine whether the file has been corrupted
since it was last accessed (possibly by having lost one or more blocks or by having been modified with ed).
No SCCS command will process a corrupted SCCS file except admin with the -h or -z keyletter, as described below.
SCCS files should be audited for possible corruptions on a regular basis. The simplest and fastest way to do
an audit is to use admin -h and specify all SCCS files:
$ admin -h s.file1 s.file2 ...
or
$ admin -h directory1 directory2 ...
If the new checksum of any file is not equal to the checksum in the first line of that file, the message
corrupted file (co6)
is produced for that file. The process continues until all specified files have been examined. When examining
directories (as in the second example above), the checksum process will not detect missing files. A simple
way to learn whether files are missing from a directory is to execute the ls command periodically, and
compare the outputs. Any file whose name appeared in a previous output but not in the current one no longer
exists.
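The periodic-ls comparison can be automated with comm, which reports names present in one sorted list but not the other; the directory and file names here are made up:

```shell
mkdir -p sccsdir
touch sccsdir/s.one sccsdir/s.two sccsdir/s.three

# Snapshot the directory listing (ls output is already sorted).
ls sccsdir > before.list

# Simulate a lost file, then take a second snapshot.
rm sccsdir/s.two
ls sccsdir > after.list

# Names in the first list but not the second are the missing files.
comm -23 before.list after.list
```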
When a file has been corrupted, the way to restore it depends on the extent of the corruption. If damage is
extensive, the best solution is to contact the local UNIX system operations group and request that the file be
restored from a backup copy. If the damage is minor, repair through editing may be possible. After such a
repair, the admin command must be executed:
$ admin -z s.file
The purpose of this is to recompute the checksum and bring it into agreement with the contents of the file.
After this command is executed, any corruption that existed in the file will no longer be detectable.
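The idea behind the checksum, a sum of the character values of the file, can be sketched with od and awk; this illustrates the principle only, and is not the exact quantity that admin -z stores:

```shell
printf 'some file contents\n' > cksum.demo

# Sum every byte value in the file, modulo 65536.
sum=$(od -An -v -tu1 cksum.demo |
    awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s % 65536 }')
echo "$sum"
```

Recomputing such a sum and storing it back in the first line is what brings the checksum into agreement with edited contents.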
This topic describes how to package software that will be installed on computers running SCO OpenServer or
UnixWare. A packaging tool, the pkgmk(C) command, is provided to help automate package creation. It
gathers the components of a package on the development machine, copies them onto the installation medium,
and places them into a structure that the installation tool, pkgadd(ADM), recognizes.
This topic also describes the pkgadd(ADM) command, which copies the package from the installation
medium onto a system and performs system housekeeping routines that concern the package. This tool is
primarily for the installer but is described here to provide you with a background on the environment into
which your packages will be placed and to help you test-install packages.
The first two sections describe what a package consists of and give an overview of the structural life cycle of
a package (how its structure on your development machine relates to its structure on the installation medium
and on the installation machine).
The remaining sections familiarize you with the tools, files, and scripts involved in creating a package,
provide suggestions for how to approach software packaging, and describe some specific procedures.
The section on set packaging describes how you can collect an arbitrary number of packages into a single
installable image (the ``set'') for installation on the target machine.
After reading this topic, you should study ``Case studies of package installation'', which provides case studies
using the tools and techniques described in this topic.
Contents of a package
A software package is a group of components that together create the software. These components naturally
include the executables that comprise the software, but they also include at least two information files and can
optionally include other information files and scripts.
As shown in Figure 8-1, a package's contents fall into three categories:
required components (the pkginfo(F) file, the prototype(F) file, package objects)
optional package information files
optional packaging scripts
Required components
A package must contain at least the following components:
Package objects
These are the objects that make up the software. They can be files (executable or data), directories, or
named pipes. Objects can be manipulated in groups during installation by placing them into classes.
You will learn more about classes in ``3. Placing objects into classes''.
The pkginfo(F) file
This required package information file defines parameter values that describe a package. For example,
this file defines values for the package abbreviation, the full package name, and the package
architecture.
The prototype(F) file
This required package information file lists the contents of the package. There is one entry for each
deliverable object consisting of several fields of information that describes the object. All package
components, including the pkginfo(F) file, must be listed in the prototype(F) file.
Both required package information files are described further in ``The package information files'' and on their
respective manual pages.
compver(F)
This is only one possible, and very simple, arrangement. Files can, in fact, be gathered from anywhere
on your development system, but in many cases using a packaging area such as the one suggested
above simplifies the process of creating a package image and maintaining a backup of what was
packaged for future reference.
2. Copy your source files to /home/user1/pkgarea/src, recreating the target directory structure;
make sure the permissions, owners, and groups are as you want them to be on the target system.
For the purpose of this procedure, we'll assume this structure under /home/user1/pkgarea/src:
/home/user1/pkgarea/src
/bin
cmd1
cmd2
/man
/man1
cmd1.1
cmd2.1
This directory structure mimics the directory structure that we want under the target install directory
(/usr/local).
This step is not required, and might be impractical for a large number of files or even a small set of
very large files. However, if all your files will be installed under a common target directory, copying
them to the staging area can simplify the creation of the prototype file later in the procedure.
It is possible to gather source files during the build process from multiple locations on the system. To
do this, you need to provide a custom prototype file and use appropriate arguments to the pkgproto
and pkgmk commands, as we will discuss later in this procedure.
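One common way to populate the staging area while keeping permissions intact is a tar pipeline; the paths below are invented stand-ins for the /home/user1/pkgarea layout described above:

```shell
# A miniature source tree with a known mode.
mkdir -p srctree/bin stage/src
printf '#!/bin/sh\necho hello\n' > srctree/bin/cmd1
chmod 755 srctree/bin/cmd1

# Copy the tree into the staging area, preserving permissions.
(cd srctree && tar cf - .) | (cd stage/src && tar xpf -)

ls -l stage/src/bin/cmd1
```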
3. Create package control scripts and information files in (or copy them to) the packaging area.
The files required to create a package are the prototype(F) and pkginfo(F) information files.
We'll create the prototype file in the next step.
The pkginfo file tells the system about your package, and must contain values for the five required
installation parameters, as in this example:
PKG=testpkg
NAME='Test Package'
ARCH='i386'
VERSION='1.0'
CATEGORY='application'
The most important of these is PKG, which must be a unique identifier for your package (the PKG
variable's value is displayed by the pkginfo(C) command as the package name). It is recommended
that you define a PKG value between 3 and 9 characters, using alphanumeric characters (a-z, A-Z,
0-9) and the special characters dot (.), hyphen (-), and underscore (_) only. See pkginfo(F) for a
description of these and other parameters that you can define in the pkginfo file; application-specific
parameters may also be included.
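The naming recommendation can be captured in a small shell check; this is not an official validator, just a grep expression for the characters and length suggested above:

```shell
# Succeed only for a 3- to 9-character name drawn from the
# recommended character set.
valid_pkg() {
    echo "$1" | grep -Eq '^[A-Za-z0-9._-]{3,9}$'
}

valid_pkg testpkg    && echo "testpkg: ok"
valid_pkg 'bad pkg!' || echo "bad pkg!: rejected"
```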
No other files are required by pkgmk(C), but the particular needs of your installation may require
them. [See ``Script processing'' for the sequence of script processing during software installation and
removal.]
For example, if your application requires an interactive install, you'll probably need a request script to
interact with the user, and a preinstall script to do any work necessary before the installation of the
application's files. If any configuration needs to be done once your source files have been copied to
the target system, then you'll need to provide a postinstall script that performs the configuration.
The best way to determine which files you need is to:
Read the descriptions of the various information files later in this chapter under ``The
installation scripts''.
Look in the /var/sadm/pkg directory, which contains one directory for each installed package
on your system. The directory names correspond to the PKG variable of the package, as
reported by pkginfo(C). Each of these directories contains copies of the package information
files and the control scripts used during package installation (the control scripts are located in
a subdirectory named install). Use the pkginfo command to locate a similar application on
your system, and use that package's files as examples.
4. In the packaging area, execute the pkgproto(C) command to create an initial prototype file.
If we assume:
we've populated the src directory in our packaging area with all the directories and files we
need to install on the target
all the files will be installed under /usr/local
We can use a rather simple pkgproto command:
pkgproto src=/usr/local > prototype
If you need to package files from outside the packaging area (rather than having copied them into the
src directory), or if you want to install files to various places on the target machine, you'll need a more
complex pkgproto command.
For example, let's assume there's a very large binary that we want to put in the package and it's
located on the development system at /build/build39/cmd2bld/cmd2. To include this file in the
prototype file without copying it to the packaging area, we could use a pkgproto command like this:
pkgproto /build/build39/cmd2bld=/usr/local/bin src=/var/opt > prototype
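When pkgproto is not at hand, the shape of its output can be approximated with find; this sketch hardcodes the mode, owner, and group, whereas the real command reads them from the files themselves, and the directory names are invented:

```shell
# A miniature packaging area.
mkdir -p protosrc/bin
printf 'binary one\n' > protosrc/bin/cmd1

# Emit prototype-style entries (ftype class path=source mode owner group),
# mapping protosrc to the target directory /usr/local.
find protosrc -type f | while read -r f; do
    printf 'f none /usr/local%s=%s 0755 root sys\n' "${f#protosrc}" "$f"
done
```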
This places a file system format package image in the directory named by path. Use the -o option to
overwrite a previously created package image, if necessary.
8. Convert the package to datastream format, if desired.
If you plan to make the package image available for download or transfer to another system, you may
want to use the pkgtrans command to convert the file system image you just created to a datastream
image. A datastream image is an identical, single-file, ASCII-format version of the file system
image, and so is typically easier to transfer than a file system image (which might contain many files
and directories).
The example command below places a datastream image of the package in /var/spool/pkg/testpkg.ds:
pkgtrans -s path /var/spool/pkg/testpkg.ds
To perform the reverse operation, creating a file system image from a datastream image, use a
command like this example:
pkgtrans /var/spool/pkg/testpkg.ds existing_dir testpkg
The above command places a file system format image of the package under existing_dir/testpkg.
9. Install the package on a target system for installation testing.
To install the ``file system image'' we created above, enter:
pkgadd -d path testpkg
[Note that when using datastreams, the name of the file containing the datastream image (testpkg.ds)
is considered part of the device name. This is necessary because a datastream image can contain any
number of packages or sets. A file system format image can contain only one package or set.]
10. Verify that the package was successfully installed. Use the pkginfo command on the target system:
pkginfo -l testpkg
This should return the contents of the package's pkginfo file, as well as the date and time the
package was installed on the target, and other information.
11. Check the list of files and their attributes as actually installed on the target system.
To list the pathnames of all files installed by a package, one per line, use the pkgchk(ADM)
command:
pkgchk -v testpkg
To get a detailed listing of installed files and directories, including their contents and attributes, use:
pkgchk -l testpkg | pg
The output can reveal problems with the package information files and control scripts, such as
incorrectly specified entries in the prototype file.
12. Remove the package to test package removal. Use the pkgrm(ADM) command:
pkgrm testpkg
The pkginfo command should now return no information for the package:
# pkginfo testpkg
UX:pkginfo: ERROR: information for "testpkg" was not found
See ``Quick steps to network installation'' for how to set up a network install server to offer your package for
installation on remote SCO OpenServer systems.
If the image on CD is a datastream image, you can omit the -r option to the cp command, and specify
the name of the file containing the datastream as pkgname.
If you need to change the format of the installable image, you can copy it and change its format in one
command using the pkgtrans(C) command. To go from datastream to file system format:
pkgtrans /mnt/datastream /var/spool/dist pkgname
NOTE: You will most often want to use file system format for the packages you place under
/var/spool/dist, so that remote users can find out what packages are available for installation from the
server without having to know any file names in advance. Users need to know the name of a
datastream file in order to install from it. If you place a datastream under /var/spool/dist, its name will
not be listed automatically by remote installers using either the pkglist or pkginstall commands, or by
the Application Installer interface. See ``Network installation from the command line'' and
``Network installation from the graphical interface'' for more explanation.
2. Enable the installation server, specifying the appropriate network protocol (tcp or spx).
For example, the following command enables network installation of all the packages under
/var/spool/dist using the tcp protocol:
installsrv -e -n tcp
Note that the appropriate protocol must be enabled on your system, which must also be connected to a
network of the appropriate type in order for it to be able to process non-local installation requests.
3. To ensure your server is configured properly, use the pkglist(ADM) command to list the file system
format packages it is offering for network install from /var/spool/dist:
pkglist -s 0.0.0.0: all
For datastream format images, use a command like the following, substituting the name of a
datastream format file under /var/spool/dist for datastream:
pkglist -s 0.0.0.0:datastream all
If the package is in a datastream format file on the server, you'll need to know the name of the file to
list the packages contained in it (more than one package can be in a single datastream file). For
example, to list all the packages on a server in a datastream format file named package.ds, enter:
pkglist -s server:/var/spool/dist/package.ds all
2. Enter the appropriate pkginstall command. For example, to install all the packages from the server
that are in file system format under /var/spool/dist, enter:
pkginstall -s server all
On a development machine
Packages originate on a development machine. They can be in the same directory structure on your
machine as they will be placed on the installation machine. pkgmk(C) can also locate components on
the development machine and give them different pathnames on the installation machine.
On the installation media
When pkgmk copies the package components from the development machine to the installation
medium, it places them into the structure you defined in your prototype(F) file and into a format that
pkgadd(ADM) recognizes.
On the installation machine
pkgadd copies a package from the installation medium and places it in the structure defined in your
prototype file. Package objects can be defined as relocatable, meaning the installer can define the
actual location of these package objects on the installation machine during installation. Objects with
fixed locations are copied to their predefined path.
pkgmk
The pkgmk(C) command takes all of the package objects residing on the development machine, optionally
compresses them, copies them onto the installation medium, and places them into a fixed directory structure.
You are not required to know the details of the fixed directory structure because pkgmk takes care of the
formatting.
Files can be unstructured on the development machine and pkgmk will structure them correctly on the
medium based on information supplied in the prototype(F) file. The installation medium onto which a
package is formatted can be removable (a disk, for example) or it can be a directory on a machine.
The structural life cycle of a package
pkgtrans
The pkgtrans(C) command translates a package already created with pkgmk(C) from one package format to
another. It can make the following translations:
a fixed directory structure to a datastream
a datastream to a fixed directory structure
a fixed directory structure to a fixed directory structure
A package in a fixed directory structure can be in a directory on disk (for example, in a spooling directory) or
on a removable device such as a diskette. A datastream can be on any device; for example, on a disk or a tape.
pkgproto
The pkgproto(C) command generates a prototype(F) file. It scans the paths specified on the command line
and creates description line entries for these paths. If the pathname is a directory, an entry for each object in
the directory is generated. You can use the -c option of the pkgproto command to place objects into a
particular class.
When you create a prototype(F) file with an editor, it does not matter how package components are organized
on your development machine. You use the path1=path2 pathname format to define where the files reside on
your development machine and where they should be placed on the installation machine. However, when you
use pkgproto to create your file, your development area must be structured exactly as you want your package
to be structured.
pkginfo
This required package information file defines parameter values that describe characteristics of the package,
such as the package abbreviation, full package name, package version, and package architecture. The
definitions in this file can set values for all of the installation parameters defined in the pkginfo(F) manual
page.
Each entry in the file uses the following format to establish the value of a parameter:
PARAM="value"
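For example, a minimal pkginfo file for the hypothetical testpkg package used earlier in this chapter might read:

```text
PKG="testpkg"
NAME="Test Package"
ARCH="i386"
VERSION="1.0"
CATEGORY="application"
CLASSES="none"
```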
The pkginfo(C) and pkgparam(C) commands can be used to access information in a pkginfo file.
NOTE: Before defining the PKG, ARCH, and VERSION parameters, you need to know how
pkgadd(ADM) defines a package instance and the rules associated with naming a package. Refer to ``2.
Defining a package instance'' before assigning values to these parameters.
prototype
This required package information file, prototype(F), contains a list of the package contents. The pkgmk(C)
command uses this file to identify the contents of a package and its location on the development machine
when building the package.
You can create this file in two ways. As with all the package information files, you can use an editor to create
a file named prototype. It should contain entries following the description given below. You can also use the
pkgproto(C) command to generate the file automatically. To use the second method, you must have a copy of
your package on your development machine that is structured exactly as you want it structured on the
installation machine and all modes and permissions must be correct. If you are not going to use pkgproto,
you do not need a structured copy of your package.
The two types of entries in the prototype file are description lines and command lines.
The description lines
For each deliverable object, you must create one description line that consists of several fields describing the
object. This entry describes such information as mode, owner, and group for the object. You can also use this
entry to accomplish the following tasks:
You can override the pkgmk(C) command's placement of an object on a multiple-part package. See
``10. Distributing packages over multiple volumes'' for more details.
You can place objects into classes. See ``3. Placing objects into classes'' for details.
You can tell pkgmk(C) where to find an object in your development directory structure and map that
name to the correct placement on the installation machine. See ``Mapping development pathnames to
installation pathnames'' for details.
You can define an object as relocatable. See ``Defining collectively relocatable objects'' and
``Defining individually relocatable objects'' for details.
You can define links. See ``9. Creating the prototype file'' for details.
The generic format of the descriptive line is:
[ part ] ftype class pathname [ major minor ] [ mode owner group ]
part
i pkginfo
i request
d bin /ncmpbin 0755 root other
f bin /ncmpbin/dired=/usr/ncmp/bin/dired 0755 root other
f bin /ncmpbin/less=/usr/ncmp/bin/less 0755 root other
f bin /ncmpbin/ttype=/usr/ncmp/bin/ttype 0755 root other
search pathnames
Specifies a list of directories (separated by white space) in which pkgmk(C) should search when
looking for package objects. pathnames is prepended to the basename of each object in the prototype
file until the object is located.
include filename
Specifies the pathname of another prototype file that should be merged into this one during
processing. (Note that search requests do not span include files. Each prototype file should have its
own search command defined, if one is needed.)
default mode owner group
Defines the default mode, owner, and group to be used when this information is not supplied in a
prototype entry that requires the information. (The defaults do not apply to entries in any include
files. Each prototype file should have its own default command defined, if one is needed.)
param=value
Places the indicated parameter in the packaging environment. This allows you to expand a variable
pathname so that pkgmk can locate the object without changing the actual object pathname. (This
assignment will not be available in the installation environment.)
A command line must always begin with an exclamation point (``!''). Commands can have variable
substitutions embedded within them.
Here is an example prototype file with both description and command lines:
!PROJDIR=/usr/myname
!search /usr/myname/bin /usr/myname/src /usr/myname/hdrs
!include $PROJDIR/src/prototype
i pkginfo
i request
d bin ncmpbin 0755 root other
f bin ncmpbin/dired=/usr/ncmp/bin/dired 0755 root other
f bin ncmpbin/less=/usr/ncmp/bin/less 0755 root other
f bin ncmpbin/ttype=/usr/ncmp/bin/ttype 0755 root other
!default 755 root bin
compver
The compver(F) package information file defines previous (or future) versions of the package that are
compatible with this version. Each line in the file consists of a string defining a version of the package with
which the current version is compatible. Because some packages might require installation of a particular
version of another software package, compatibility information is extremely crucial. If a package ``A''
requires version ``1.0'' of application ``B'' as a prerequisite, but the customer installing ``A'' has a new and
improved version of ``1.3'' of ``B'', the compver(F) file for ``B'' must indicate that the new version is
compatible with version ``1.0'' in order for the customer to install package ``A''. The string must match the
definition of the VERSION parameter in the pkginfo(F) file of the package considered to be compatible. Here
is an example of this file:
Version 1.3
Version 1.0
copyright
The copyright(F) package information file contains the text of a copyright message that will be printed on the
terminal at the time of package installation or removal. The display is exactly as shown in the file. Here is an
example of this file.
Copyright (c) 2004 The SCO Group, Inc.
All Rights Reserved.
depend
The depend(F) package information file defines software dependencies associated with the package. You can
define three types of package dependencies with this file:
a prerequisite package (this package depends on the existence of another package)
a reverse dependency (another package depends on the existence of this package)
an incompatible package (your package is incompatible with this one)
The generic format of a line in this file is:
type pkg name
type
Defines the dependency type.
P indicates the named package is a prerequisite for installation.
I indicates the named package is incompatible.
R indicates a reverse dependency (the named package requires that this package be on the system).
This last type should only be used when a pre-UNIX System V Release 4 package (that cannot
deliver a depend file) relies on the newer package.
pkg
Indicates the package abbreviation for the package.
name
Specifies the full package name (used for display purposes only).
Here is an example of this file:
P acu Advanced C Utilities
space
The space(F) package information file defines disk space requirements for the target environment beyond
that which is used by objects defined in the prototype(F) file, for example, files that will be dynamically
created at installation time. It should define the maximum amount of additional space that a package will
require.
The generic format of a line in this file is:
pathname blocks inodes
Definitions for each field are as follows:
pathname
Names a directory in which there are objects that will require additional space. The pathname can be
the mount point for a filesystem. Pathnames that do not begin with a slash (/) indicate relocatable
directories.
blocks
Defines the number of 512-byte disk blocks required for installation of the files and directory entries
contained in the pathname. (Do not include filesystem-dependent disk usage.)
inodes
Defines the number of inodes required for installation of the files and directory entries contained in
pathname.
Numbers of blocks or inodes can be negative to indicate that the package will ultimately (after processing by
scripts, and so on) take up less space than the installation tool would calculate.
Here is an example of this file:
# extra space required by config data which is
# dynamically loaded onto the system
data 500 1
pkgmap
The pkgmk(C) command creates the pkgmap(F) file when it processes the prototype file. This new file
contains all of the information in the prototype file plus three new fields for each entry. These fields are
``size'' (file size in bytes), ``cksum'' (checksum of file), and ``modtime'' (last time of modification). All
command lines defined in the prototype file are executed as pkgmk(C) creates the pkgmap(F) file. The
pkgmap file is placed on the installation medium. The prototype file is not. Refer to the pkgmap(F) manual
page for more details about this file.
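The three added fields can be computed by hand with standard tools. In the sketch below, the file name is hypothetical and cksum(C) stands in for whatever checksum algorithm your version of pkgmk actually uses:

```shell
# compute, by hand, the three fields pkgmk adds to each pkgmap entry
# (file name hypothetical; cksum stands in for pkgmk's checksum)
f=/tmp/pkgmap_demo
printf 'hello world\n' > "$f"

size=$(wc -c < "$f" | tr -d ' ')          # size: file size in bytes
cksum=$(cksum "$f" | awk '{print $1}')    # cksum: checksum of the file
# modtime would be the file's last-modification time, e.g. from ls -l

echo "size=$size cksum=$cksum"
```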
request script
Solicits administrator interaction during package installation for the purpose of assigning or
redefining environment parameter assignments.
class action scripts
Defines an action or set of actions that should be applied to a class of files during installation or
removal. You define your own classes or you can use one of three standard classes (sed, awk, and
build). See ``3. Placing objects into classes'' for details on how to define a class.
procedure scripts
Specifies a procedure to be invoked before or after the installation or removal of a package. The four
procedure scripts are preinstall, postinstall, preremove, and postremove.
You decide which type of script to use based on when you want the script to execute.
NOTE: All installation scripts must be executable by sh (for example, a shell script or an executable
program).
Script processing
You can customize the actions taken during installation by delivering installation scripts with your package.
The decision on which type of script to use depends upon when the action is needed during the installation
process. As a package is installed, pkgadd(ADM) performs the following steps:
1. Executes the request script.
This is the only point at which your package can solicit input from the installer.
2. Executes the preinstall script.
3. Installs the package objects.
Installation occurs class by class, and class action scripts are executed accordingly. The list of classes
operated upon and the order in which they should be installed is initially defined with the CLASSES
parameter in your pkginfo(F) file. However, your request script can change the value of CLASSES.
NOTE: Be absolutely sure that the CLASSES environment variable in the pkginfo file lists all the
package's class names from the prototype file, or that they are conditionally added to the CLASSES
variable (as appropriate) by the request script. Only those class names appearing in the CLASSES
environment variable at the time installation begins will be installed. Any files belonging to a class
name not found in CLASSES at the time the installation begins will not be installed.
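A quick build-time sanity check can catch such a mismatch before the package ships. The sketch below uses hypothetical file contents and assumes description lines with no leading part number; it extracts the class field from each description line and verifies that it appears in CLASSES:

```shell
# hypothetical prototype file and CLASSES value, checked for consistency
cat > /tmp/prototype <<'EOF'
i pkginfo
d bin /ncmpbin 0755 root other
f bin /ncmpbin/dired=/usr/ncmp/bin/dired 0755 root other
f none /ncmpbin/README 0644 root other
EOF
CLASSES="bin none"

# on description lines the class is field 2 (field 1 is the ftype);
# "i" entries carry no class and are skipped
missing=""
for class in $(awk '$1 != "i" {print $2}' /tmp/prototype | sort -u); do
    case " $CLASSES " in
        *" $class "*) ;;                 # listed: objects will be installed
        *) missing="$missing $class" ;;  # unlisted: objects would be skipped
    esac
done
echo "classes missing from CLASSES:${missing:- (none)}"
```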
Installation parameters
The following four groups of parameters are available to all installation scripts. Some of the parameters can
be modified by a request script, others cannot be modified at all.
The four system parameters that are generated by the installation software (see below for a description
of these). None of these parameters can be modified by a package.
The 21 standard installation parameters defined in the pkginfo(F) file. Of these, a package can only
modify the CLASSES parameter. The standard installation parameters are described in detail in the
pkginfo(F) manual page.
You can define your own installation parameters by assigning a value to them in the pkginfo(F) file.
Such a parameter must be alphanumeric with an initial capital letter. Any of these parameters can be
changed by a request script.
Your request script can define new parameters by assigning values to them and placing them into the
installation environment.
The four installation parameters that are generated by installation software are described below:
PATH
Specifies the search list used by sh to find commands; PATH is set to
/sbin:/usr/sbin:/usr/bin:/usr/sadm/install/bin upon script invocation.
UPDATE
Indicates that the current installation is intended to update the system. Automatically set to true if the
package being installed is overwriting a version of itself.
PKGINST
Specifies the instance identifier of the package being installed. The value is equal to the package
abbreviation (i.e., the same value as the PKG variable in the pkginfo file). See ``2. Defining a package
instance'' for more details.
PKGSAV
Specifies the directory where files can be saved for use by removal scripts or where previously saved
files may be found.
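A request script hands its results back by writing shell assignments to the file named in its first argument, which pkgadd reads into the installation environment. Below is a minimal non-interactive sketch; the class names are hypothetical, and a real script would prompt the installer rather than read an environment default:

```shell
#!/bin/sh
# sketch of a request script: decide which classes to install and hand
# the result back to pkgadd via the response file passed as $1
outfile="${1:-/tmp/request_out}"   # pkgadd supplies this argument

# a real script would ask the installer a y/n question here; this
# sketch honors an environment default so it runs unattended
install_optional="${INSTALL_OPTIONAL:-y}"

if [ "$install_optional" = y ]; then
    CLASSES="none optional"
else
    CLASSES="none"
fi

# assignments written to the response file enter the installation
# environment, overriding the pkginfo defaults
cat > "$outfile" <<EOF
CLASSES="$CLASSES"
EOF
```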
pkginfo(C)
This command returns information about software packages, such as the instance identifier and
package name.
pkgparam(C)
This command returns values for all parameters or only for the parameters specified.
The pkginfo(C) and pkgparam(C) manual pages give details for these tools.
0
Successful completion of script.
1
Fatal error. Installation process is terminated at this point.
2
Warning or possible error condition. Installation will continue. A warning message will be displayed
at the time of completion.
3
Script was interrupted and possibly left unfinished. Installation terminates at this point.
10
System should be rebooted when installation of all selected packages is completed. (This value should
be added to one of the single-digit exit codes described above.)
20
The system should be rebooted immediately upon completing installation of the current package.
(This value should be added to one of the single-digit exit codes described above.)
See the case studies for examples of exit codes in installation scripts.
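For instance, a postinstall script that determines a reboot is needed can add 10 to its success status. The sketch below (script contents hypothetical) writes such a script and runs it by hand to show the combined exit code:

```shell
# sketch: a hypothetical postinstall script that requests a deferred
# reboot by adding 10 to its success exit code
cat > /tmp/postinstall <<'EOF'
#!/bin/sh
# ...installation work would go here...
needs_reboot=yes

if [ "$needs_reboot" = yes ]; then
    exit 10    # 0 (success) + 10: reboot after all selected packages
fi
exit 0         # plain success
EOF

rc=0
sh /tmp/postinstall || rc=$?
echo "postinstall exit status: $rc"
```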
i.class
operates on pathnames in the indicated class during package installation
r.class
operates on pathnames in the indicated class during package removal
For example, the name of the installation script for a class named class1 would be i.class1 and the removal
script would be named r.class1.
Class action script usage rules
1. Class action scripts are executed as uid=root and gid=other.
2. If a package spans more than one volume, the class action script will be executed once for each
volume that contains at least one file belonging to the class. Consequently, each script must be
``multiply executable.'' This means that executing a script any number of times with the same input
must produce the same results as executing the script only once.
NOTE: The installation service relies upon this condition being met.
3. The script is executed only if there are files in the given class existing on the current volume.
4. pkgadd(ADM) (and pkgrm(ADM)) creates a list of all objects listed in the pkgmap(F) file that
belong to the class. As a result, a class action script can only act upon pathnames defined in the
pkgmap(F) and belonging to a particular class.
5. A class action script should never add, remove, or modify a pathname or system attribute that does
not appear in the list generated by pkgadd(ADM) unless by use of the installf(ADM) or
removef(ADM) commands.
Installation of classes
The following steps outline the system actions that occur when a class is installed. The actions are repeated
once for each volume of a package as that volume is being installed.
1. pkgadd(ADM) creates a list of pathnames upon which the action script will operate. Each line of this
list consists of source and destination pathnames, separated by white space. The source pathname
indicates where the object to be installed resides on the installation volume and the destination
pathname indicates the location on the installation machine where the object should be installed. The
contents of the list is restricted by the following criteria:
The list contains only pathnames belonging to the associated class.
Directories, named pipes, character/block devices, and symbolic links are included in the list
with the source pathname set to /dev/null. They are automatically created by pkgadd(ADM)
(if not already in existence) and given proper attributes (mode, owner, and group) as defined
in the pkgmap(F) file.
Linked files are not included in the list; that is, files whose ftype is l. (ftype is the file
type defined in the prototype file.) Links in the given class are created in Step 4.
If a pathname already exists on the target machine and its contents are no different from the
one being installed, the pathname will not be included in the list.
To determine this, pkgadd(ADM) compares the cksum, modtime, and size fields in the
installation software database with the values for those fields in your pkgmap(F) file. If they
are the same, it then checks the actual file on the installation machine to be certain it really
has those values. If the field values are the same and are correct, the pathname for this object
will not be included in the list.
2. If there is no class action script, the files associated with the pathnames are copied to the target
machine.
If no class action script is provided for installation of a particular class, the files in the generated
pathname list will simply be copied from the volume to the appropriate target location.
3. If there is a class action script, the script is executed.
The class action script is invoked with standard input containing the list generated in Step 1. If this is
the last volume of the package and there are no more objects in this class, the script is executed with
the single argument of ENDOFCLASS.
4. pkgadd(ADM) performs a content and attribute audit and creates links.
After successfully executing Step 2 or 3, an audit of both content and attribute information is
performed on the list of pathnames. pkgadd(ADM) creates the links associated with the class
automatically. Detected attribute inconsistencies are corrected for all pathnames in the generated list.
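A minimal installation class action script is therefore a loop over the Step 1 list on standard input. The sketch below (all paths hypothetical) copies each source to its destination and is safe to re-run, as the usage rules require; it also skips the /dev/null entries that pkgadd handles itself:

```shell
# sketch of an i.class script: copy each "src dst" pair read from stdin
cat > /tmp/i.demo <<'EOF'
#!/bin/sh
while read src dst; do
    # pkgadd itself creates directories, devices, and pipes (src is /dev/null)
    [ "$src" = /dev/null ] && continue
    cp "$src" "$dst" || exit 1    # exit 1: fatal error, stop the installation
done
exit 0
EOF

# simulate the list pkgadd would provide on standard input
printf 'sample contents\n' > /tmp/srcfile
echo "/tmp/srcfile /tmp/dstfile" | sh /tmp/i.demo
cat /tmp/dstfile
```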
1. pkgrm(ADM) creates a list of installed pathnames that belong to the indicated class. Pathnames
referenced by another package are excluded from the list unless their ftype is e (the file can be edited
upon installation or removal).
If a pathname is referenced by another package, it will not be removed from the system. However, if
it is of ftype e, it can be modified to remove information placed in it by the package being removed.
The modification should be performed by the removal class action script.
2. If there is no class action script, the pathnames are removed.
If your package has no removal class action script for the class, all of the pathnames in the list
generated by pkgrm(ADM) will be removed.
NOTE: Always assign a class for files with an ftype of e (editable) and have an associated class
action script for that class. Otherwise, they will be removed at this point, even if the pathname is
shared with other packages.
3. If there is a class action script, the script is executed.
pkgrm(ADM) invokes the class action script with standard input containing the list generated in Step
1.
4. pkgrm(ADM) performs an audit.
Upon successful execution of the class action script, knowledge of the pathnames is removed from the
system unless a pathname is referenced by another package.
!remove
# sed(C) instructions to be invoked during the removal process
[address [,address]] function [arguments]
. . .
address, function, and arguments are as defined in the sed(C) manual page. See case study 5a and case study
5b for examples of sed class action scripts.
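To see the mechanics, the sketch below extracts the install-time instructions from a sed class file and applies them by hand. The file names and the s/foo/bar/ edit are hypothetical; during a real installation pkgadd performs this extraction and application for you:

```shell
# a hypothetical sed class file with install- and remove-time sections
cat > /tmp/sedclass <<'EOF'
# hypothetical class file; its name matches the target file's name
!install
s/foo/bar/
!remove
s/bar/foo/
EOF

# pull out only the instructions following !install, stopping at the
# next !command and dropping comments
sed -n '/^!install/,/^!remove/{/^!/d;p;}' /tmp/sedclass > /tmp/install.sed

printf 'foo baz\n' > /tmp/target
sed -f /tmp/install.sed /tmp/target > /tmp/target.new
cat /tmp/target.new
```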
The awk class script
The awk installation class provides a method of installing and removing objects that require modification to
an existing object on the target machine (the object must have been previously installed from another package
installation). Modifications are delivered as awk instructions in an awk class action script.
The awk class action script executes automatically at the time of installation if a file belonging to class awk
exists. Such a file contains instructions for the awk class script in the format shown in the following example.
Two commands indicate when instructions should be executed. awk instructions that follow the !install
command are executed during package installation and those that follow the !remove command are executed
during package removal. It does not matter in which order the commands are used in the file.
The name of the awk class file should be the same as the name of the file upon which the instructions will be
executed.
# comment, which may appear on any line in the file
!install
# awk(C) program to install changes
. . . (awk program)
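As with the sed class, pkgadd feeds the target file through the delivered program at install time. The hand-run sketch below (all names and the awk program itself are hypothetical) extracts the install-time section and applies it to a target file:

```shell
# a hypothetical awk class file, named after the target file it modifies
cat > /tmp/awkclass <<'EOF'
# comment, which may appear on any line in the file
!install
{ print }
END { print "option=enabled" }
!remove
$0 != "option=enabled" { print }
EOF

# extract the program between !install and the next !command
awk 'sel && /^!/ {sel=0} sel && !/^#/ {print} /^!install/ {sel=1}' \
    /tmp/awkclass > /tmp/install.awk

printf 'existing line\n' > /tmp/conf
awk -f /tmp/install.awk /tmp/conf > /tmp/conf.new
cat /tmp/conf.new
```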
preinstall
executes before class installation begins
postinstall
executes after all volumes have been installed
preremove
executes before class removal begins
postremove
executes after all classes have been removed
PKG
defines the software package abbreviation and remains constant for every instance of a package
VERSION
defines the software package version
ARCH
defines the software package architecture
For example, you might identify two identical versions of a package that run on different hardware as:
Instance 1
PKG="abbr"
VERSION="release 1"
ARCH="MX300I"
Instance 2
PKG="abbr"
VERSION="release 1"
ARCH="i386"
Two different versions of a package that run on the same hardware might be identified as:
Instance 1
PKG="abbr"
VERSION="release 1"
ARCH="i386"
Instance 2
PKG="abbr"
VERSION="release 2"
ARCH="i386"
All instances of a package installed on a system use the package abbreviation as the instance identifier
(PKGINST).
NOTE: pkgmk(C) also assigns an instance identifier to a package as it places it on the installation medium if
one or more instances of a package already exists. That identifier bears no relationship to the identifier
assigned to the same package on the installation machine.
2. Ensure that the CLASSES parameter in the pkginfo(F) file has an entry for class1. For example:
CLASSES="class1 class2 none"
All objects defined as collectively relocatable are put under the same root directory on the installation
machine. The root directory value will be one of the following (and in this order):
if the admin file contains basedir=ask and pkgadd(ADM) was not invoked in noninteractive mode,
then the installer's response to pkgadd(ADM) when asked where relocatable objects should be
installed (this overrides the value for BASEDIR in the package's pkginfo(F) file, if any)
the value of BASEDIR as it is defined in the admin file used during the pkgadd(ADM) process (the
BASEDIR value assigned in the admin file overrides the value of the pkginfo(F) file)
the value of BASEDIR as it is defined in your pkginfo(F) file (this value is used only as a default in
case the other two possibilities have not supplied a value)
if the admin file contains basedir=default and no BASEDIR is set in the package's pkginfo(F) file,
then BASEDIR defaults to /
a prerequisite package
your package depends on the existence of another package
a reverse dependency
another package depends on the existence of your package
NOTE: This type should only be used when a pre-UNIX System V Release 4 package (that cannot
deliver a depend(F) file) relies on the newer package.
an incompatible package
your package is incompatible with this one
Refer to ``depend'' and ``compver'', or the manual pages depend(F) and compver(F) for details on the formats
of these files.
NOTE: Be certain that your depend and compver files have entries in the prototype(F) file. The file type
should be i (for package information file).
PARAM can be any of the 21 standard parameters described in the pkginfo(F) manual page. You can also
create your own package parameters simply by assigning a value to them in this file. Your parameter names
must begin with an uppercase letter followed by either uppercase or lowercase letters.
The following five parameters are required:
PKG (package abbreviation)
NAME (full package name)
ARCH (package architecture)
VERSION (package version)
CATEGORY (package category)
The CLASSES parameter dictates which classes are installed and the order of installation. Although the
parameter is not required, no classes will be installed without it. Even if you have no class action scripts, the
none class must be defined in the CLASSES parameter before objects belonging to that class will be
installed.
NOTE: You can choose to define the value of CLASSES with a request script and not to deliver a value in
the pkginfo(F) file.
Creating links
To define links, you must do the following in the prototype entry for the linked object:
1. Define its ftype as l (a link) or s (a symbolic link).
2. Define its pathname with the format path1=path2 where path1 is the destination and path2 is the
source file.
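As a hedged sketch (every path here is invented), a hard link and a symbolic link might be declared like this:

```
# hard link: create /pkgroot/bin/page as a link to /pkgroot/bin/more
l none /pkgroot/bin/page=/pkgroot/bin/more
# symbolic link: create /pkgroot/lib/libfoo.so pointing at libfoo.so.1
s none /pkgroot/lib/libfoo.so=/pkgroot/lib/libfoo.so.1
```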
If your development area is structured differently from the way you want the package to be structured on the
installation machine, use the prototype entry to map one pathname to the other. Use the same path1=path2
format for the pathname as is used to define links. However, because the ftype is not defined as l or s, path1 is
interpreted as the pathname you want the object to have on the installation machine, and path2 is interpreted
as the pathname the object has on your development machine.
For example, your project might require a development structure that includes a project root directory and
numerous src directories. However, on the installation machine you might want all files to go under a package
root directory and for all src files to be in one directory. So, a file on your machine might be named
/projdir/srcA/filename. If you want that file to be named /pkgroot/src/filename on the installation machine,
your prototype entry for this file might look like this:
f class1 /pkgroot/src/filename=/projdir/srcA/filename
You can use the prototype(F) file to define objects that are not actually delivered on the installation medium.
pkgadd(ADM) creates objects with the following ftypes if they do not already exist at the time of installation:
d (directories)
x (exclusive directories)
l (linked files)
s (symbolically linked files)
p (named pipes)
c (character special device)
b (block special device)
To request that one of these objects be created on the installation machine, add an entry for it in the prototype
file using the appropriate ftype.
For example, if you want a directory created on the installation machine, but do not want to deliver it on the
installation medium, an entry for the directory in the prototype file is sufficient. An entry such as the one
shown below causes the directory to be created on the installation machine, even if it does not exist on the
installation medium.
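A hypothetical entry of this kind (pathname, mode, owner, and group are all invented for the illustration) might read:

```
d none /opt/mypkg/logs 755 root sys
```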
9. Creating the prototype file
The four types of commands that you can put into your prototype(F) file allow you to:
Nest prototype files (the include command).
Define directories for pkgmk(C) to look in when attempting to locate objects as it creates the package
(the search command).
NOTE: This will not work if pkgmk is instructed to compress the package.
Set a default value for mode owner group (the default command). If all or most of your objects have
the same values, using the default command keeps you from having to define these values for every
entry in the prototype file.
Assign a temporary value for variable pathnames to tell pkgmk where to locate these relocatable
objects on your machine (with param="value").
where path is the name of one or more paths to be included in the prototype(F) file. If path is a directory,
entries are created for the contents of that directory as well (everything below that directory).
With this form of the command, all objects are placed into the none class and are assigned the same mode
owner group as exists on your machine. The following example shows pkgproto being executed to create a
file for all objects in the directory /home/pkg:
$ pkgproto /home/pkg
d none /home/pkg 755 bin bin
f none /home/pkg/file1 755 bin bin
f none /home/pkg/file2 755 bin bin
f none /home/pkg/file3 755 bin bin
f none /home/pkg/file4 755 bin bin
f none /home/pkg/file5 755 bin bin
$
You can use the -c class option of pkgproto(C) to assign objects to a class other than none. When using this
option, you can only name one class. To define multiple classes in a prototype(F) file created by pkgproto,
you must edit the file after its creation.
The following example is the same as above except the objects have been assigned to class1:
$ pkgproto -c class1 /home/pkg
d class1 /home/pkg 755 bin bin
f class1 /home/pkg/file1 755 bin bin
f class1 /home/pkg/file2 755 bin bin
f class1 /home/pkg/file3 755 bin bin
f class1 /home/pkg/file4 755 bin bin
f class1 /home/pkg/file5 755 bin bin
$
Use a path1=path2 format on the pkgproto(C) command line to give an object a different pathname in the
prototype(F) file than it has on your machine. You can, for example, use this format to define relocatable
objects in a prototype file created by pkgproto.
The following example is like the others shown in this section, except that the objects are now defined as bin
(instead of /usr/bin) and are thus relocatable:
$ pkgproto /usr/bin=bin
d none bin 755 bin bin
f none bin/file1 755 bin bin
f none bin/file2 755 bin bin
f none bin/file3 755 bin bin
f none bin/file4 755 bin bin
f none bin/file5 755 bin bin
$
pkgproto(C) detects linked files and creates entries for them in the prototype(F) file. If multiple files are
linked together, it considers the first path encountered the source of the link.
If you have symbolic links established on your machine but want to generate entries for those files with an
ftype of f (file), use the -i option of pkgproto(C). This option creates a file entry for all symbolic links.
You must use the -d option to name the device onto which the package should be placed. device can be a
directory pathname or the identifier for a disk. The default device is the installation spool directory.
pkgmk looks for a file named prototype. You can use the -f option to specify a package contents file named
something other than prototype. This file must be in the prototype format.
For example, executing pkgmk -d diskette1 creates a package based on a file named prototype in your
current directory. The package is formatted and copied to the diskette in the device diskette1.
Package file compression
In Release 4.2, the pkgmk(C) command has been enhanced to optionally compress package files. If the -c
option is specified, pkgmk(C) will compress all non-information files. The following exceptions apply:
If, as a result of compression, the size of the file is not reduced, pkgmk(C) will not compress the file.
If the pathname for the file in the package's prototype(F) file is a relative pathname (for example,
../mypkg/foo), the file will not be compressed.
Set packaging
Sets provide a method of grouping packages together as one installable entity. Usually this is used to group
packages that provide a particular feature or set of features. To enable the set capability in SCO OpenServer,
12. Creating a package with pkgtrans
Set installation
For sets, a special-purpose package referred to as a Set Installation Package (SIP) is used. The SIP is used to
control the installation of a set's member packages. The SIP's name and package instance name are always the
same as those used to identify the set itself. For instance, the SIP controlling the installation of the Foundation
Set (fnd) is also named Foundation Set (fnd). A SIP is distinguished from other packages by the
CATEGORY parameter "set" in its pkginfo(F) file and by the presence of a special type of package
information file named setinfo(F). This file is used to convey information about a set's member packages to
the software installation tools.
When pkgadd(ADM) recognizes that a SIP is being processed, it sets up special environment variables and
makes them available to the SIP's procedure scripts. This allows for a well-defined interface between the
scripts and pkgadd(ADM) that enables the SIP scripts to do most of the work when processing set member
package selection and interaction. The SIP's request and preinstall scripts are especially designed to use this
environment.
Among other things, the SIP's request script uses these environment variables to access the setinfo(F) file and
access the set member packages' request and default response files (if any). After the request script has
finished processing, the SIP's preinstall script is then used to pass back to pkgadd(ADM) a list of set member
packages selected for installation as part of the set (see case study 7 for examples of these scripts).
The following is a list of the environment variables made available to a SIP's procedure scripts.
$SETINFO
Used to access the setinfo(F) file.
$REQDIR
Provides the directory where the set member packages' request and default response files, if any,
reside.
$RESPDIR
Contains the name of the directory where processed response files are to be placed. This response file
could be the result of having run a set member package's request script (in the case of custom
installation) or simply a copy of the default response file provided with the SIP (in the case of
automatic installation).
$SETLIST
Used to pass back to pkgadd(ADM) the list of packages selected for installation as part of the set.
After it has processed a SIP, pkgadd adds the set member packages selected (it gets this from the file
represented by $SETLIST in the installation environment) to the list of packages to be installed and proceeds
to install them.
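A hedged sketch (not the case study's actual script) of the hand-off a SIP preinstall script performs, writing each selected member package into the file named by $SETLIST so that pkgadd(ADM) will install it:

```shell
# Hedged sketch of a SIP preinstall hand-off. PKGLIST and SETLIST
# normally come from the installation environment set up by
# pkgadd(ADM); the defaults below exist only so the sketch can be
# run standalone.
: ${PKGLIST:="pkgw pkgx"}
: ${SETLIST:=/tmp/setlist.$$}

rm -f "$SETLIST"
for PKG in $PKGLIST
do
	# one selected member package per line
	echo "$PKG" >> "$SETLIST" || exit 2
done
```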
Set removal
When the package instance specified to the pkgrm(ADM) command is a SIP, pkgrm(ADM) will remove all
of the SIP's set member packages in reverse dependency order (opposite the order in which they were
installed). After all of its member packages have been removed, the SIP itself is removed from the system.
pkg
The short, or abbreviated, name of a package in the set. This name describes which package of the set
requires the amount of space described by the rest of the data on this line in the setsize(F) file.
pathname
Names a directory in which there are objects that will be installed or that will require additional space.
The name can be the mountpoint for a file system. Names that do not begin with a slash (/) indicate
relocatable directories.
blocks
Defines the number of 512-byte disk blocks required for installation of the files and directory entries
contained in the pathname. (Do not include filesystem-dependent disk usage.)
inodes
Defines the number of inodes required for the installation of the files and directory entries contained
in the pathname.
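A hedged sketch of two setsize(F) entries (the package abbreviation, paths, and sizes are invented):

```
# pkg  pathname  blocks  inodes
pkgw   /opt      2400    80
pkgw   var       300     15
```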
where -i tells pkgproto(C) to record symbolic links with an ftype of f (not s), -c defines the
class of all objects as class, and path1 defines the object pathname (or names) to be included
in the prototype file. If path1 is a directory, entries for all objects in that directory will be
generated.
Use the path1=path2 format to give an object a different pathname in the prototype(F) file
than it has on your machine. path1 is the pathname where objects can be located on your
machine and path2 is the pathname that should be substituted for those objects.
where PARAM is the name of one of the standard installation parameters defined in the pkginfo(F)
manual page and value is the value you assign to it.
You can also define values for your own installation parameters using the same format. Names for
parameters that you create must begin with an uppercase letter followed by either upper or
lowercase letters.
The following five parameters are required in every pkginfo file: PKG, NAME, ARCH, VERSION,
and CATEGORY. No other restrictions apply concerning which parameters or how many parameters
you define.
The CLASSES parameter dictates which classes are installed and the order of installation. Although
the parameter is not required, no classes will be installed without it. Even if you have no class action
scripts, the none class must be defined in the CLASSES parameter before objects belonging to that
class will be installed.
3. Execute pkgmk(C).
pkgmk [-d device] [-r rootpath] [-b basedir] [-f filename]
where -d specifies that the package should be copied onto device, -r requests that the root directory
rootpath be used to locate objects on your machine, -b requests that basedir be prepended to
relocatable paths when searching for them on your machine, and -f names a file, filename, to be used
as your prototype(F) file. (Other options are described in the pkgmk(C) manual page.)
1. Selective installation
This package has three types of objects. The installer chooses which of the three types to install and where to
locate the objects on the installation machine.
Approach
To set up selective installation, you must:
1. Define a class for each type of object which can be installed.
In this case study, the three object types are the package executables, the manual pages, and the
emacs executables. Each type has its own class: bin, man, and emacs, respectively. Notice in the
example prototype file that all of the object files belong to one of these three classes.
2. Initialize the CLASSES parameter in the pkginfo(F) file as null.
Normally when you define a class, you want the CLASSES parameter to list all classes that will be
installed. Otherwise, no objects in that class will be installed. For this example, the parameter is
initially set to null. CLASSES will be given values by the request script based on the package pieces
chosen by the installer. This way, CLASSES is set to only those object types that the installer wants
installed. For an example, see the sample pkginfo file associated with this package. Notice that the
CLASSES parameter is set to null.
3. Define object pathnames in the prototype(F) file with variables.
These variables will be set by the request script to the value which the installer provides.
pkgadd(ADM) resolves these variables at installation time and so knows where to install the package.
The three variables used in this example are:
$NCMPBIN (defines location for object executables)
$NCMPMAN (defines location for manual pages)
$EMACS (defines location for emacs executables)
Look at the example prototype file to see how to define the object pathnames with variables.
4. Create a request script to ask the installer which parts of the package should be installed and where
they should be placed.
The request script for this package asks two questions:
Should this part of the package be installed?
When the answer is yes, then the appropriate class name is added to the CLASSES
parameter. For example, when the question ``Should the manual pages associated with this
package be installed'' is answered yes, the class man is added to the CLASSES parameter.
If so, where should that part of the package be placed?
pkginfo file
PKG='ncmp'
NAME='NCMP Utilities'
CATEGORY='applications,tools'
ARCH='3b2'
VERSION='Release 1.0, Issue 1.0'
CLASSES=''
prototype file
i pkginfo
i request
x bin $NCMPBIN 0755 root other
f bin $NCMPBIN/dired=/usr/ncmp/bin/dired 0755 root other
f bin $NCMPBIN/less=/usr/ncmp/bin/less 0755 root other
f bin $NCMPBIN/ttype=/usr/ncmp/bin/ttype 0755 root other
f emacs $NCMPBIN/emacs=/usr/ncmp/bin/emacs 0755 root other
x emacs $EMACS 0755 root other
f emacs $EMACS/ansii=/usr/ncmp/lib/emacs/macros/ansii 0644 root other
f emacs $EMACS/box=/usr/ncmp/lib/emacs/macros/box 0644 root other
f emacs $EMACS/crypt=/usr/ncmp/lib/emacs/macros/crypt 0644 root other
f emacs $EMACS/draw=/usr/ncmp/lib/emacs/macros/draw 0644 root other
f emacs $EMACS/mail=/usr/ncmp/lib/emacs/macros/mail 0644 root other
f emacs $NCMPMAN/man1/emacs.1=/usr/ncmp/man/man1/emacs.1 0644 root other
d man $NCMPMAN 0755 root other
d man $NCMPMAN/man1 0755 root other
f man $NCMPMAN/man1/dired.1=/usr/ncmp/man/man1/dired.1 0644 root other
f man $NCMPMAN/man1/ttype.1=/usr/ncmp/man/man1/ttype.1 0644 root other
f man $NCMPMAN/man1/less.1=/usr/ncmp/man/man1/less.1 0644 root other
request script
trap 'exit 3' 15

# determine if and where general executables should be placed
ans=`ckyorn -d y \
	-p "Should executables included in this package be installed" ` || exit $?
if [ "$ans" = y ]
then
	CLASSES="$CLASSES bin"
	NCMPBIN=`ckpath -d /usr/ncmp/bin -aoy \
	-p "Where should executables be installed" ` || exit $?
fi
Approach
To create a database file at the time of installation and save a copy on removal, you must:
1. Create three classes.
This package requires three classes:
the standard class of none (contains a set of processes belonging in the subdirectory bin)
the admin class (contains an executable file config and a directory containing data files)
the cfgdata class (contains a directory)
2. Make the package collectively relocatable.
Notice in the sample prototype(F) file that none of the pathnames begin with a slash or a variable.
This indicates that they are collectively relocatable.
3. Calculate the amount of space the database file will require and create a space(F) file to deliver with
the package. This file notifies pkgadd(ADM) that this package requires extra space and how much.
2. Device driver installation
pkginfo file
PKG='krazy'
NAME='KrAzY Applications'
CATEGORY='applications'
ARCH='3b2'
VERSION='Version 1'
CLASSES='none cfgdata admin'
prototype file
i pkginfo
i i.admin
i r.cfgdata
d none bin 555 root sys
f none bin/process1 555 root other
f none bin/process2 555 root other
f none bin/process3 555 root other
f admin bin/config 500 root sys
d admin cfg 555 root sys
f admin cfg/datafile1 444 root sys
f admin cfg/datafile2 444 root sys
space file
# extra space required by config data which is
# dynamically loaded onto the system
data 500 1
i.admin installation script

while read src dest
do
	# the installation service provides '/dev/null' as the
	# pathname for directories, pipes, special devices, etc,
	# which it knows how to create
	[ "$src" = /dev/null ] && continue
	cp $src $dest || exit 2
done

# if this is the last time this script will
# be executed during the installation, do additional
# processing here
if [ "$1" = ENDOFCLASS ]
then
	# our config process will create a data file based on any changes
	# made by installing files in this class; make sure
	# the data file is in class 'cfgdata' so special rules can apply
	# to it during package removal
	installf -c cfgdata $PKGINST $BASEDIR/data/config.data f 444 root sys || exit 2
	$BASEDIR/bin/config > $BASEDIR/data/config.data || exit 2
	installf -f -c cfgdata $PKGINST || exit 2
fi
exit 0
r.cfgdata removal script

while read path
do
	# pathnames appear in lexical order, thus directories
	# will appear first; you cannot operate on directories
	# until done, so just keep track of names until later
	if [ -d $path ]
	then
		dirlist="$dirlist $path"
		continue
	fi
	mv $path /tmp || exit 2
done
if [ -n "$dirlist" ]
then
	rm -rf $dirlist || exit 2
fi
exit 0
Approach
To meet the requirements in the description, you must:
1. Create a copyright(F) file.
A copyright file contains the ASCII text of a copyright message. The message shown in the sample
file will be displayed on the screen during package installation (and also during package removal).
2. Create a compver(F) file.
The sample pkginfo file defines this package version as version 3.0. The sample compver file defines
version 3.0 as being compatible with versions 2.3, 2.2, 2.1, 2.1.1, 2.1.3 and 1.7.
3. Create a depend(F) file.
Files listed in a depend file must already be installed on the system when a package is installed. The
sample file has 11 packages which must already be on the system at installation time.
pkginfo file
PKG='case4'
NAME='Case Study 4'
CATEGORY='application'
ARCH='3b2'
VERSION='Version 3.0'
CLASSES='none'
copyright file
Copyright (c) 1997 The Santa Cruz Operation, Inc.
All Rights Reserved.
compver file
Version 2.3
Version 2.2
Version 2.1
Version 2.1.1
Version 2.1.3
Version 1.7
depend file

Advanced C Utilities
	Issue 4 Version 1
C Programming Language
	Issue 4 Version 1
Directory and File Management Utilities
Editing Utilities
Extended Software Generation Utilities
	Issue 4 Version 1
Graphics Utilities
Remote Execution Utilities
Software Generation Utilities
	Issue 4 Version 1
Shell Programming Utilities
System Header Files
	Release 3.1
Approach
To modify /etc/inittab at the time of installation, you must:
1. Add the sed class script to the prototype(F) file.
The name of a sed script must be the name of the file that will be edited. In this case, the file to be edited
is /etc/inittab and so the sed(C) script is named /etc/inittab. There are no requirements for the mode
owner group of a sed script (represented in the sample prototype by question marks). The file type of
the sed script must be e (indicating that it is editable). For an example, see the sample prototype file.
NOTE: Because the pathname of the sed class action script is exactly the same as the file it is
intended to edit, these two cannot coexist in the same package.
2. Set the CLASSES parameter to include sed.
In the case of the sample pkginfo file, sed is the only class being installed. However, it could be one
of any number of classes.
3. Create a sed class action script.
prototype file
i pkginfo
i postinstall
e sed /etc/inittab=/home/mypkg/inittab.sed ? ? ?
!install
# remove any previous entry added to the table
# for this particular change
/^[^:]*:[^:]*:[^:]*:[^#]*#ROBOT$/d

# add the needed entry at the end of the table;
# sed(C) does not properly interpret the '$a'
# construct if you previously deleted the last
# line, so the command
#	$a\
#	rb:023456:wait:/usr/robot/bin/setup #ROBOT
# will not work here if the file already contained
# the modification. Instead, you will settle for
# inserting the entry before the last line!
$i\
rb:023456:wait:/usr/robot/bin/setup #ROBOT
Approach
To modify /etc/inittab during installation, you must:
1. Create a class.
Create a class called inittab. You must provide an installation and a removal class action script for this
class. Define the inittab class in the CLASSES parameter in the sample pkginfo file.
2. Create an inittab(F) file.
This file contains the information for the entry that you will add to /etc/inittab. Notice in the sample
prototype file that inittab is a member of the inittab class and has a file type of e for editable. The
sample inittab file upon which this is based is also shown.
3. Create an installation class action script.
Because class action scripts must be multiply executable (you get the same results each time they are
executed), you cannot just add your text to the end of the file. The sample class action script performs
the following procedures:
checks to see if this entry has been added before
if it has, removes any previous versions of the entry
edits the inittab(F) file and adds the comment lines so you know where the entry is from
moves the temporary file back into /etc/inittab
executes init q when it receives the endofclass indicator
Note that init q can be performed by this installation script. A oneline postinstall script is not needed
by this approach.
4. Create a removal class action script.
prototype file
i pkginfo
i i.inittab
i r.inittab
e inittab /etc/inittab ? ? ?
i.inittab installation script

while read src dest
do
	# remove all entries from the table that are
	# associated with this PKGINST
	sed -e "/^[^:]*:[^:]*:[^:]*:[^#]*#$PKGINST$/d" $dest > /tmp/$$itab || exit 2
	sed -e "s/$/ #$PKGINST/" $src >> /tmp/$$itab || exit 2
	mv /tmp/$$itab $dest || exit 2
done
if [ "$1" = ENDOFCLASS ]
then
	/sbin/init q || exit 2
fi
exit 0
r.inittab removal script

while read src dest
do
	# remove all entries from the table that
	# are associated with this PKGINST
	sed -e "/^[^:]*:[^:]*:[^:]*:[^#]*#$PKGINST$/d" $dest > /tmp/$$itab || exit 2
	mv /tmp/$$itab $dest || exit 2
done
/sbin/init q || exit 2
exit 0
a. Edits /etc/inittab to remove any changes already existing for this package. Notice that the
filename /etc/inittab is hardcoded into the sed command.
b. If the package is being installed, adds the new line to the end of /etc/inittab. A comment tag is
included in this new entry to remind us from where that entry came.
c. Executes init q.
This solution addresses the drawbacks in case study 5a and case study 5b. Only one file is needed (beyond the
pkginfo and prototype files), that file is short and simple, it works with multiple instances of a package
because the $PKGINST parameter is used, and no postinstall script is required because init q can be executed
from the build file.
5b. Modify an existing file using a class action script
prototype file
i pkginfo
e build /etc/inittab=/home/case5c/inittab.build ? ? ?
build script

# remove all entries from the existing table that
# are associated with this PKGINST
sed -e "/^[^:]*:[^:]*:[^:]*:[^#]*#$PKGINST$/d" /etc/inittab || exit 2

if [ "$1" = install ]
then
	# add the following entry to the table
	echo "rb:023456:wait:/usr/robot/bin/setup #$PKGINST" || exit 2
fi
/sbin/init q || exit 2
exit 0
Approach
You could use the build class and follow the approach shown for editing /etc/inittab in case study 5c, except
that you want to edit more than one file. If you used the build class approach, you would need to deliver one
build script for each crontab file edited. Defining a cron class provides a more general approach. To edit a
crontab file with this approach, you must:
1. Define the crontab files that will be edited in the prototype(F) file.
Create an entry in the prototype(F) file for each crontab file which will be edited. Define their class as
cron and their file type as e. Use the actual name of the file to be edited, as shown in the example.
2. Create the crontab files that will be delivered with the package.
a. Calculates the user ID.
This is done by setting the variable user to the base name of the cron class file being
processed. That name equates to the user ID. For example, the basename of
/var/spool/cron/crontabs/root is root (which is also the user ID).
b. Executes crontab using the user ID and the -l option.
Using the -l option tells crontab to send the contents of the crontab file for the defined
user to the standard output.
c. Pipes the output of the crontab command to a sed(C) script that removes any previous entries
that have been added using this installation technique.
d. Puts the edited output into a temporary file.
e. Adds the data file for the root user ID (that was delivered with the package) to the temporary
file and adds a tag so that you will know from where these entries came.
f. Executes crontab with the same user ID and gives it the temporary file as input.
4. Create a removal class action script for the cron class.
The sample removal script is the same as the installation script except that there is no procedure to
add information to the crontab file.
These procedures are performed for every file in the cron class.
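The edit pipeline those steps describe can be sketched as follows. This is a hedged, runnable sketch rather than the case study's actual script: the crontab(C) calls are replaced by plain files so it can run anywhere, and PKGINST and all pathnames are invented for the illustration.

```shell
# Sketch of the tag-based crontab edit pipeline (steps c-f).
PKGINST=case6
src=/tmp/pkg.cronentries        # data file delivered with the package
cur=/tmp/current.crontab        # stands in for the `crontab -l` output

# sample data: one previously tagged entry, one unrelated entry
echo "0 1 * * * /usr/lib/oldjob #$PKGINST" > $cur
echo "17 2 * * * /usr/lib/cleanup" >> $cur
echo "0 4 * * * /usr/robot/report" > $src

# (c) remove any entries added by a previous installation of this package
sed -e "/#$PKGINST\$/d" $cur > /tmp/$$crontab || exit 2
# (e) append the delivered entries, tagged so they can be found again
sed -e "s/\$/ #$PKGINST/" $src >> /tmp/$$crontab || exit 2
# (f) the real script would hand this file back to crontab(C) for the
# user; here it is simply moved into place
mv /tmp/$$crontab $cur || exit 2
```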
pkginfo file
PKG='case6'
NAME='Case Study 6'
CATEGORY='application'
ARCH='3b2'
VERSION='Version 1.0'
CLASSES='cron'
prototype file

i pkginfo
i i.cron
i r.cron
e cron /var/spool/cron/crontabs/root ? ? ?
e cron /var/spool/cron/crontabs/sys ? ? ?
Approach
1. Create a request script to ask the installer how the set should be installed.
2. Should default installation be performed on this set?
When the answer is yes, if any of the set's member packages require interaction and if default
responses for that interaction have been provided, install the set using the default responses.
When the answer is no, for each package in the set, prompt as to whether this package should be
installed.
When the answer is yes, if it is interactive (the package has a request script), should default
installation of the package be performed?
If yes, use the default response file.
If no, execute the package's request script to obtain the responses.
3. When the request script has completed, the PKGLIST variable should contain the list of selected set
member packages that will be installed on the system. At this time:
pkgadd(ADM) runs the SIP's preinstall script which places the selected set member
packages for installation ($PKGLIST) into the setlist file referenced using the $SETLIST
variable.
pkgadd then reads the setlist file and inserts the packages listed there into the list of packages
to be installed on the system.
As each of these packages is processed, if the package is interactive, pkgadd uses the
response file created earlier so that no prompting for user input occurs except during SIP
processing.
setinfo file
# Format for the setinfo file. Field separator is: <tab>
# pkg	parts	default	category	package_name
# abbr		y/n
pkgw	1	y	system	Package W
pkgx	1	y	system	Package X
pkgy	2	n	system	Package Y
pkgz	1	y	system	Package Z
7a. Create a Set Installation Package
request script

while read pkginst parts default category package_name
do
	echo $pkginst >>/tmp/order$$
	if [ "$default" = "y" ]
	then
		echo $pkginst >>/tmp/req$$
	else
		echo $pkginst >>/tmp/opt$$
	fi
done <$SETINFO

REQUIRED=`cat /tmp/req$$ 2>/dev/null`
OPTIONAL=`cat /tmp/opt$$ 2>/dev/null`
ORDER=`cat /tmp/order$$ 2>/dev/null`
rm -f /tmp/opt$$ /tmp/req$$ /tmp/order$$

HELPMSG="Enter 'y' to run default set installation or enter 'n' to run custom set installation."
PROMPT="Do you want to run default set installation?"
ANS=`ckyorn -d y -p "$PROMPT" -h "$HELPMSG"` || exit $?

if [ "$ANS" = "y" ]
then
	# Default installation
	for PKG in $REQUIRED
	do
		PKGLIST="$PKGLIST $PKG"
		if [ -f $REQDIR/$PKG/response ]
		then
			cp $REQDIR/$PKG/response $RESPDIR/$PKG
		fi
	done
	echo "PKGLIST=$PKGLIST" >> $1
else
	# Custom installation of required packages
	for PKG in $REQUIRED
	do
		PKGLIST="$PKGLIST $PKG"
		if [ -f $REQDIR/$PKG/request ]
		then
			PROMPT="Do you want default installation for $PKG?"
			RANS=`ckyorn -d y -p "$PROMPT" -h "$HELPMSG"` || exit $?
			if [ "$RANS" = "y" ]
			then
				cp $REQDIR/$PKG/response $RESPDIR/$PKG
			else
				sh $REQDIR/$PKG/request $RESPDIR/$PKG
			fi
		fi
	done

	# Select which optional packages in set are to be installed
	for PKG in $OPTIONAL
	do
		HELPMSG="Enter 'y' to install $PKG as part of this set installation or 'n' to skip installation."
		PROMPT="Do you want to install $PKG?"
		PANS=`ckyorn -d y -p "$PROMPT" -h "$HELPMSG"` || exit $?
Approach
From the SIP's setinfo file:
1. Create two separate setinfo files for the two new sets being created.
2. Create two separate prototype files for the two new sets being created.
pkgw	1	y	system	Package W
pkgx	1	y	system	Package X
pkgy	2	n	system	Package Y
pkgz	1	y	system	Package Z
i pkgw/request=pkgw.request
i pkgz/response=pkgz.response
i pkgx/request=pkgx.request
i pkgy/response=pkgy.response