
###FEATURES COMMON TO BOTH AWK & SED###

1. Both are scripting languages


2. Both work primarily with text files
3. Both are programmable editors
4. Both accept command-line options and can be scripted (-f script_name)
5. Both GNU versions support POSIX basic (GREP-style) and extended (EGREP-style) RegExes
6. Lineage = ed (editor) -> sed -> awk

###SED's FEATURES###
1. Non-interactive editor
2. Stream Editor
a. Manipulates input - performing edits as instructed
b. Sed accepts input on/from: STDIN (Keyboard), File, Pipe (|)
3. Sed Loops through ALL input lines of input stream or file, by DEFAULT
4. Does NOT operate on the source file, by default. (Will NOT clobber the original
file, unless instructed to do so)
5. Supports addresses to indicate which lines to operate on: /^$/d - deletes blank
lines
6. Stores the active (current) line in the 'pattern space' and maintains a 'hold space'
for later use
7. Used primarily to perform Search-and-Replaces
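
Example (a minimal sketch of points 4 and 6, assuming a throwaway three-line file whose
name and contents are purely illustrative): the pattern space holds the current line,
the hold space accumulates the earlier lines, and the source file is left untouched.

```shell
# Illustrative input; not part of the original notes.
demo=$(mktemp)
printf 'ant\nbee\ncat\n' > "$demo"

# 1!G - on every line except the first, append the hold space to the pattern space
# h   - copy the pattern space into the hold space
# $p  - on the last line, print the pattern space
reversed=$(sed -n '1!G;h;$p' "$demo")
printf '%s\n' "$reversed"     # cat, bee, ant (input order reversed)

# The source file itself still reads ant, bee, cat.
cat "$demo"
```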

###AWK's FEATURES###
1. Field processor based on whitespace, by default
2. Used for reporting (extracting specific columns) from data feed
3. Supports programming constructs
a. loops (for,while,do)
b. conditions (if, else)
c. arrays (lists)
d. functions (string, numeric, user-defined)
4. Automatically tokenizes words in a line for later usage - $1, $2, $3, etc.
(This is based on the current delimiter)
5. Automatically loops through input like Sed, making lines available for
processing
6. Ability to execute shell commands using the 'system()' function
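
Example (a small sketch of points 1, 3 and 4, assuming a made-up two-column file of
animal names and counts): fields are split on whitespace, and an array keyed by the
first field accumulates the second.

```shell
# Illustrative data: animal name, then a count (assumed columns).
data=$(mktemp)
printf 'dog 3\ncat 2\ndog 4\n' > "$data"

# $1 and $2 are the whitespace-separated fields of each line.
# totals[] is an array indexed by animal name; END runs after the last line.
# The order of "for (name in totals)" is unspecified, so the output is sorted.
summary=$(awk '{ totals[$1] += $2 }
               END { for (name in totals) print name, totals[name] }' "$data" | sort)
printf '%s\n' "$summary"      # cat 2, then dog 7
```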

###REGULAR EXPRESSIONS (RegEx) REVIEW###


Regular Expressions (RegExes) are key to mastering Awk & Sed

###METACHARACTERS###
^ - matches the character(s) at the beginning of a line
a. sed -ne '/^dog/p' animals.txt

$ - matches the character(s) at the end of a line


a. sed -ne '/dog$/p' animals.txt

Task: Match line which contains only 'dog':


a. sed -ne '/^dog$/p' animals.txt
b. sed -ne '/^dog$/p' - reads from STDIN. Press Enter after each line. Terminate
with CTRL-D
c. cat animals.txt | sed -ne '/^dog$/p'
d. cat animals.txt | sed -ne '/^dog$/Ip' - prints matches case-insensitively
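
Example (a sketch of the two anchors, assuming a four-line sample file; the real
contents of animals.txt are not shown in these notes):

```shell
# Illustrative stand-in for animals.txt.
animals=$(mktemp)
printf 'dog\ndogfish\nhotdog\nDog\n' > "$animals"

starts=$(sed -n '/^dog/p' "$animals")   # dog, dogfish
ends=$(sed -n '/dog$/p' "$animals")     # dog, hotdog
exact=$(sed -n '/^dog$/p' "$animals")   # dog

printf '%s\n--\n%s\n--\n%s\n' "$starts" "$ends" "$exact"
```

'Dog' is never printed because matching is case-sensitive unless the GNU 'I' modifier
is added.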

. - matches any character (typically except new line)


a. sed -ne '/^d...$/Ip' animals.txt
b. sed -ne '/^d.../Ip' animals.txt

###REGEX QUANTIFIERS###
* - 0 or more matches of the previous character
+ - 1 or more matches of the previous character
? - 0 or 1 matches of the previous character

a. sed -ne '/^d.\+/Ip' animals.txt

Note: In Sed's default (basic) RegEx syntax, '+' and '?' act as quantifiers only when
escaped with '\' (\+ and \? are GNU extensions). '*' needs no escape
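
Example (a sketch comparing the three quantifiers, assuming GNU sed and a made-up
three-word input file):

```shell
# Illustrative input.
words=$(mktemp)
printf 'dg\ndog\ndoog\n' > "$words"

star=$(sed -n '/^do*g$/p' "$words")     # o*  zero or more        -> dg, dog, doog
plus=$(sed -n '/^do\+g$/p' "$words")    # o\+ one or more (GNU)   -> dog, doog
opt=$(sed -n '/^do\?g$/p' "$words")     # o\? zero or one (GNU)   -> dg, dog
ere=$(sed -E -n '/^do+g$/p' "$words")   # with -E, + needs no backslash -> dog, doog

printf '%s\n' "$star" "$plus" "$opt" "$ere"
```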

###CHARACTER CLASSES###
Allow searching for a range of characters
a. [0-9]
b. [a-z], [A-Z]

a. sed -ne '/^d.\+[0-9]/Ip' animals.txt

Note: Character Classes match 1, and only 1 character

###INTRO TO SED###
Usage:
1. sed [options] 'instruction' file | PIPE | STDIN
2. sed -e 'instruction1' -e 'instruction2' ...
3. sed -f script_file_name file
Note: Execute Sed by indicating instruction on one of the following:
1. Command-line
2. Script File

Note: Sed accepts instructions based on '/pattern_to_match/action'


###Print Specific Lines of a file###
Note: '-e' is optional if there is only 1 instruction to execute
sed -ne '1p' animals.txt - prints first line of file
sed -ne '2p' animals.txt - prints second line of file
sed -ne '$p' animals.txt - prints last line of file
sed -ne '2,4p' animals.txt - prints lines 2-4 from file
sed -ne '1!p' animals.txt - prints ALL lines EXCEPT line #1
sed -ne '1,4!p' animals.txt - prints ALL lines EXCEPT lines 1 - 4
sed -ne '/dog/p' animals.txt - prints ALL lines containing 'dog' - case-sensitive
sed -ne '/dog/Ip' animals.txt - prints ALL lines containing 'dog' - case-
insensitive
sed -ne '/[0-9]/p' animals.txt - prints ALL lines with AT LEAST 1 numeric
sed -ne '/cat/,/deer/p' animals.txt - prints the range of lines from the first line
containing 'cat' through the next line containing 'deer'
sed -ne '/deer/,+2p' animals.txt - prints the line with 'deer' plus 2 extra lines
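
Example (a sketch of the line and pattern addresses above, assuming a five-line sample
file in place of animals.txt):

```shell
# Illustrative stand-in for animals.txt.
list=$(mktemp)
printf 'ant\ncat\ndog\ndeer\nelk\n' > "$list"

second=$(sed -n '2p' "$list")             # cat
last=$(sed -n '$p' "$list")               # elk
range=$(sed -n '2,4p' "$list")            # cat, dog, deer
not_first=$(sed -n '1!p' "$list")         # cat, dog, deer, elk
span=$(sed -n '/cat/,/deer/p' "$list")    # cat, dog, deer

printf '%s\n' "$second" "$last" "$range" "$not_first" "$span"
```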

###Delete Lines using Sed Addresses###


sed -e '/^$/d' animals.txt - deletes blank lines from file
Note: Drop '-n' to see the new output when deleting

sed -e '1d' animals.txt - deletes the first line from animals.txt


sed -e '1,4d' animals.txt - deletes lines 1-4 from animals.txt
sed -e '1~2d' animals.txt - deletes every 2nd line beginning with line 1 -
1,3,5...

###Save Sed's Changes using Output Redirection###

sed -e '/^$/d' animals.txt > animals2.txt - deletes blank lines from file and
creates new output file 'animals2.txt'
###SEARCH & REPLACE USING Sed###
General Usage:
sed -e 's/find/replace/g' animals.txt - replaces 'find' with 'replace'
Note: Left Hand Side (LHS) supports literals and RegExes
Note: Right Hand Side (RHS) supports literals and back references

Examples:
sed -e 's/LinuxCBT/UnixCBT/' - replaces 'LinuxCBT' with 'UnixCBT' on STDIN to
STDOUT
sed -e 's/LinuxCBT/UnixCBT/I' - replaces 'LinuxCBT' with 'UnixCBT' on STDIN to
STDOUT (Case-Insensitive)

Note: Replacements occur on the FIRST match, unless 'g' is appended to the
s/find/replace/g sequence
sed -e 's/LinuxCBT/UnixCBT/Ig' - replaces 'LinuxCBT' with 'UnixCBT' on STDIN to
STDOUT (Case-Insensitive & Global)

Task:
1. Remove ALL blank lines
2. Substitute 'cat', regardless of case, with 'Tiger'

Note: Whenever using the '-n' option, you MUST specify the print flag 'p' to see output.
Here ALL remaining lines should be printed once, so omit both '-n' and 'p'
sed -e '/^$/d' -e 's/Cat/Tiger/Ig' animals.txt - removes blank lines &
substitutes 'cat' with 'Tiger'
OR sed -e '/^$/d; s/Cat/Tiger/Ig' animals.txt - does the same as above
Note: Simply separate multiple commands with semicolons
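
Example (a small sketch of this task, assuming GNU sed for the 'I' flag and a made-up
input file): without '-n', each line that is not deleted is printed exactly once, so no
'p' flag is needed.

```shell
# Illustrative input with blank lines and mixed-case 'cat'.
pets=$(mktemp)
printf 'cat\n\nCAT and dog\n\nelk\n' > "$pets"

# Delete blank lines, then replace every 'cat' regardless of case.
result=$(sed -e '/^$/d' -e 's/cat/Tiger/Ig' "$pets")
printf '%s\n' "$result"       # Tiger / Tiger and dog / elk
```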

###Update Source File - Backup Source File###


sed -i.bak -e '/^$/d; s/Cat/Tiger/Ig' animals.txt - performs as above, but ALSO
replaces the source file and backs it up as 'animals.txt.bak'
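
Example (a sketch of in-place editing with a backup, assuming GNU sed and a scratch
directory so that no real file is touched):

```shell
# Work on a throwaway copy; the file contents are illustrative.
dir=$(mktemp -d)
printf 'Cat\n\ndog\n' > "$dir/animals.txt"

# -i.bak edits animals.txt in place and keeps the original as animals.txt.bak
sed -i.bak -e '/^$/d; s/cat/Tiger/Ig' "$dir/animals.txt"

cat "$dir/animals.txt"        # Tiger, dog
cat "$dir/animals.txt.bak"    # Cat, (blank), dog
```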

###Search & Replace (Text Substitution) Continued###


sed -e '/address/s/find/replace/g' file
sed -e '/Tiger/s/dog/mutt/g' animals.txt
sed -ne '/Tiger/s/dog/mutt/gp' animals.txt - substitutes 'dog' with 'mutt' where
line contains 'Tiger'
sed -e '/Tiger/s/dog/mutt/gI' animals.txt
sed -e '/^Tiger/s/dog/mutt/gI' animals.txt - Updates lines that begin with 'Tiger'
sed -e '/^Tiger/Is/dog/mutt/gI' animals.txt - Updates lines that begin with 'Tiger'
(Case-Insensitive)

###Focus on the Right Hand Side (RHS) of Search & Replace Functions in SED###
Note: SED reserves a few characters to help with substitutions based on the matched
pattern from the LHS
& = The portion of the pattern space that was matched by the LHS

Task:
Prefix each line with the word 'Animal '
sed -ne 's/.*/&/p' animals.txt - replaces the matched pattern with the matched
pattern
sed -ne 's/.*/Animal &/p' animals.txt - Prefixes each line with 'Animal '
sed -ne 's/.*/Animal: &/p' animals.txt - Prefixes each line with 'Animal: '

sed -ne 's/.*[0-9]/&/p' animals.txt - returns animals with at least 1 numeric
anywhere in the name (the pattern is not anchored with '$')
sed -ne 's/.*[0-9]\{1\}/&/p' animals.txt - returns the same lines as above: '\{1\}'
requires exactly 1 repetition of [0-9], but '.*' may absorb any other numerics
sed -ne 's/[a-z][0-9]\{4\}$/&/pI' animals.txt - returns animal(s) with 4 numeric
values at the end of the name
sed -ne 's/[a-z][0-9]\{1,4\}$/&/pI' animals.txt - returns animal(s) with at least
1, up to 4 numeric values at the end of the name
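
Example (a sketch of '&' and of the '\{n\}' and '\{n,m\}' intervals, assuming a
three-line sample file):

```shell
# Illustrative names: none, one, and four trailing digits.
names=$(mktemp)
printf 'dog\ncat7\nelk1234\n' > "$names"

# & stands for whatever the LHS matched (here the whole line).
prefixed=$(sed -e 's/.*/Animal: &/' "$names")

# A letter followed by exactly 4 digits at the end of the line.
four=$(sed -n 's/[a-z][0-9]\{4\}$/&/p' "$names")       # elk1234

# A letter followed by 1 to 4 digits at the end of the line.
upto4=$(sed -n 's/[a-z][0-9]\{1,4\}$/&/p' "$names")    # cat7, elk1234

printf '%s\n' "$prefixed" "$four" "$upto4"
```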

###Grouping & Backreferences###


Note: Segment matches into backreferences using escaped parentheses: \(RegEx\)
sed -ne 's/\(.*\)\([0-9]\)/&/p' animals.txt - This creates 2 groups: \1 & \2
sed -ne 's/\(.*\)\([0-9]\)$/\1/p' animals.txt - This creates 2 groups: \1 & \2
but references \1
sed -ne 's/\(.*\)\([0-9]\)$/\2/p' animals.txt - This creates 2 groups: \1 & \2
but references \2
sed -ne 's/\(.*\)\([0-9]\)$/\1 \2/p' animals.txt - This creates 2 groups: \1 & \2
but references \1 & \2
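
Example (a sketch of the two groups, assuming a three-line sample file): because '.*' is
greedy, \1 takes everything up to the final digit and \2 holds only that last digit.

```shell
# Illustrative names with zero, one, and two trailing digits.
ids=$(mktemp)
printf 'dog1\ncat27\nelk\n' > "$ids"

name=$(sed -n 's/\(.*\)\([0-9]\)$/\1/p' "$ids")      # dog, cat2
digit=$(sed -n 's/\(.*\)\([0-9]\)$/\2/p' "$ids")     # 1, 7
both=$(sed -n 's/\(.*\)\([0-9]\)$/\1 \2/p' "$ids")   # dog 1, cat2 7

printf '%s\n' "$name" "$digit" "$both"
```

'elk' is not printed because it has no trailing digit, so the substitution never fires.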

###Apply Changes to Multiple Files###


The shell expands wildcards (*, ?) into file names before Sed runs, so Sed can process
multiple files in one invocation
sed -ne 's/\(.*\)\([0-9]\)$/\1 \2/p' animals*txt - This creates 2 groups: \1 & \2
but references \1 & \2

###Sed Scripts###
Note: Sed supports scripting, which means, the ability to dump 1 or more
instructions into 1 file

sed -f script_file_name text_file

sed -f animals.sed animals.txt

Task:
Perform multiple transformations on animals.txt file
1. /^$/d - Removes blank lines
2. s/dog/frog/Ig - substitutes globally, 'dog' with 'frog' - (case-insensitive)
3. s/tiger/lion/Ig - substitute globally, 'tiger' with 'lion' - (case-insensitive)
4. s/.*/Animals: &/ - Prefixes each line with 'Animals: '
5. s/animals/mammals/Ig - Replaces 'Animals' with 'mammals'
6. s/\([a-z]\+\)[0-9]\+/\1/Ig - Strips trailing numeric values from alphas

Sed Scripting Rules:


1. Sed applies ALL rules to each line
2. Sed applies ALL changes dynamically to the pattern space
3. Sed ALWAYS works with the current line
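
Example (a sketch of a script file run with '-f', assuming GNU sed and made-up input).
The rules follow the task list above, except that the last rule is written here to
require at least one digit after the letters and carries no 'p' flag; both choices are
assumptions of this sketch.

```shell
job=$(mktemp -d)
printf 'Dog1\n\ntiger22\nelk\n' > "$job/animals.txt"    # illustrative input

cat > "$job/animals.sed" <<'EOF'
# 1. Remove blank lines
/^$/d
# 2-3. Case-insensitive, global substitutions (I is a GNU extension)
s/dog/frog/Ig
s/tiger/lion/Ig
# 4. Prefix every remaining line
s/.*/Animals: &/
# 5. Rename the prefix
s/animals/mammals/Ig
# 6. Strip digits that directly follow letters
s/\([a-z]\+\)[0-9]\+/\1/Ig
EOF

out=$(sed -f "$job/animals.sed" "$job/animals.txt")
printf '%s\n' "$out"    # mammals: frog / mammals: lion / mammals: elk
```

Every rule sees the result of the rules before it, which is why rule 5 can rename the
prefix that rule 4 just added.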

###Awk - Intro###
Features:
1. Reporter
2. Field Processor
3. Supports Scripting
4. Programming Constructs
5. Default delimiter is whitespace
6. Supports: Pipes, Files, and STDIN as sources of input
7. Automatically tokenizes processed columns/fields into the variables: $1, $2, $3
.. $n
8. Supports GREP and EGREP RegExes

Usage:
awk '{instructions}' file(s)
awk '/pattern/ { procedure }' file
awk -f script_file file(s)
Tasks:
Note: $0 represents the current record or row
1. Print entire row, one at a time, from an input file (animals.txt)
a. awk '{ print $0 }' animals.txt

2. Print specific columns from (animals.txt)


a. awk '{ print $1 }' animals.txt - this prints the 1st column from the file

3. Print multiple columns from (animals.txt)


a. awk '{ print $1; print $2; }' animals.txt
b. awk '{ print $1,$2; }' animals.txt

4. Print columns from lines containing 'deer' using RegEx Support


a. awk '/deer/ { print $0 }' animals.txt

5. Print columns from lines containing digits


a. awk '/[0-9]/ { print $0 }' animals.txt

6. Remove blank lines with Sed and pipe output to awk for processing
a. sed -e '/^$/d' animals.txt | awk '/^[0-9]*$/ { print $0 }' - prints the remaining
lines that consist only of digits

7. Print blank lines


a. awk '/^$/ { print }' animals.txt OR
b. awk '/^$/ { print $0 }' animals.txt

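8. Print ALL lines beginning with the animal 'dog' case-insensitive

a. awk 'tolower($0) ~ /^dog/ { print }' animals.txt
Note: Awk has no 'I' RegEx flag. Lowercase the record with tolower() before matching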
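
Example (a sketch of pattern-driven printing in Awk, assuming a made-up five-line file in
place of animals.txt):

```shell
# Illustrative input, including one blank line.
zoo=$(mktemp)
printf 'dog 4 legs\n\nDOG 3 legs\ndeer 4 legs\nhotdog 0 legs\n' > "$zoo"

deer=$(awk '/deer/ { print $1, $2 }' "$zoo")             # deer 4
blank=$(awk '/^$/ { n++ } END { print n }' "$zoo")       # 1
# Case-insensitive match anchored to the start of the line.
dogs=$(awk 'tolower($0) ~ /^dog/ { print }' "$zoo")      # dog 4 legs, DOG 3 legs

printf '%s\n' "$deer" "$blank" "$dogs"
```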

###Delimiters###
Default delimiter: whitespace (spaces, tabs)
Use: '-F' to override the default delimiter

Task:

1. Parse /etc/passwd using awk


a. awk -F: ' { print } ' /etc/passwd
b. awk -F: ' { print $1, $5 } ' /etc/passwd

2. Support for character classes in setting the default delimiter


a. awk -F"[:;,\t]"
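
Example (a sketch of '-F', assuming two made-up passwd-style records instead of the real
/etc/passwd):

```shell
# Illustrative records in /etc/passwd layout.
pw=$(mktemp)
printf 'root:x:0:0:Super User:/root:/bin/bash\n' > "$pw"
printf 'daemon:x:1:1:Daemon Account:/usr/sbin:/usr/sbin/nologin\n' >> "$pw"

users=$(awk -F: '{ print $1, $5 }' "$pw")
printf '%s\n' "$users"        # root Super User / daemon Daemon Account

# A bracket expression lets several characters act as delimiters.
mixed=$(printf 'a:b;c,d\n' | awk -F'[:;,]' '{ print $2, $4 }')
printf '%s\n' "$mixed"        # b d
```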

###Awk Scripts###
Features:
1. Ability to organize patterns and procedures into a script file
2. The patterns/procedures are much neater and easier to read
3. Less information is placed on the command-line
4. By default, loops through lines of input from various sources: STDIN, Pipe,
files
5. # is the default comment character
6. Able to perform matches based on specific fields

Awk Scripts consist of 3 parts:


1. Before (denoted using: BEGIN) - Executed prior to FIRST line of input being
read
2. During (Main Awk loop) - Focuses on looping through lines of input
3. After (denoted using: END) - Executed after the LAST line of input has been
processed
Note: BEGIN and END components of Awk scripts are OPTIONAL

Tasks:
1. Print to the screen some useful information without reading input (STDIN, Pipe,
or File)
a. awk 'BEGIN { print "Testing Awk without input file" } '

2. Set system variable: FS to colon in BEGIN block


a. awk 'BEGIN { FS = ":" ; print "Testing Awk without input file" }'
b. awk 'BEGIN { FS = ":" ; print FS }'

3. Write script to extract rows which contain 'deer' from animals.txt using RegEx
a. awk -f animals.awk animals.txt

4. Parse /etc/passwd
a. print entire lines - { print }
b. print specific columns - { print $1, $5 }
c. print specific columns for a specific user - /linuxcbt/ { print $1, $5 }
d. print specific columns for a specific user matching a given column - $1 ~
/linuxcbt/ { print $1, $5 }
e. test column #7 for the string 'bash' - $7 ~ /bash/ { print }
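
Example (a sketch of the three script parts, assuming a hypothetical script name
'shells.awk' and three made-up passwd-style records):

```shell
lab=$(mktemp -d)
printf 'root:x:0:0:Super User:/root:/bin/bash\n' > "$lab/passwd"
printf 'daemon:x:1:1:Daemon:/usr/sbin:/usr/sbin/nologin\n' >> "$lab/passwd"
printf 'alice:x:1000:1000:Alice:/home/alice:/bin/bash\n' >> "$lab/passwd"

# shells.awk is a hypothetical file name used only for this sketch.
cat > "$lab/shells.awk" <<'EOF'
# Before: runs once, prior to the first line of input
BEGIN { FS = ":"; print "Bash users:" }
# During: runs for each record whose 7th field contains "bash"
$7 ~ /bash/ { print $1; ++count }
# After: runs once, after the last line of input
END { print "Total: " count }
EOF

report=$(awk -f "$lab/shells.awk" "$lab/passwd")
printf '%s\n' "$report"       # Bash users: / root / alice / Total: 2
```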

###Awk Variables###
Features 3 Types of variables:
1. System - i.e. FILENAME, RS, ORS...
2. Scalars - i.e. a = 3
3. Arrays - i.e. variable_name[n]

Note: Variables do not need to be declared. Awk automatically registers them in
memory
Note: Variable names ARE case-sensitive

System Variables:
1. FILENAME - name of current input file
2. FNR - record number within the current input file - resets for each file when
multiple input files are used
3. FS - field separator - defaults to whitespace - can be a single character or a
RegEx
4. OFS - output field separator - defaults to a single space
5. NF - number of fields in the current record
6. NR - current record number (in the END section it holds the total number of records
read)
7. RS - record separator - defaults to a newline
8. ORS - output record separator - defaults to a newline
9. ARGV - array of command-line arguments - indexed from 0 - ARGV[0] is the command
name and the arguments begin at ARGV[1]
10. ARGC - total # of command-line arguments, including ARGV[0]
11. ENVIRON - array of environment variables for the current user

Tasks:
1. print key system variables
a. print FILENAME (print anywhere after the BEGIN block)
b. print NF - number of fields per record
c. print NR - current record number
d. print ARGC - returns total number of command-line arguments
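
Example (a sketch of FILENAME, NR, FNR, NF and ARGC, assuming two small made-up input
files):

```shell
v=$(mktemp -d)
printf 'a b c\nd e\n' > "$v/one.txt"    # illustrative files
printf 'f\n' > "$v/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
vars=$(cd "$v" && awk '{ print FILENAME, NR, FNR, NF }
                       END { print "records:", NR }' one.txt two.txt)
printf '%s\n' "$vars"
# one.txt 1 1 3
# one.txt 2 2 2
# two.txt 3 1 1
# records: 3

# ARGC counts ARGV[0] (the command name) plus the one argument given here.
args=$(awk 'BEGIN { print ARGC, ARGV[1] }' 10)
printf '%s\n' "$args"         # 2 10
```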

Scalar Variables:
variable_name = value
age = 50

Note: Scalars are typically set in the BEGIN section. However, they can also be set in
the main loop if required

{ ++age } - increments variable 'age' by 1, for each iteration of the main loop
(component 2 of 3)

Set variable to string using double quotes:


fullname = "Dean Davis"

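Concatenate values by placing them next to each other, separated by a space. Awk does
NOT insert that space into the result, so include it explicitly when needed

fullname = "Dean" " " "Davis" - yields "Dean Davis" ("Dean" "Davis" yields "DeanDavis")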

Array Variables:
Feature:
1. List of information

Task:
1. Define an array variable to store various ages
a. age[0] = 50
2. Use split function to auto-build an array
a. arr1num = split(string, array, separator)
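
Example (a sketch of split(), assuming an illustrative colon-separated string): the
return value is the number of elements, and the array it builds is indexed from 1.

```shell
parts=$(awk 'BEGIN {
    n = split("dog:cat:deer", pets, ":")     # n is the number of elements (3)
    for (i = 1; i <= n; ++i) print i, pets[i]
}')
printf '%s\n' "$parts"        # 1 dog / 2 cat / 3 deer
```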

###Operators###
Features:
1. Provides comparison tools for expressions
2. Generally 2 types:
a. Relational - ==, !=, <, >, <=, >=, ~ (RegEx Matches), !~ (RegEx Does NOT
Match)
b. Boolean - ||(OR), &&(AND), !(NOT) - Combines comparisons

Tasks:
1. Print something if the current record number is > 10
a. NR > 10 { print "Current Record Number is greater than 10" }

2. Extract records with ONLY 2 fields
a. NF == 2 { print }

3. Find records that have at least 2 fields and are positioned at record 5 or
higher
a. NF >= 2 && NR >= 5 { print }
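
Example (a sketch of relational and boolean operators used as patterns, assuming a
made-up four-record file and a lower record threshold than the task above):

```shell
# Illustrative input with 2, 1, 3 and 2 fields per record.
rec=$(mktemp)
printf 'a b\nc\nd e f\ng h\n' > "$rec"

two=$(awk 'NF == 2 { print }' "$rec")                        # a b / g h
late=$(awk 'NF >= 2 && NR >= 3 { print NR ": " $0 }' "$rec") # 3: d e f / 4: g h

printf '%s\n' "$two" "$late"
```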

###Loops###
Features:
1. Support for: while, do, and for

While:
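{ i = 1; while (i <= NF) { print $i; ++i } } - prints each field on its own line
Note: The condition must eventually become false. 'while (NR > 10)' would never end
once reached, because NR does not change inside the loop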

For:
for(i=1; i <=10; ++i) print i

Do - performs the action carried-out by while at least once:


do action while (condition)
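
Example (a sketch of the three loop forms, run entirely in a BEGIN block so that no
input file is needed; the counter values are arbitrary):

```shell
loops=$(awk 'BEGIN {
    i = 1
    while (i <= 3) { printf "while %d\n", i; ++i }    # condition tested first
    for (j = 1; j <= 2; ++j) printf "for %d\n", j
    k = 10
    do { printf "do %d\n", k; ++k } while (k < 5)     # body runs once, then the test fails
}')
printf '%s\n' "$loops"
# while 1 / while 2 / while 3 / for 1 / for 2 / do 10
```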

###Processing Records with Awk###


Task:
1. Process multiple delimiters in the same file (across records)
a. awk -F "[:; ]" '{ print }' animals2.txt
b. awk 'BEGIN { FS="[ ;:]" }; { print $2 }' animals2.txt
c. awk -f script_name animals2.txt
2. Process multiple delimiters on the same line
a. Note: Script does NOT change, however, input file DOES
3. Normalize the Output Field Separator (OFS)
BEGIN { OFS=":" }

4. Build animalclasses array from the list of classes in animals2.txt


a. { animalclass[NR] = $2 } - place in main loop - builds animalclass array

5. Extract Daemon entries from /var/log/messages


a. extract kernel messages
a1. awk -f test.awk /var/log/messages
a2. awk -f ~linuxcbt/test.awk messages |
awk '$8 ~ /error/ { print $5,$6,$7,$8,$9 }'
a3. awk -f ~linuxcbt/test.awk messages |
awk 'BEGIN { print "HERE ARE THE ERROR MESSAGES" }
$8 ~ /error/ { print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }
END { print "Process Complete" }'
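
Example (a sketch of tasks 1, 3 and 4 above, assuming a made-up two-line stand-in for
animals2.txt that mixes ';', ':' and space as delimiters): assigning '$1 = $1' makes Awk
rebuild the record with OFS.

```shell
# Illustrative stand-in for animals2.txt.
mixedfile=$(mktemp)
printf 'dog;mammal canine\nfrog:amphibian;anura\n' > "$mixedfile"

# Read with any of three delimiters, write with a single one.
normalized=$(awk 'BEGIN { FS = "[ ;:]"; OFS = ":" } { $1 = $1; print }' "$mixedfile")
printf '%s\n' "$normalized"   # dog:mammal:canine / frog:amphibian:anura

# Build an array of classes in the main loop, then list it in END.
classes=$(awk 'BEGIN { FS = "[ ;:]" }
               { animalclass[NR] = $2 }
               END { for (i = 1; i <= NR; ++i) print i, animalclass[i] }' "$mixedfile")
printf '%s\n' "$classes"      # 1 mammal / 2 amphibian
```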

###Printf Formatting###
Feature:
1. Ability to control the width of fields in the output

Usage:
printf("format", arguments)
Supported Printf Formats include:
1. %c - ASCII Characters
2. %d - Decimal integers - any fractional part (values to the right of the
decimal point) is dropped
3. %f - Floating Point
4. %s - Strings
Note: printf does NOT print newline character(s)
This means you'll need to indicate a newline character sequence: \n - in the
"format" section of the printf function

Note: Default output is right-justified. Use '-' to indicate left-justification


General format section:
%[-][width][.precision][cdfs]
width - influences the actual width of the column to be output
precision - influences the number of places to the right of the decimal point
precision - ALSO influences the number of characters to be printed from a string

Examples | Tasks:
1. print "Testing printf" from the command-line
a. awk 'BEGIN { printf("Testing printf\n") }'

2. read 'animals.txt' and format the output with printf


a. awk 'BEGIN { printf("Here is the output\n")} { printf("%s\t%s\n", $1,$2) }'
animals.txt

3. Apply width and precision to task #2


a. awk 'BEGIN { printf("Here is the output\n")} { printf("%.4s\t%.4s\n",
$1,$2) }' animals.txt
b. awk 'BEGIN { printf("Here is the output\n")} { printf("%20s\t%20s\n",
$1,$2) }' animals.txt

4. Left-justify task #3
a. awk 'BEGIN { printf("Here is the output\n")} { printf("%-20s\t%-20s\n", $1,$2)
}' animals.txt
5. Parse animals_with_prices.txt file and properly represent strings, decimals and
floating point values
a. awk 'BEGIN { printf("Here is the output\n\n")} { printf("%-5s\t$%.2f\n",
$1,$2) }' animals_with_prices.txt

6. Format using printf animals2.txt


a. for (i=1; i <= NR; i++)
printf("%-12s %1d %-2s %-10s\n", "Animal Class", i, ": ", animalclass[i])

7. Apply upper and lower-case formatting to printf values


a. printf("%-12s %1d %-2s %-10s\n", "ANIMAL CLASS", i, ": ",
toupper(animalclass[i]))
b. printf("%-12s %1d %-2s %-10s\n", "ANIMAL CLASS", i, ": ",
tolower(animalclass[i]))

8. Format output from /var/log/messages


a. Extract date, time, server and daemon columns, include a header
{ BEGIN
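
Example (a sketch of the width, precision and justification rules described in this
section, assuming a made-up two-line price file; '|' marks the column edges):

```shell
# Illustrative data: name and price.
prices=$(mktemp)
printf 'dog 19.5\nelephant 1200\n' > "$prices"

# %-10s  left-justified string, 10 wide
# %8.2f  right-justified float, 8 wide, 2 decimal places
# %.3s   first 3 characters of the string
# %d     integer part only
table=$(awk '{ printf("%-10s|%8.2f|%.3s|%d\n", $1, $2, $1, $2) }' "$prices")
printf '%s\n' "$table"
# dog       |   19.50|dog|19
# elephant  | 1200.00|ele|1200
```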

###Additional Sed & Awk Examples###


Task:
1. Update PHP web pages to remove 'Shipping: Free' wherever it exists
a. Code to remove: <b>Shipping</b>:&nbsp;Free<br>
sed -i.bak -e 's/<b>Shipping<\/b>:&nbsp;Free<br>//' \
products_linuxcbt_security_edition.php

b. Effect the change to ALL product files and create .new output files without
clobbering the source files
for i in products_*php; do sed -e 's/<b>Shipping<\/b>:&nbsp;Free<br>//' \
"$i" > "$i.new"; done

2. Strip '.new' suffix from newly generated files


a. echo "products_linuxcbt.php.new" | sed -e 's/\.new$//'
b. for i in `ls -A products_*new | sed -e 's/\.new$//'`; do echo "$i"; done
c. for i in `ls -A products_*new | sed -e 's/\.new$//'`; do mv "$i.new" "$i"; done

3. Remove 'Free Shipping' from faq.php file


a. Code to remove: <li>Free Shipping
b. sed -e 's/<li>Free Shipping//' faq.php > faq.php.new

Use Awk & Sed Together to update specific rows in /var/log/messages:


Task:
a. Update Month information for kernel messages for September 3
awk '$1 == "Sep" && $2 == 3 && $5 ~ /kernel/ { print }' /var/log/messages
b. awk '$1 == "Sep" && $2 == 3 && $5 ~ /kernel/ { ++total; print }
END { print "Total Records Updated: " total }' /var/log/messages |
sed -e 's/^Sep /September /'
Note: $2 == 3 matches only the 3rd. $2 ~ /3/ would also match 13, 23, 30 and 31

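Example (a sketch of Awk selecting records and Sed rewriting them, assuming three
made-up syslog-style lines rather than the real /var/log/messages):

```shell
# Illustrative log lines; field 5 is the daemon name.
log=$(mktemp)
{
  printf 'Sep  3 10:00:01 host kernel: eth0 up\n'
  printf 'Sep 13 10:00:02 host kernel: eth1 up\n'
  printf 'Sep  3 10:00:03 host sshd[42]: session opened\n'
} > "$log"

# Awk keeps only kernel records from Sep 3 and counts them;
# Sed then expands the month name at the start of each line.
updated=$(awk '$1 == "Sep" && $2 == 3 && $5 ~ /kernel/ { ++total; print }
               END { print "Total Records Updated: " total }' "$log" |
          sed -e 's/^Sep /September /')
printf '%s\n' "$updated"
# September  3 10:00:01 host kernel: eth0 up
# Total Records Updated: 1
```
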
###Windows Support for GNU Sed & Awk###


Download GNU Sed & Awk from: http://gnuwin32.sourceforge.net

Windows Stuff:
gawk "BEGIN { max=ARGV[1]; for (i=1;i<=max;++i) print i }" 10 - reads 10 from
ARGV[1] and passes it to 'max' var for use in the 'for' loop
