AWK, NAWK, OAWK(1,C)        AIX Commands Reference        AWK, NAWK, OAWK(1,C)
-------------------------------------------------------------------------------

awk, nawk, oawk

PURPOSE

Finds lines in files matching specified patterns and performs specified actions on them.

SYNTAX

    awk [-Fchar] { '[pattern] [{ action }] ...' | -f progfile } [variable=value ...] [file ...]

    1 The default char is a tab.
    2 The default pattern is every line.
    3 The default action is to print the line.

DESCRIPTION

The awk command is a more powerful pattern matching command than the grep command. It can perform limited processing on the input lines, instead of simply displaying lines that match. Some of the features of awk are:

    o It performs convenient numeric processing.
    o It allows variables within actions.
    o It allows general selection of patterns.
    o It allows control flow in the actions.
    o It does not require any compiling of programs.

Version 1.2.1 provides an enhanced version of awk called nawk (for "new awk"). This new version is similar to that provided by AIX Version 3.1 and recent releases of AT&T UNIX System V. The AIX 1.2 awk command is being provided as oawk (for "old awk"). The awk command is linked to oawk for backwards compatibility with AIX 1.2.

The nawk command provides enhanced handling of files and pipes and better error messages for debugging. The nawk command also supports the Japanese language through the multi-byte character set (MBCS) facilities of AIX 1.2.1. The oawk (awk) command has not been enhanced to support MBCS pattern matching. If this functionality is required, use nawk.
Processed October 11, 1991

Differences between oawk and nawk are noted in the text where applicable. Except where noted, awk implies both oawk and nawk in this document. For a detailed discussion of awk, see AIX Operating System Programming Tools and Interfaces.

The awk command reads files in the order stated on the command line. If you specify a file name as - (minus) or do not specify a file name, awk reads standard input.

The awk command searches its input line by line for patterns. When it finds a match, it performs the associated action and writes the result to standard output. In nawk, the pattern can contain Japanese characters. Enclose pattern-action statements on the command line in single quotation marks to protect them from interpretation by the shell.

The awk command first reads all pattern-action statements. It then reads a line of input and compares it to each pattern, performing the associated action on each match. When it has compared all patterns to the input line, it reads the next line.

The awk command treats input lines as fields separated by spaces, tabs, or a field separator you set with the FS variable. Fields are referenced as $1, $2, and so on. $0 refers to the entire line.

On the awk command line, you can assign values to variables as follows:

    variable=value

Pattern-Matching Statements

Pattern-matching statements follow the form:

    pattern { action }

If a pattern lacks a corresponding action, awk writes the entire line that contains the pattern to standard output. If an action lacks a corresponding pattern, it matches every line.

ACTIONS: An action is a sequence of statements that follow C Language syntax.
These statements can include:

    statement              format
    if                     if ( conditional ) statement [ else statement ]
    while                  while ( conditional ) statement
    for                    for ( expression ; conditional ; expression ) statement
    for                    for ( variable in array ) statement (note 1)
    break                  break
    continue               continue
    close                  close (filename), close (command)
                           Breaks the connection between print and filename
                           or command (nawk only) (note 2)
    (assignment)           variable = expression
    print                  print [expression-list] [>expression]
    printf                 printf format[, expression-list] [>expression] (oawk)
    printf                 printf format[, expression-list]
                           [>expression | >>expression | | command] (nawk) (note 3)
    next                   next
    exit                   exit [expression] (note 6)
    (compound statement)   { statement ... }

    ----------
    1 variable may contain Japanese characters in nawk only.
    2 filename and command may contain Japanese characters in nawk only.
    3 format, expression-list, and command may contain Japanese characters in nawk only.
    6 expression may contain Japanese characters in nawk only.

Statements can end with a semicolon, a new-line character, or the right brace enclosing the action. If you do not supply an action, awk displays the whole line.

Expressions can have string or numeric values and are built using the operators "+", "-", "*", "/", "%", a blank for string concatenation, and the C operators "++", "--", "+=", "-=", "*=", "/=", and "%=".

In statements, variables may be scalars, array elements (denoted x[i]), or fields. Variable names can contain uppercase and lowercase alphabetic letters, underscores, and digits (0-9). nawk variable names may contain Japanese characters. Variable names cannot begin with a digit. Variables are initialized to the null string.

Array subscripts may be any string; they do not have to be numeric. This allows for a form of associative memory. String constants in expressions should be enclosed in double quotation marks.

There are several variables with special meaning to awk.
They include:

    ARGC       Number of command-line arguments.
    ARGV       Array of command-line arguments. ARGV may contain Japanese
               characters in nawk only.
    FILENAME   The name of the current input file. In nawk only, may contain
               Japanese characters.
    FNR        Record number in current file.
    FS         Input field separator (default is a blank). This separator must
               be an ASCII character in oawk, but may contain Japanese
               characters in nawk.
    NF         The number of fields in the current input line (record).
    NR         The number of the current input line (record).
    OFMT       The output format for numbers (default "%.6g").
    OFS        The output field separator (default is a blank). This separator
               must be an ASCII character in oawk, but may contain Japanese
               characters in nawk.
    ORS        The output record separator (default is a new-line character).
               This separator must be an ASCII character in oawk, but may
               contain Japanese characters in nawk.
    RLENGTH    Length of string matched by match function.
    RS         Controls the input record separator (default is \n).
    RSTART     Start of string matched by match function.
    SUBSEP     Subscript separator (default is \034).

Since the actions process fields, input blanks or white space are not preserved on the output.

The print statement writes its arguments to standard output, separated by the output field separator and terminated by the output record separator. The printf statement formats its expression list according to the format of the printf subroutine (see AIX Operating System Technical Reference). You can redirect the output using the print > "filename" or printf > "filename" statements. An empty expression list stands for the whole line.
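The following is a minimal sketch of how a few of these variables behave, not a definitive reference. It assumes a POSIX-style awk is installed as awk; the sample records and the shell function names (sample_records, show_counts, show_ofs) are made up for illustration.

```shell
# Hypothetical colon-separated sample records, made up for this sketch.
sample_records() {
  printf 'alice:30:NY\nbob:25:LA\n'
}

# NR is the record number, NF the field count, $1 the first field.
show_counts() {
  sample_records | awk -F: '{ printf "%d %d %s\n", NR, NF, $1 }'
}

# OFS is written between the arguments of print.
show_ofs() {
  sample_records | awk -F: 'BEGIN { OFS = "-" } { print $1, $2 }'
}

show_counts
show_ofs
```

With this input, show_counts prints "1 3 alice" and "2 3 bob", and show_ofs prints "alice-30" and "bob-25". Note that printf receives an explicit "\n" in its format, while print ends each record with ORS.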
Use any of the following to redirect the output of a print statement to a file named "myfile":

    awk '{print > "myfile"}'
    awk '{printf "%s\n", $0 > "myfile"}'
    awk 'BEGIN {filename = "myfile"} {print > filename}'

You have two ways to designate a character other than white space to separate fields. You can use the -Fc flag on the awk command line, or you can start progfile with:

    BEGIN { FS = "c" }

Either action changes the field separator to c.

There are several built-in functions that can be used in awk actions.

    atan2(y,x)   Takes arctangent of y/x in the range -pi to pi (nawk only).
    cos(x)       Takes cosine of x, with x in radians (nawk only).
    exp(n)       Takes the exponential of its argument.
    getline      Reads the next line of standard input (oawk). The nawk
                 version can also read from a pipe or an input file. An
                 optional var parameter will store the input (nawk only).
    gsub(r,s)    Substitutes s for r globally in $0 and returns the number of
                 substitutions made (nawk only).
    index(s,t)   Returns the first position of string t in s, or 0 if t is
                 not present.
    int(n)       Takes the integer part of its argument.
    length       Returns the length of the whole line if there is no argument
                 or the length of its argument taken as a string.
    log(n)       Takes the base e logarithm of its argument.
    log(x)       Takes natural (base e) logarithm of x (nawk only).
    match(s,r)   Tests whether s contains a substring matched by r and
                 returns its index or 0; sets RSTART and RLENGTH (nawk only).
    n=split(s,array,sep)
                 Splits string s into array[1] ... array[n] and returns the
                 number of elements. If present, sep is the field separator;
                 otherwise, the variable FS is used.
    rand( )      Returns random number r, where 0 <= r < 1.

[...]

Relational expressions use the operators "<", ">", "<=", ">=", "==", and "!=". A conditional can be an arithmetic expression, a relational expression, or a Boolean combination of these.
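As a rough sketch of how a conditional pattern combines with some of the built-in functions, the example below selects lines with a Boolean combination of two relational expressions and then calls split, index, and length. It assumes a POSIX-style awk is installed as awk; the input lines and the shell function name builtin_demo are hypothetical.

```shell
# Hypothetical input: a comma-separated list in field 1, a number in field 2.
builtin_demo() {
  printf 'a,b,c 10\nd,e 3\n' |
  awk '$2 > 5 && NF == 2 {
    n = split($1, parts, ",")    # n is the number of pieces in parts[1..n]
    print n, parts[1], index($1, "b"), length($1)
  }'
}

builtin_demo
```

Only the first line satisfies the pattern, so the output is the single line "3 a 3 5": three pieces, the first piece "a", "b" found at position 3, and a field length of 5.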
You can use the special patterns BEGIN and END to capture control before the first and after the last input line is read, respectively. BEGIN may only be the first pattern in progfile, and END may only be the last pattern.

There are no explicit conversions between numbers and strings. To force an expression to be treated as a number, add "0" to it. To force it to be treated as a string, append a null string ("").

nawk User-Defined Functions

A nawk program can contain user-defined functions. Such a function is defined by a statement of the form

    function name(parameter-list) { statements }

A function definition can occur anywhere a pattern-action statement can. Thus, the general form of a nawk program is a sequence of pattern-action statements and function definitions separated by newlines or semicolons.

In a function definition, newlines are optional after the left brace and before the right brace of the function body. The parameter list is a sequence of variable names separated by commas; within the body of the function these variables refer to the arguments with which the function was called.

The body of a function definition may contain a "return" statement that returns control and perhaps a value to the caller. It has the form

    return expression

The expression is optional, and so is the "return" statement itself, but the returned value is undefined if none is provided or if the last statement executed is not a "return".

For example, a function "max" might be called like this:

    { print max($1,max($2,$3)) }  # print maximum of $1, $2, $3
    function max(m, n) {
        return m > n ? m : n
    }

The variables "m" and "n" belong to the function "max"; they are unrelated to any other variables elsewhere in the program.
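The sketch below puts BEGIN, END, a user-defined function, and the number and string coercions described above into runnable form. It assumes an awk with nawk semantics is installed as awk (on AIX 1.2.1, where awk is linked to oawk, you would invoke nawk instead); the input numbers and the shell function names max_demo and coerce_demo are hypothetical.

```shell
# BEGIN runs before the first input line, END after the last one.
max_demo() {
  printf '3 9 4\n7 2 5\n' |
  awk '
    BEGIN { print "maxima:" }
    function max(m, n) { return m > n ? m : n }
          { print max($1, max($2, $3)) }
    END   { print NR " lines" }
  '
}

# Adding 0 forces a numeric comparison; appending "" forces a string one.
coerce_demo() {
  awk 'BEGIN {
    x = "10"
    as_number = (x + 0 > 9)       # numeric comparison: 10 > 9
    as_string = ((x "") > "9")    # string comparison: "10" sorts before "9"
    print as_number, as_string
  }'
}

max_demo
coerce_demo
```

Here max_demo prints "maxima:", then 9 and 7, then "2 lines"; coerce_demo prints "1 0", because the same value compares differently as a number than as a string.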
Updating oawk scripts to work using nawk

Note this example if you plan to convert your AIX 1.2 oawk scripts to run with nawk:

With AIX 1.2 awk or AIX 1.2.1 oawk you could write the following awk program:

    {
        LINE = $0      # set line equal to input string
        print $LINE    # print out input
    }

Note: $ is only used with indirectly referenced variables.

With AIX 1.2.1 nawk, $ is only used with a number or a variable whose value is a number, so you would need to write:

    {
        LINE = $0      # set line equal to input string
        print LINE     # print out input
    }

Notes: Do not use a $ in front of a variable unless that variable has a value which is a number. For example, if var="1" then $var is really $1, which is the first word on the input line. However, if var="a" then $var is $(a), which is not a valid field.

FLAGS

    -f progfile   Searches for the patterns and performs the actions found in
                  the file progfile.
    -Fchar        Uses char as the field separator character (by default a
                  blank).

EXAMPLES

1. To display the lines of a file that are longer than 72 characters:

    awk "length >72" chapter1

This selects each line of the file "chapter1" that is longer than 72 characters. awk then writes these lines to standard output because no action is specified.

2. To display all lines between the words "start" and "stop":

    awk "/start/,/stop/" chapter1

3. To run an awk program ("sum2.awk") that processes a file ("chapter1"):

    awk -f sum2.awk chapter1

The following awk program computes the sum and average of the numbers in the second column of the input file:

    {
        sum += $2
    }
    END {
        print "Sum: ", sum;
        print "Average:", sum/NR;
    }

The first action adds the value of the second field of each line to the variable "sum". awk initializes "sum" (and all variables) to zero before starting.
The keyword END before the second action causes awk to perform that action after all of the input file has been read. The variable NR, which is used to calculate the average, is a special variable containing the number of records (lines) that have been read.

4. To print the names of the users who have the C shell as the initial shell:

    awk -F: '/csh/{print $1}' /etc/passwd

5. To send the output of a print statement to the more command:

    awk '{print | "more"}' chapter1

6. To determine the correct number of characters, words and lines in chapter1:

    awk '{print | "wc"}' chapter1

7. To print the second line of chapter1:

    awk '{getline; print; exit}' chapter1

RELATED INFORMATION

See the following commands: "lex," "grep, egrep, fgrep" and "sed."

See the printf subroutine in AIX Operating System Technical Reference.

See "Introduction to International Character Support" in Managing the AIX Operating System.

See the discussion of awk and oawk in AIX Operating System Programming Tools and Interfaces.