Awk is a little language designed for processing lines of text. It is available on every Unix (since V7) and Linux system. The name is an acronym formed from the surnames of its creators: Aho, Weinberger and Kernighan.
Ever since I spent a couple of minutes learning awk, I have found it quite useful in my daily work. It is my favorite tool in the base set of Unix tools because of its simplicity and versatility.
Typical use cases for awk scripts are log file analysis and the processing of character separated value (CSV) formats. Awk allows you to easily filter, transform and aggregate lines of text.
The idea of awk is very simple. An awk script consists of a number of patterns, each associated with a block of code that gets executed for an input line if the pattern matches:
pattern_1 {
    # code to execute if pattern matches line
}
pattern_2 {
    # code to execute if pattern matches line
}
# ...
pattern_n {
    # code to execute if pattern matches line
}

Patterns and blocks
The patterns are usually regular expressions:
/error|warning/ {
    # executed for each line that contains
    # the word "error" or "warning"
}
/^Exception/ {
    # executed for each line starting
    # with "Exception"
}
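As a rough illustration of these two patterns in action, the sketch below pipes a few made-up log lines into awk. The sample lines and the helper names `sample_log` and `filter_log` are invented for this example.

```shell
# Hypothetical sample log lines, used only for illustration.
sample_log() {
    printf '%s\n' \
        'Exception in thread main' \
        'all good here' \
        'disk warning: 91% full' \
        'fatal error while parsing'
}

# Every pattern is tested against every input line, in the order written.
filter_log() {
    awk '
        /error|warning/ { print "problem:   " $0 }
        /^Exception/    { print "exception: " $0 }
    '
}

sample_log | filter_log
```

With this input, the line "all good here" matches neither pattern and is dropped. The other three lines are printed with a prefix that shows which block handled them.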
There are some special patterns, namely the empty pattern, which matches every line …
{
    # executed for every line
}
… and the BEGIN and END patterns. Their blocks are executed before and after the processing of the input, respectively:
BEGIN {
    # executed before any input is processed,
    # often used to initialize variables
}
END {
    # executed after all input has been processed,
    # often used to output an aggregation of
    # collected values or a summary
}

Output and variables
The most common operation within a block is the print statement. The following awk script outputs each line containing the string “error”:
/error/ { print }
This is basically the functionality of the Unix grep command, which is filtering. It gets more interesting with variables. Awk provides a couple of useful built-in variables. Here are some of them:
$0 represents the entire current line
$1 … $n represent the 1…n-th field of the current line
NF holds the number of fields in the current line
NR holds the number of the current line (“record”)
By default awk interprets whitespace sequences (spaces and tabs) as field separators. However, this can be changed by setting the FS variable (“field separator”).
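To make the built-in variables a bit more concrete, here is a small sketch that sets FS to a comma in a BEGIN block and then prints NR, NF and $1 for each record. The comma-separated records and the helper names `sample_csv` and `show_fields` are assumptions made up for this example.

```shell
# Hypothetical comma-separated records, used only for illustration.
sample_csv() {
    printf '%s\n' 'John,32,male' 'Jane,45,female' 'Richard,73,male'
}

show_fields() {
    awk '
        BEGIN { FS = "," }    # split fields on commas instead of whitespace
        { print NR ": " NF " fields, first is " $1 }
    '
}

sample_csv | show_fields
```

This should print "1: 3 fields, first is John" and so on for the other two records. Note that splitting on a plain comma is only adequate for simple files, since it does not understand quoted fields that contain commas themselves.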
The following script outputs the second field for each line:
{ print $2 }
Input:
John 32 male
Jane 45 female
Richard 73 male
Output:
32
45
73
And this script calculates the sum and the average of the second fields:
{
    sum += $2
}
END {
    print "sum: " sum ", average: " sum/NR
}
Output:
sum: 150, average: 50

The language
The language that can be used within a block of code is based on C syntax without types and is very similar to JavaScript. All the familiar control structures like if/else, for, while, do and operators like =, ==, >, &&, ||, ++, +=, … are there.
Semicolons at the end of statements are optional, like in JavaScript. Comments start with #, not with //.
Variables do not have to be declared before usage (no ‘var’ or type). You can simply assign a value to a variable and it comes into existence.
String concatenation does not have an explicit operator like “+”. Strings and variables are concatenated by placing them next to each other:
"Hello " name ", how are you?"
# This is wrong: "Hello" + name + ", how are you?"
print is a statement, not a function. Parentheses around its parameter list are optional.
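The points above can be pulled together in one short sketch: an if/else, a C-style for loop, implicit variable creation, concatenation by juxtaposition, and print with parentheses. The input lines, the age threshold of 65 and the helper names `sample_people` and `describe_people` are invented for illustration.

```shell
# Hypothetical input: a name and an age per line, used only for illustration.
sample_people() {
    printf '%s\n' 'John 32' 'Jane 45' 'Richard 73'
}

describe_people() {
    awk '
        {
            # label comes into existence on first assignment, no declaration
            if ($2 >= 65) {
                label = "senior"
            } else {
                label = "adult"
            }

            # C-style for loop: append one star per full decade of age
            stars = ""
            for (i = 10; i <= $2; i += 10) {
                stars = stars "*"
            }

            # adjacent values are concatenated; the parentheses are optional
            print($1 ": " label " " stars)
        }
    '
}

sample_people | describe_people
```

For the three sample lines this prints "John: adult ***", "Jane: adult ****" and "Richard: senior *******".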

Functions
Awk provides a small set of built-in functions. Some of them are:
length(string), substr(string, index, count), index(string, substring), tolower(string), toupper(string), match(string, regexp).
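A quick way to get a feel for these functions is to run them on a single line of input. The sketch below does that. The input string and the helper name `demo_builtins` are made up for this example. Keep in mind that string positions in awk start at 1, not 0.

```shell
demo_builtins() {
    printf '%s\n' 'Hello, Awk' | awk '
        {
            print length($0)            # number of characters in the line
            print toupper($0)
            print tolower($0)
            print substr($0, 1, 5)      # 5 characters starting at position 1
            print index($0, "Awk")      # 1-based position, 0 if not found
            print match($0, /[A-Z]wk/)  # position where the regexp first matches
        }
    '
}

demo_builtins
```

This prints 10, "HELLO, AWK", "hello, awk", "Hello", 8 and 8, one value per line.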
User-defined functions look like JavaScript functions:
function min(number1, number2) {
    if (number1 < number2) {
        return number1
    }
    return number2
}
In fact, JavaScript adopted the function keyword from awk. User-defined functions are defined at the top level of the script, outside of pattern blocks.
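Here is a minimal sketch of how the min function might sit next to a pattern block in a complete script. The input pairs and the helper names `sample_pairs` and `smaller_of_each_pair` are invented for illustration. Adding 0 to each field is a defensive choice made here to force a numeric rather than a string comparison.

```shell
# Hypothetical input: two numbers per line, used only for illustration.
sample_pairs() {
    printf '%s\n' '3 7' '10 2' '5 5'
}

smaller_of_each_pair() {
    awk '
        # defined at the top level, next to the pattern blocks
        function min(number1, number2) {
            if (number1 < number2) {
                return number1
            }
            return number2
        }

        # "+ 0" converts each field to a number before it is compared
        { print min($1 + 0, $2 + 0) }
    '
}

sample_pairs | smaller_of_each_pair
```

For the sample pairs this prints 3, 2 and 5, one per line.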

Command-line invocation
An awk script can be either read from a script file with the -f option:
$ awk -f myscript.awk data.txt
… or it can be supplied in-line within single quotes:
$ awk '{sum+=$2} END {print "sum: " sum " avg: " sum/NR}' data.txt

Conclusion
I hope this short introduction helped you add awk to your toolbox if you weren’t familiar with it yet. Awk is a neat alternative to full-blown scripting languages like Python and Perl for simple text processing tasks.