Using Awk to beautify grep searches

Recently we've seen a surge of re-implementations of many popular Unix tools. With the growth of communities built around new languages and platforms, it seems that while the technologies are new, the ideas on how to use them stay the same. There are more and more solutions to the same kinds of problems:

- text editors
- CSS pre-processors
- find-in-files tools
- screen scraping tools
- ... many more ...

In this blog post I'd like to tackle the problem from yet another perspective. Instead of resorting to "new and cool" libraries and languages (grep implemented in X language), I'd like to use the tooling that's out there already to build a nice search-in-files tool for myself.

Search in files tools

It seems that for many people it's very important to have a "search in files" tool that they really like. Some of the nice work we've seen so far includes:

- ack
- ripgrep
- the_silver_searcher

These are certainly very nice. But as the goal of this post is to build something out of the tooling found in any minimal Unix-like installation, they won't work here. They either need to be compiled or require Perl, which isn't installed everywhere (e.g. not on FreeBSD by default, though it's obviously available via the ports).

What I really need from the tool

I do understand that for some developers, waiting 100 ms longer for the search results might be too long. I'm not like that though. Personally, all I care about when searching is how the results are presented. I also like the consistency of using the same approach across the many machines I work on. We're often working on remote machines at End Point. Having to install e.g. the Rust compiler just to get the ripgrep tool is too time-consuming and hence doesn't contribute to getting things done faster. The same goes for e.g. the_silver_searcher, which needs to be compiled too. What options do I have then?

Using good old Unix tools

The "find in files" functionality is covered fully by the Unix grep tool. It can search for a given substring as well as for regular expression ("regex") matches. The output can contain just the lines with matches, or also the lines before and after them to give some context. The tool can print line numbers and search recursively within directories.
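To make those features concrete, here is a small sketch that can be run in a scratch shell. The directory, the file name and its contents are made up for the demo. The -R flag and the -A / -B context flags are available in GNU and BSD grep, but they are not required by POSIX, so a very minimal system might lack them.

```shell
# Build a tiny throwaway tree to search in (path and contents are made up).
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'use std::fmt;\nstruct Options {\n    max: Option<u64>,\n}\n' > "$demo/src/worker.rs"

# -n prints line numbers, -R descends into directories.
grep_plain=$(cd "$demo" && grep -nR 'Option' src)

# -B 1 / -A 1 add one line of context before and after each match.
grep_context=$(cd "$demo" && grep -n -B 1 -A 1 'struct' src/worker.rs)

printf '%s\n' "$grep_plain"
printf '%s\n' "$grep_context"
rm -rf "$demo"
```

In the context output, matching lines keep the ":" after the line number while the surrounding context lines use "-" instead.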

While I'm not into speeding it up, I'd certainly love to play with its output, because I do care about my brain's ability to parse text and hence about being more productive.

The usual output of grep:

$ # searching inside of the ripgrep repo sources:
$ egrep -nR Option src
(...)
src/search_stream.rs:46: fn cause(&self) -> Option<&StdError> {
src/search_stream.rs:64: opts: Options,
src/search_stream.rs:71: line_count: Option<u64>,
src/search_stream.rs:78:/// Options for configuring search.
src/search_stream.rs:80:pub struct Options {
src/search_stream.rs:89: pub max_count: Option<u64>,
src/search_stream.rs:94:impl Default for Options {
src/search_stream.rs:95: fn default() -> Options {
src/search_stream.rs:96: Options {
src/search_stream.rs:113:impl Options {
src/search_stream.rs:160: opts: Options::default(),
src/search_stream.rs:236: pub fn max_count(mut self, count: Option<u64>) -> Self {
src/search_stream.rs:674: pub fn next(&mut self, buf: &[u8]) -> Option<(usize, usize)> {
src/worker.rs:24: opts: Options,
src/worker.rs:28:struct Options {
src/worker.rs:38: max_count: Option<u64>,
src/worker.rs:44:impl Default for Options {
src/worker.rs:45: fn default() -> Options {
src/worker.rs:46: Options {
src/worker.rs:72: opts: Options::default(),
src/worker.rs:148: pub fn max_count(mut self, count: Option<u64>) -> Self {
src/worker.rs:186: opts: Options,
(...)

What my eyes would like to see is more like the following:

$ mygrep Option src
(...)
src/search_stream.rs:
46 fn cause(&self) -> Option<&StdError> {
64 opts: Options,
71 line_count: Option<u64>,
78 /// Options for configuring search.
80 pub struct Options {
89 pub max_count: Option<u64>,
94 impl Default for Options {
95 fn default() -> Options {
96 Options {
113 impl Options {
160 opts: Options::default(),
236 pub fn max_count(mut self, count: Option<u64>) -> Self {
674 pub fn next(&mut self, buf: &[u8]) -> Option<(usize, usize)> {
src/worker.rs:
24 opts: Options,
28 struct Options {
38 max_count: Option<u64>,
44 impl Default for Options {
45 fn default() -> Options {
46 Options {
72 opts: Options::default(),
148 pub fn max_count(mut self, count: Option<u64>) -> Self {
186 opts: Options,
(...)

Fortunately, even the tiniest Unix-like installation already has all we need to make it happen, without installing anything else. Let's take a look at how we can modify the output of grep with awk to achieve what we need.

Piping into awk

Awk has been in Unix systems for many years; it's older than me! It is a programming language interpreter designed specifically to work with text. In Unix, we can use pipes to direct the output of one program to the standard input of another in the following way:

$ oneapp | secondapp

The idea with our searching tool is to use what we already have and pipe it between the programs to format the output as we'd like:

$ egrep -nR Option src | awk -f script.awk

Notice that we used egrep even though this simple case didn't need it. Using fgrep or just grep would have been sufficient.

Very quick introduction to coding with Awk

Awk is one of the forefathers of languages like Perl and Ruby. In fact some of the ideas I'll show you here exist in them as well.

The structure of awk programs can be summarized as follows:

BEGIN {
  # init code goes here
}

# "body" of the script follows:

/pattern-1/ {
  # what to do with the line matching the pattern?
}

/pattern-n/ {
  # ...
}

END {
  # finalizing
}

All three parts are optional. A missing BEGIN or END is simply a "no-op". In the "body" of the script, a pattern with no action prints each matching line unmodified, while an action with no pattern runs for every line.
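As a quick illustration of that structure, here is a minimal sketch that counts lines matching one pattern and prints lines matching another. The function name count_matches and the sample input are invented for this example.

```shell
# count_matches is a hypothetical helper name, used only for this demo.
count_matches() {
  awk '
    BEGIN    { hits = 0 }                 # runs once, before any input is read
    /Option/ { hits++ }                   # runs for every line matching the pattern
    /struct/                              # pattern with no action: prints the line as-is
    END      { print "matches:", hits }   # runs once, after the last line
  '
}

printf 'struct Options {\nmax: Option<u64>,\n}\n' | count_matches
```

Here the first input line matches both patterns, so it is counted and printed, the second is only counted, and the END block reports the total once the input runs out.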

Each line is split ("exploded") into columns based on a "separator", which by default is any run of consecutive whitespace characters. One can change it via the -F switch or by assigning the FS variable inside the BEGIN block. We'll do just that in our example.

The "columns" that lines are being exploded into can be accessed via the special variables:

$0 # the whole line
$1 # first column
$2 # second column
# etc
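To see the columns at work on grep's own output, here is one possible sketch of the grouping idea. It is an illustration, not necessarily the script this post ends up with. The function name fmt_grep is made up, and the code assumes plain "file:line:text" input (as produced by grep -n on more than one file, without context lines) and file names that contain no colons.

```shell
# fmt_grep is a hypothetical name; it reads "file:line:text" lines on stdin.
fmt_grep() {
  awk -F: '
    # With ":" as the separator, $1 is the file name and $2 the line number.
    $1 != current {        # the file name changed: print it once as a header
      current = $1
      print current ":"
    }
    {
      # Cut "file:line:" off the front of $0 instead of printing $3,
      # so that colons inside the matched text (Options::default) survive.
      text = substr($0, length($1) + length($2) + 3)
      printf "%6d  %s\n", $2, text
    }
  '
}

printf 'a.rs:1:x\na.rs:2:y::z\nb.rs:7:w\n' | fmt_grep
```

It could then be used as `egrep -nR Option src | fmt_grep`. Taking the text from $0 with substr rather than from $3 is a deliberate choice: source code is full of colons, and each one would otherwise start a new column.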

The FS variable can contain a pattern too. So for example if we had a file with the following contents:

The following assignment would make Awk explode lines into proper columns:

BEGIN {
