
4 unix commands I abuse every day

A co-worker watched me type the other day and noticed that I use certain Unix commands for purposes other than the ones they were intended for. Yes, I abuse Unix commands.



1. grep dot (view the contents of files prefixed by their filename)

I want to view the contents of a few files but I want each line prepended with the file's name. My solution?

$ grep . *.txt
jack.txt:Once upon a time
jack.txt:there was a fellow named Jack.
lyingryan.txt:Now that "trickle down economics" has been
lyingryan.txt:tested for 30 years and the data shows it
lyingryan.txt:has been a total failure, candidates
lyingryan.txt:still claim that cutting taxes for
lyingryan.txt:billionaires will help the economy.
market.txt:Jack went to market to sell the family
market.txt:cow.
market.txt:He came back with a handful of magic beans.
$

grep is a search tool. Why am I using it like a weird version of cat? Because cat doesn't have an option to prepend the filename to each line of text. And it shouldn't.

Note that "." matches lines with at least 1 character. That is, blank lines are not included. If we change "." (matches any 1 character) to "^" (matches the beginning of a line) then every line will be matched because every line, no matter how short or long, has a beginning! However the period key is easier to type than the caret, at least on my keyboard. Therefore if I don't need the blanks, I don't request them.
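If you want to see the difference for yourself, here is a small sketch. The scratch directory and the file name demo.txt are made up for the demonstration; everything else is plain grep.

```shell
# Build a three-line file whose middle line is blank (hypothetical demo file).
dir=$(mktemp -d)
printf 'first\n\nthird\n' > "$dir/demo.txt"

# "." needs at least one character, so the blank line is skipped.
dot_count=$(grep -c . "$dir/demo.txt")

# "^" matches the start of every line, blank or not.
caret_count=$(grep -c '^' "$dir/demo.txt")

echo "grep . matched $dot_count lines; grep ^ matched $caret_count lines"
rm -r "$dir"
```

The two counts differ by exactly the number of blank lines in the file.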

Example use: The other day I grabbed the /etc/network/interfaces file from 6 different Linux boxes. I needed to review them all. Each was copied to a filename that was the same as the hostname. "grep . *" let me view them all easily and each line was annotated with where it came from.


2. "more star pipe cat" (cat files with a header between each one)

Let's look at another way to accomplish my example of comparing 6 files. In this case I want to print the contents of each file but separate the contents with the file name. Yes, I could do it in a loop:

$ for i in *.txt ; do echo "=== $i ===" ; cat "$i" ; done

However that takes a lot of typing.

This is where I abuse "more". Are you familiar with more? More prints the contents of files but pauses every screenful to ask "More?" Pressing SPACE shows one screenful more. Pressing RETURN shows one line more.

When more was new it was very dumb. It had no search functions, you could skip forward a file but not skip back, and it assumed your screen was 24 lines long. Heaven forbid you weren't on a hardware terminal fixed at 80x24; these newfangled graphic screens with windows that could be resized confused more. Resizing your window while using more made it even more confused. Another problem was that if you piped the output of more to another program things got totally confused because those prompts got sent down the pipe. Certainly the next program in the pipe didn't expect to see a "More?" every 24 lines.

Luckily someone came along and created a replacement for more that fixed all of those problems. Obviously these features and bugfixes were added to more and we all benefited. No, that's not what happened. Obviously they wrote a new program from scratch and called it "more 2.0" so we could keep typing "more" but have all those new features. No, that's not what happened. In the grand tradition of Unix having a sense of humor this new program was called "less". Thus begat funny conversations like, "Do you use more?" "No, I couldn't live without less." Or the joke: So a pager walks into a bar, and the bartender says "What are you, more or less?"

Some versions of Unix have the old traditional more and less commands. However in many Unix and Unix-like systems both are the same program but the code detects that it was run as more and goes into "more emulation mode". Either way, more gets you the old behavior with a few bugs fixed and less gets you all the cool new stuff.
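The "behave differently depending on the name you were run as" trick is not special to pagers. Here is a rough sketch of the idea in shell. The names less-ish and more-ish are invented for the demonstration, and this is not how any real pager is implemented; it only shows that one file can look at the name it was started with and choose a mode.

```shell
dir=$(mktemp -d)

# One script; it inspects the name it was invoked as ($0).
cat > "$dir/less-ish" <<'EOF'
case "${0##*/}" in
    more-ish) echo "more emulation mode" ;;
    *)        echo "all the cool new stuff" ;;
esac
EOF

# A second name for the very same file.
ln -s less-ish "$dir/more-ish"

as_less=$(sh "$dir/less-ish")
as_more=$(sh "$dir/more-ish")
echo "run as less-ish: $as_less"
echo "run as more-ish: $as_more"
rm -r "$dir"
```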

If you have been using Linux for fewer than 5 years there is a good chance that you didn't know that more existed and quite possibly you were confused why less is called less. Now you know.

Which brings us back to our story. Sometimes people get so used to typing "more" that they type it when they mean "cat". For example they type:

more * | command | command2

when they mean:

cat * | command | command2

For example:

more * | grep -v bar | sort

Old more would send the prompts to grep which would pass them to sort which would get very confused. You'd have to press SPACE a number of times and, since you didn't see any output, you would usually bang on the keyboard in frustration. It's all a big mess.

less is smart enough to detect that its output is going to a pipe, and when it is, it emulates "cat". This is very smart.
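How does a program know its output is going to a pipe? I haven't read the less source, so take this as a sketch of the usual approach rather than a description of less itself: ask whether standard output is a terminal. The function name describe_stdout is made up.

```shell
# Report whether standard output (file descriptor 1) is a terminal.
describe_stdout() {
    if [ -t 1 ]; then
        echo "terminal: a pager would prompt here"
    else
        echo "not a terminal: a pager should just act like cat"
    fi
}

describe_stdout          # typed at a prompt, stdout is normally the terminal
describe_stdout | cat    # here stdout is a pipe
```

The second call always reports "not a terminal" because its output goes into the pipe to cat.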

Even smarter is that when less is emulating more instead of producing "the big mess" it acts like cat but outputs little headers for each file:

$ more * | cat 
::::::::::::::
jack.txt
::::::::::::::
Once upon a time
there was a fellow named Jack.
::::::::::::::
lyingryan.txt
::::::::::::::
Now that "trickle down economics" has been
tested for 30 years and the data shows it
has been a total failure, candidates
still claim that cutting taxes for
billionaires will help the economy.
::::::::::::::
market.txt
::::::::::::::
Jack went to the free-market to sell the family
cow.
He came back with a handful of magic beans.
$

Isn't that pretty?

That works on Linux but not on *BSD. However there's a solution that works on both. We simply take advantage of the fact that if "head" is given more than one file name it prints a little header in front of each file. However we want to see the entire file, not just the first 10 lines that head normally shows. No worries. We assume the files are shorter than 99999 lines long and do this:

$ head -n 99999 * 
==> jack.txt <==
Once upon a time
there was a fellow named Jack.
==> lyingryan.txt <==
Now that "trickle down economics" has been
tested for 30 years and the data shows it
has been a total failure, candidates
still claim that cutting taxes for
billionaires will help the economy.
==> market.txt <==
Jack went to market to sell the family
cow.
He came back with a handful of magic beans.
$

Note: You can do "head -n -0" on Linux to mean "all lines" (a plain "head -n 0" prints no lines at all). However that doesn't work on FreeBSD and other Unixes. (Hey, BSD folks: can you fix that?) You can also use "tail +0" (newer systems want it spelled "tail -n +1") but the header it draws is not as pretty.
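If the 99999 limit bothers you, awk can draw the same kind of header with no line limit at all. This is only a sketch: FNR and FILENAME are standard awk variables, while the scratch directory and sample files below exist just so the example runs anywhere. One caveat: a completely empty file gets no header, because awk never reads a line from it.

```shell
dir=$(mktemp -d)
printf 'Once upon a time\nthere was a fellow named Jack.\n' > "$dir/jack.txt"
printf 'cow.\n' > "$dir/market.txt"

# FNR restarts at 1 for each file, so print a header at the first line of each.
# The trailing "1" is an always-true pattern that prints every input line.
out=$(cd "$dir" && awk 'FNR == 1 { print "==> " FILENAME " <==" } 1' *.txt)
printf '%s\n' "$out"
rm -r "$dir"
```

That is more typing than "head -n 99999 *", so it is probably better off in an alias or a script than at the prompt.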


3. "egrep --color=always '^|foo|bar'" (highlight words while still showing every line)

As you get older your eyesight gets worse. It becomes more difficult to find something in a field of text. Here's an eye test. Below is a list of recently run jobs on a Ganeti cluster.

$ gnt-job list
157486 success CLUSTER_VERIFY_CONFIG
157487 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
157488 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
157489 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
157712 success OS_DIAGNOSE
157779 success CLUSTER_VERIFY
157780 success CLUSTER_VERIFY_CONFIG
157781 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
157782 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
157783 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
157994 success OS_DIAGNOSE
158073 running CLUSTER_VERIFY
158074 success CLUSTER_VERIFY_CONFIG
158075 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
158076 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
158077 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
158156 success OS_DIAGNOSE
158367 success CLUSTER_VERIFY
158368 waiting CLUSTER_VERIFY_CONFIG
158371 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
158432 waiting OS_DIAGNOSE
$

How quickly can you find the job that is running? It's kind of buried in there. (The answer is job #158073)

The most interesting jobs are the ones that are running and the ones that are waiting to run. It would be nice to have those highlighted. My first instinct was to simply use grep to remove the successful jobs:

$ gnt-job list | grep -v success
158073 running CLUSTER_VERIFY
158368 waiting CLUSTER_VERIFY_CONFIG
158432 waiting OS_DIAGNOSE
$

However it is useful to see those jobs in context with all the other jobs. What I really want is to have the running and waiting jobs highlighted. Ah! "egrep --color=always" would color the things it finds, right? Ah, but egrep only shows what is found. We get:

$ gnt-job list | egrep --color=always 'running|waiting'
158073 running CLUSTER_VERIFY
158368 waiting CLUSTER_VERIFY_CONFIG
158432 waiting OS_DIAGNOSE
$ 

So how can we output every line but also highlight certain words? Well "." matches everything so we could use that, right? No, it matches every single character. We'd just get 100% red text (try it: egrep --color=always . file file2). What else does every line have? It has a beginning! We make a regular expression that matches lines with "a beginning" -or- lines with "running" -or- lines with "waiting". Every line will match and therefore be output. Since "the beginning of each line" has no length, it adds no color of its own; only the words "running" and "waiting" come out red.

$ gnt-job list | egrep --color=always '^|running|waiting'
157486 success CLUSTER_VERIFY_CONFIG
157487 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
157488 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
157489 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
157712 success OS_DIAGNOSE
157779 success CLUSTER_VERIFY
157780 success CLUSTER_VERIFY_CONFIG
157781 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
157782 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
157783 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
157994 success OS_DIAGNOSE
158073 running CLUSTER_VERIFY
158074 success CLUSTER_VERIFY_CONFIG
158075 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29)
158076 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc)
158077 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
158156 success OS_DIAGNOSE
158367 success CLUSTER_VERIFY
158368 waiting CLUSTER_VERIFY_CONFIG
158371 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f)
158432 waiting OS_DIAGNOSE
$

Now you can easily see which jobs are running and waiting and still get the full context.

(Note: for some reason this doesn't work on Mac OS and *BSD. However "$" matches the end of a line and works the same way.)

I set up an alias so I can use this command all the time:

alias j="gnt-job list | egrep --color=always '^|running|waiting'"

Note the careful use of ' within ".
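If nested quotes make you nervous, a shell function is one way around them. This is a sketch, not a polished tool: the name hl_words is made up, and each argument is treated as a regular expression, so words containing regex characters will need escaping. It builds the same '^|word1|word2' pattern from its arguments and reads from a pipe. (On Mac OS and *BSD, start the pattern with '$' instead of '^', as noted above.)

```shell
# hl_words WORD...: print every line of stdin, highlighting the given words.
hl_words() {
    pattern='^'
    for word in "$@"; do
        pattern="$pattern|$word"
    done
    grep -E --color=always "$pattern"
}

# Stand-in for "gnt-job list" so the example runs anywhere.
printf '1 success A\n2 running B\n3 waiting C\n' | hl_words running waiting
```

With that defined, "gnt-job list | hl_words running waiting" does the same job as the alias.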

If you would like more than just the words "running" and "waiting" highlighted, slightly more complex regular expressions are required:

Highlight starting at the word, continuing to the end of the line:

egrep --color=always '^|running.*$|waiting.*$'

Highlight the entire darn line if it has either word in it:

egrep --color=always '^|^.* (running|waiting) .*$'

Of course, if you are typing these commands instead of using them in a script or alias, the least typing to highlight "foo" and "bar" is:

egrep '^|foo|bar'

Chances are "--color=auto" is the default and the right thing will happen. If not, add the "--color=always".

Note: A co-worker just pointed out that "" matches every line and doesn't result in all text being highlighted. He wins for making the regexes even smaller. Just remove the "^" at the front:

alias j="gnt-job list | egrep --color=always '|running|waiting'"

or

egrep --color=always '|^.* (running|waiting) .*$'

Note2: Someone pointed out that ack will do this with --passthru but ack isn't always on machines I use.


4. "fmt -1" (split lines into individual words)

If you are not familiar with the fmt command, that's probably because you use a modern text editor like vim or emacs which can do the formatting for you. In the old days we had to call an external command to do our formatting. Back then all Unix commands were small, single-function, tools that could be combined to do great things. Now every new Unix command seems to be trying to have more features than MS-Word. But I digress.

"fmt -n" takes text as input and reformats it into nicely shaped paragraphs with no line longer than n. That is, "fmt -65" formats text in nice paragraphs with no line longer than 65 characters.

But what if you have a word that is longer than 65 characters? Does it truncate it? No, then you get a line with just that word on it. (Ok, I lied about "no lines longer than n".)

So how can we abuse this program? Simple! Suppose we have a bunch of text and want to list out the individual words one per line. Well, words that are "too long" are printed on their own line and we want every word to be printed on its own line. Therefore why don't we tell "fmt" that all words are "too long" by saying we want the paragraphs to be formatted to be 1 character long!

$ fmt -1 <fraudulent.txt 
Fraud
is
telling
a
lie
that
benefits
you
and
not
the
person
or
people
you
tell
it
to.
$

Why would you want to do that? There are plenty of situations where this is useful!

Recently I found myself with long lines of text that mixed usernames and numbers. I wanted to extract the names. Sure, I could have figured something out with awk or put it into a text editor and copied out the names. Instead I did this:

$ fmt -1 <the_file.txt | egrep -v '^[0-9]'
fred
mary
jane
bob
$

Recently I was curious which IP addresses are mentioned on my wiki:

$ cat *.wiki | fmt -1 | egrep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -u
192.168.1.4
192.168.1.7
255.255.255.0#
255.255.255.192
255.255.255.240
8.3.8.1
<code>172.16.240.1
<code>172.16.240.2

Ok, that's not a perfect list but I was able to do that in a few seconds rather than an hour of writing code.

A simple improvement: Transform < and > and a lot of other punctuation into spaces, then delete spaces at the end.

$ cat *.wiki | tr "#:@;()<>=,'\"-" ' ' | fmt -1 | egrep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | tr -d ' ' | sort -u
172.16.240.1
172.16.240.2
192.168.1.4
192.168.1.7
255.255.255.0
255.255.255.192
255.255.255.240
8.3.8.1

That's a lot cleaner. 8.3.8.1 is a version number, not an IP address, but this is good enough for a first pass through the list.
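One more possible pass, sketched with awk: the pattern [0-9]{1,3} also accepts impossible octets such as 999, so we can keep only the candidates whose four fields are all 255 or less. It still can't tell that 8.3.8.1 is a version number. The sample input here is made up; in practice you would pipe in the output of the command above.

```shell
# Candidate "addresses", one per line (made-up sample input).
candidates='192.168.1.4
999.1.1.1
255.255.255.0
8.3.8.1'

# Split on dots; keep lines with exactly four fields, each no larger than 255.
valid=$(printf '%s\n' "$candidates" | awk -F. '
    NF == 4 {
        ok = 1
        for (i = 1; i <= 4; i++) if ($i + 0 > 255) ok = 0
        if (ok) print
    }')
echo "$valid"
```

Only 999.1.1.1 is dropped from the sample.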



I hope you enjoyed my little tour of Unix commands that are useful to mis-use. I'd be interested in hearing what commands you abuse. Please post to the comments!


My "4 Unix commands I abuse every day" blog post has been published in this month's Hacker News Monthly! Check it out: http://hackermonthly.com/issue-34.html Interestingly enough that post got more hits than any other that I posted last year. It got...

37 Comments

Offhand, I can't think of any commands I abuse, so instead I'll share a joke that this post made me think of. (As far as I know, it's original to me)

So a pager walks into a bar, and the bartender says "What are you, more or less?"

Regarding your fmt trick, you could also have used grep -o PATTERN to get, let's say, your ip addresses list.

Cheers!

'head -n 99999 * | cat' is the same as 'head -n 99999 *'

Thanks for the fmt command... I remember it in my Unix 101 course but had not used it since then.. Now to replace my alias
awk '{for (i=1;i

Nice trick with fmt, I've not used that command before so it's one I'll have to look at now.

However, having bad eyesight myself, I had discovered the grep highlighting trick before :)

One extra thing I like is avoiding the slightly awkward alternation (|) in grep with the "-e" argument, which allows you to provide multiple patterns:
grep -e '' -e foo -e bar

It also means you can use the "-F" flag to treat the patterns as fixed strings, which is useful if "foo" or "bar" are strings that contain regex characters.

I've always liked:

find . -name "*.c" -exec cat {} \; | wc -l

or possibly:

find . -name "*.java" -exec grep text {} /dev/null \; | more

And in vi:

!}fmt

is also useful :-)

Paul.

Paul, for your first command, you can also do:
find . -name '*c' -exec wc {} +
{} + accumulates files and passes them to wc instead of sending each file individually.

For a more generic highlighting command, hl:
function hl() { export R=$1; shift; egrep --color=always "|$R" "$@"; }

On my system (Ubuntu 12.04), more is a separate command from less. /bin/more is provided by the "util-linux" package, /usr/bin/less by the "less" package. The "less" package doesn't provide a more command. It's the same on CentOS 5.7 (and therefore probably on Red Hat and Fedora).

And here's what I use to highlight selected text. (It assumes VT100-style control sequences, which is a pretty safe assumption these days.)

Here's another trick I use a lot: I write a list of files to a file (e.g., ls *.txt > foo.sh), then edit foo.sh with my favorite editor (which happens to be vim), add #!/bin/sh to the top, and use search-and-replace to convert each file name to a command that operates on that file. I can manually delete or comment out any files I don't want to operate on. Then I just run foo.sh.

It can be a lot easier and safer than constructing a one-line shell command that does the same thing.

A better replacement for the "head" hack is this:

tail +0 file [file...]

It's surprisingly little known that "tail +0" is very close to "cat" -- but not quite. This works on both Linux and *BSD.

Note "tr ' ' '\n'" isn't much longer than "fmt -1" and extends more naturally - you can use "tr ' ,.?!' '\n'" etc. to strip punctuation as well.

Alternately "sed 's/ /\n/g'".

I can't live without grep colorization. I have this in my bash history to ensure I don't have to remember to type more.


export GREP_OPTIONS='--color=auto'
export GREP_COLOR='1;32'

Nice political statement dickwad. Get out of your parent's basement and get a life.

=2= I have a script called mo (a "mo' better more") that simply does this:

exec $* | more

If I type a command starting with "mo" but with a pipeline, it works much like more star pipe cat (on Linux). Very handy.

=3= Amazingly useful, that's going to change my life, seriously!

@Paul Ryan - Why must you use profanity and make anti-family insinuations?

man grep
-H, --with-filename
Print the file name for each match. This is the default when there is more than one file to search.

grep has the -o, --only-matching flag. "Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line." This will remove your need for the tr commands and reduce the length of the last example.

@keithsthompson: you might like vipe from moreutils [1].

You can then run for example `ls *.txt | vipe | command`.

It lets you edit output from one command with your favorite editor and return it to the next command in the pipe.

http://joeyh.name/code/moreutils/

The wiki thingy....

cat *.wiki | fmt -1 | sed 's/^.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*$/\1/g' | sort -nr | uniq -c | sort -n

Much more useful :)

In fact you should be able to solve the IP gathering by using:

$ cat *.wiki | tr "#:@;()<>=,'\"-" ' ' | fmt -1 | egrep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | tr -d ' ' | sort -u
which will result in grep only outputting the matching part of the line, which should eliminate most garbage before and after the IPs

Your first command, grep . * is going to skip files with only newlines and skip newlines when displaying the contents of the files, which may be useful, but was not part of the criteria. This command is written better as egrep '\.*' *

I use sed to create command and pipe it to bash like:

ls * | sed -e 's/.*/rm "&"/' | bash

weird but sometimes useful. (in this case it will fail if a file name contains a double quote)

I also sometimes put sed inside sed, so the first sed creates a command and the inner sed is executed after that command.

I have something like this in my .bashrc, but run this only once

#copy with progress bar
function cpv() {
for i in "${@:1:$(($#-1))}"; do
echo $i | sed -e "s/.*/cat \"&\" | pv -s \`du -b \"&\" | cut -f1\` > \"`echo \"${@: -1}\" | sed -e 's/\\//\\\\\//g'`\`basename \"&\"\`\"/" | bash
done
}

For systems that don't have "grep -H" (or if you forget about that option) and want to print filenames during a find -exec grep, I like abusing /dev/null, like so:

find . -name "*.txt" -exec grep mytext {} /dev/null \;

If I now want a script to run in the shell, I just write it in PHP and I run it.

Yes, I understand that this will have a huge performance hit (and it takes more time to do), but the thing is there are many people out there who don't understand shell scripting (it does look like gibberish when you read it) and I like my code to be readable by others.

Just a heads up, that the egrep colorize method behaves strangely on OSX. If you use the begin line character (^), no results will be colorized, however if you use the end line character ($), the results will be colorized.

Example:

ps -A | egrep --color=always '^|Applications'

Does not highlight on osx, but:

ps -A | egrep --color=always '$|Applications'

Does work correctly. Just a heads up to anyone using OSX

A more correct and concise gathering of ip addresses could be

cat *.wiki | tr -cs '0-9.' ' ' | fmt -1 | grep -E '^ ?([0-9]{1,3}\.){3}[0-9]{1,3} ?$' | tr -d ' ' | sort -u

It uses the -c (complement) and -s (squeeze-repeat) options of tr to get rid of all unwanted characters and it avoids matching strings like "1234.2.3.4" or "1.2.3.2345" or "1.2.3.4." as well.

It may be more performant as well, since most of the unwanted data are already deleted in the first `tr`.

Excellent list. I am old enough that I should have known about fmt... hmmmm ... in any case here is my contribution!

Viewing text files with lines longer than the terminal width is often painful when it isn't clear where one line ends and the next starts. Particularly customised apache log files, logs from many other applications, outputs from SQL statements, etc.

sed saves the day most beautifully, and I don't consider this to be abuse!

Example

sed G inputfile

In a similar vein, cat -vet helps to make sense of difficult to read text files. -v for unprintable characters, -e to append one "$" sign to every line (which reveals lines ending in blanks), and -t to reveal TAB characters.

echo *
Is a shell built-in, and so has got its uses when ls won't run, eg due to missing libraries, which can happen when the system doesn’t boot up fully.

Last one from me: Redirection can come first!

<inputfile grep pattern | cat -n

Makes, IMHO, more sense than the traditional

grep pattern < inputfile | cat -n

I agree that "less" is more than "more" but when it comes to reading man pages I insist on getting the "most" out of my man. On Debian derivatives that would be:

apt-get install most

That should prompt you to choose the pager. If not then:
dpkg-reconfigure most

I just found a new command to abuse, using seq to have string_repeat:

seq 4 | xargs -I{} echo -n 'helo_'

People seem to be ignoring Mustafa Simav's comment.

To get a list of IPs from a given input, use grep's -o flag, like:

$ cat *.wiki |egrep -wo '[0-9]{1,3}(\.[0-9]{1,3}){3}' |sort -u

(I also used grep's -w flag, which requires word-breaks on each side of the query, so grep -w 'query' is the same as grep '\bquery\b' and thus this won't match "12345.6.7.8")

Another trick I use daily, perhaps only useful to stats hounds like me, is an alias to rank lines by occurrence counts (or, using fmt -1 or grep -o, words or pattern matches):

alias cnt='sort -f |uniq -ic |sort -fn'

I'm also partial to this alias, which filters out comments to aid in e.g. searching config files, like nocom /etc/config-file |grep query:

alias nocom='sed -e "/^[^#]/!d" -e "s/[\t ]*$//"'

(this used to be merely grep "^[^#]" but I got sick of trailing whitespace). Note, BSD sed does not understand \t; use a literal tab. Also note that this doesn't kill in-line comments. For that, add -e "s/#.*//g" in front of the other two -e's. (I used to have that, but it was too aggressive; think about what it would do to this paragraph for example.)

Here's my backup version of a truncation script in case it's not on a system but my .aliases file is:

trunc() { GREP_OPTIONS= EGREP_OPTIONS= egrep -o "^.{0,$COLUMNS}" ${1+"$@"}; }

I'm cherry-picking here; my .aliases file has hundreds of these, compatible with bash and zsh on any *nix system. I'll post it somewhere one of these days.

I always enjoy seeing how others (ab)use the command line. For me, I really like awk so even when there's a "simpler" way of doing things I'll still use it anyway. It's all about balancing 'getting the job done' with 'fun'.

see the URL (http://bit.ly/XOuWzz) for critical review. not _all_ criticism, for those who need an update into critical thinking, now banned in TX, but the way of life here in NJ.

My preferred abused command for reformatting text to one word per line is " | xargs printf '%s\n'". The related command " | xargs printf '%s '" reformats text to all words on one line, separated by a single space.

When you use IP addresses in examples to illustrate usage, commands, etc., you should use addresses in 192.0.2.0/24. That's the official "test" network and should be used in documentation.
