Someone recently asked how to take a bunch of numbers from STDIN and break them down into distribution buckets. This is simple enough that it should be doable in awk.
Here's a simple one-liner that generates 100 random numbers, buckets each one to the nearest multiple of 10, and prints the buckets ordered by the number of items in each:
while true ; do echo $(( 1 + RANDOM % 100 )) ; done | head -n 100 | awk '{ bucket = int(($1 + 5) / 10) * 10 ; arr[bucket]++} END { for (i in arr) {print i, arr[i] }}' | sort -k2n,2 -k1n,1
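Random input makes it hard to see what the rounding is doing, so here is a small sketch that feeds the same awk program a fixed list of numbers. The function name bucket10 is made up for this example, and the output is sorted by bucket rather than by count so the boundaries are easy to read.

```shell
#!/bin/bash
# bucket10 is a hypothetical name; the awk program is the one from the one-liner.
bucket10() {
  awk '{ bucket = int(($1 + 5) / 10) * 10 ; arr[bucket]++ }
       END { for (i in arr) { print i, arr[i] } }' | sort -k1n,1
}

# 4 -> 0, 5 and 14 -> 10, 15 and 24 -> 20, 95 -> 100
printf '%s\n' 4 5 14 15 24 95 | bucket10
```

This should print "0 1", "10 2", "20 2" and "100 1", one per line. Adding 5 before the integer division is what turns truncation into round-to-nearest.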
Many people don't know that in bash, a single-quoted string can span multiple lines. This makes it very easy to put a little bit of awk right in the middle of your script, with no need for a second file to hold the awk code. Since you can put newlines anywhere, you can make it very readable:
#!/bin/bash
while true ; do
echo $(( 1 + RANDOM % 100 ))
done | head -n 100 |
awk '
{
bucket = int(($1 + 5) / 10) * 10 ;
arr[bucket]++
}
END {
for (i in arr) {
print i, arr[i]
}
}
' | sort -k2n,2 -k1n,1
If you want to sort by bucket instead of by count, change the sort to sort -k1n,1 -k2n,2
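To make the difference between the two sort orders concrete, here is a small sketch run against a fixed set of "bucket count" lines. The sample values and the helper names by_count and by_bucket are invented for illustration.

```shell
#!/bin/bash
# Sample "bucket count" lines; the values are made up for illustration.
counts='20 1
0 3
10 2'

# Hypothetical helper names wrapping the two sort invocations.
by_count()  { sort -k2n,2 -k1n,1 ; }   # count first, bucket breaks ties
by_bucket() { sort -k1n,1 -k2n,2 ; }   # bucket first, count breaks ties

printf '%s\n' "$counts" | by_count    # 20 1, then 10 2, then 0 3
printf '%s\n' "$counts" | by_bucket   # 0 3, then 10 2, then 20 1
```

The n in each key makes sort compare numerically, so bucket 100 lands after 20 instead of before it.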
If you want to be a little fancier, move the bucket calculation into its own function. What? awk can do functions? Sure it can. You can also pass values in from the shell using the -v flag, which sets an awk variable before the program starts running.
#!/bin/bash
# Bucketize stdin to nearest multiple of argv[1], or 10 if no args given.
# "nearest" means, for a multiple of 10: 0..4.999 -> 0, 5..14.999 -> 10, etc.
# Usage:
# while true ; do echo $(( 1 + RANDOM % 100 )); done | head -n 99 | bucket.sh 8
awk -v multiple="${1:-10}" '
function bucketize(a) {
# Round to the nearest multiple of "multiple"
# (nearest... i.e. may round up or down)
return int((a + (multiple/2)) / multiple) * multiple;
}
# All lines get bucketized.
{ arr[bucketize($1)]++ }
# When done, output the array.
END {
for (i in arr) {
print i, arr[i]
}
}
' | sort -k2n,2 -k1n,1
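Strictly speaking, -v assigns an awk variable from the command line. If the value really does live in the environment, awk can also read it through its built-in ENVIRON array. The sketch below shows both side by side. The function names and the MULTIPLE variable are made up for this example, and it prints the bucket for each input line instead of counting, to keep things short.

```shell
#!/bin/bash
# Sketch: two ways to hand a shell value to awk.
# bucket_v, bucket_env and MULTIPLE are hypothetical names.

# 1. -v assigns an awk variable before the program starts.
bucket_v() {
  awk -v multiple="${1:-10}" '
    { print int(($1 + (multiple / 2)) / multiple) * multiple }
  '
}

# 2. ENVIRON reads a variable from the environment awk was started with.
bucket_env() {
  MULTIPLE="${1:-10}" awk '
    BEGIN { multiple = ENVIRON["MULTIPLE"] }
    { print int(($1 + (multiple / 2)) / multiple) * multiple }
  '
}

printf '%s\n' 3 4 11 12 | bucket_v 8     # 0, 8, 8, 16 (one per line)
printf '%s\n' 3 4 11 12 | bucket_env 8   # same result
```

For a one-off script, -v is usually the simpler choice because the value stays visible on the awk command line.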
I generally use Python for scripting, but for something this short, awk makes sense. Sadly, using awk has become something of a lost art.