Poor man's dmenu benchmark

I wanted to know how responsive dmenu, awk, sort and uniq are on a 50MB file (625000 entries of 80 one-byte characters each).

Generate file:

  #!/bin/bash
  # Build ./dummy_history_file: 625 iterations of 1000 lines each.
  echo "Creating dummy file of 50MB in size (625000 entries of 80 chars)"
  echo "Note: this takes about an hour and a half"
  entries_per_iteration=1000
  for i in $(seq 1 625)
  do
    echo "Iteration $i of 625 ( $entries_per_iteration each )"
    for j in $(seq 1 "$entries_per_iteration")
    do
      # date and time, epoch seconds, filler text, md5sum of the epoch seconds
      echo "$(date +'%Y-%m-%d %H:%M:%S') $(date +%s)abcdefhijklmno$(date +%s | md5sum)" >> ./dummy_history_file
    done
  done
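To double-check the size arithmetic, here is a minimal sketch that builds a few lines in the same format and counts lines and bytes. The helper name file_stats is made up for this example. With GNU md5sum (which prints 32 hex characters, two spaces and a dash for stdin), each line should come to 19 + 1 + 10 + 14 + 35 characters plus a newline, i.e. 80 bytes.

```shell
#!/bin/bash
# Hypothetical helper: print "<lines> <bytes>" for the file given as $1.
file_stats() {
    # Reading from stdin keeps wc from printing the file name;
    # tr strips the padding some wc implementations add.
    printf '%s %s\n' "$(wc -l < "$1" | tr -d ' ')" "$(wc -c < "$1" | tr -d ' ')"
}

# Build a 5-line sample in the same format as the generator script.
sample=$(mktemp)
for i in 1 2 3 4 5
do
    echo "$(date +'%Y-%m-%d %H:%M:%S') $(date +%s)abcdefhijklmno$(date +%s | md5sum)" >> "$sample"
done

file_stats "$sample"            # line count and byte count of the sample
echo "$((625000 * 80)) bytes"   # size the full file is expected to have
rm -f "$sample"
```

Running file_stats on the real dummy_history_file should report 625000 lines if the generator ran to completion.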

Measure speed:

  echo "Plain awk '{print \$3}':"
  time awk '{print $3}' dummy_history_file >/dev/null

  echo "awk + sort"
  time awk '{print $3}' dummy_history_file | sort >/dev/null
  echo "awk + sort + uniq"
  time awk '{print $3}' dummy_history_file | sort | uniq >/dev/null

  echo "Plain dmenu:"
  dmenu < dummy_history_file
  echo "awked into dmenu:"
  awk '{print $3}' dummy_history_file | dmenu
  echo "awk + sort + uniq into dmenu:"
  awk '{print $3}' dummy_history_file | sort | uniq | dmenu

Results:
I ran the test twice, about half an hour after generating the file, so in the first run the first awk call may have been slowed down because parts of the file were no longer in the Linux page cache.
(I also edited the output format a bit.)
Run 1:

Plain awk '{print $3}':
real 0m1.253s
user 0m0.907s
sys 0m0.143s

awk + sort:
real 0m3.696s
user 0m1.887s
sys 0m0.520s

awk + sort + uniq:
real 0m15.768s
user 0m12.233s
sys 0m0.820s

Plain dmenu:
awked into dmenu:
awk + sort + uniq into dmenu:

Run 2:

Plain awk '{print $3}':
real 0m1.223s
user 0m0.923s
sys 0m0.107s

awk + sort:
real 0m2.799s
user 0m1.910s
sys 0m0.553s

awk + sort + uniq:
real 0m16.387s
user 0m12.019s
sys 0m0.787s

Plain dmenu:
awked into dmenu:
awk + sort + uniq into dmenu:

Not too bad. uniq in particular seems to cause a lot of the slowdown. (In this dummy test file, all entries are unique. If there were lots of dupes, the results would probably be different, but I suspect that uniq always needs some time to do its work, dupes or not.) The real bottleneck seems to be raw CPU power, not storage bandwidth, since Linux caches the file. If it were uncached, I estimate the sequential read would take 1.5 seconds or so (at about 30MB/s on common hard disks).
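Since the generator builds every line from timestamps with one-second resolution, it may be worth checking how many distinct entries the file really holds before reading too much into the uniq numbers. A minimal sketch, where the helper name count_dupes and the sample lines are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print "<total> <distinct>" for field 3 of the file in $1.
count_dupes() {
    total=$(awk '{print $3}' "$1" | wc -l | tr -d ' ')
    distinct=$(awk '{print $3}' "$1" | sort -u | wc -l | tr -d ' ')
    echo "$total $distinct"
}

# Tiny sample: three lines, two of which share the same third field.
sample=$(mktemp)
printf '%s\n' \
    '2009-04-25 11:25:00 aaa' \
    '2009-04-25 11:25:01 bbb' \
    '2009-04-25 11:25:02 aaa' > "$sample"

count_dupes "$sample"   # prints: 3 2
rm -f "$sample"
```

On the real file this would be called as count_dupes dummy_history_file. If both numbers are equal, there are no duplicates.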

Once the stuff gets piped into dmenu, there is a little lag, but it's reasonably responsive imho.
Test performed on an Athlon XP @ 2GHz with 1 GB of memory. There were some other apps running, so this is not a very professional benchmark, but you get the idea :)

Submitted by Dieter_be on Sat, 04/25/2009 - 11:25.

sort -u vs uniq

You might want to check whether using "sort -u" instead of "sort | uniq" makes it a little faster. At the very least, it saves the time needed to fork a second process and do I/O over a pipe.
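A minimal sketch of that comparison, on a made-up five-line sample, to show that both variants produce the same list. The timing commands for the real file are left as comments because I have not measured them here.

```shell
#!/bin/bash
# Made-up sample with duplicates, one entry per line.
sample=$(mktemp)
printf '%s\n' c a b a c > "$sample"

via_uniq=$(sort "$sample" | uniq)   # two processes connected by a pipe
via_sort_u=$(sort -u "$sample")     # one process does both jobs

if [ "$via_uniq" = "$via_sort_u" ]; then
    echo "same output"
fi
rm -f "$sample"

# On the benchmark file, the two variants could be timed like this:
#   time awk '{print $3}' dummy_history_file | sort | uniq >/dev/null
#   time awk '{print $3}' dummy_history_file | sort -u >/dev/null
```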

awk vs grep/cut

Hi, nice benchmark!
You made me want to test too.

Here I'm comparing awk performance against grep and cut.
The grep results are astounding!

awk '/1234\.*a/'
real 0m4.385s
user 0m4.184s
sys 0m0.048s

grep -e '1234\.*a'
real 0m0.100s
user 0m0.060s
sys 0m0.020s

Plain awk '{print $3}':
real 0m0.641s
user 0m0.600s
sys 0m0.012s

Plain cut -d ' ' -f 3:
real 0m0.281s
user 0m0.232s
sys 0m0.040s

awk + uniq:
real 0m0.979s
user 0m0.792s
sys 0m0.100s

cut + uniq:
real 0m0.446s
user 0m0.504s
sys 0m0.060s

Conclusion?

So what's your conclusion? If you want to use awk and do regex matching, do the matching with grep and pipe the output to awk?
For a simple "print nth field", awk being twice as slow as cut was to be expected, since awk is much more feature-rich. But the regex performance is disappointing indeed.

conclusions

Why do you need to pipe grep output to awk?
awk '/regexp/' matching does the same job as grep.

I think that in most cases (at least the ones in the uzbl examples) grep and cut can replace awk completely, and in all those cases they are faster (that's my conclusion).
sed is also a bit faster with "s/regexp//" than awk with "/regexp/".
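As a rough illustration of swapping awk for grep and cut, here is a minimal sketch on a made-up sample (the URLs and the pattern are only examples). One caveat: cut -d ' ' treats every single space as a separator, while awk collapses runs of whitespace, so the two only agree when fields are separated by exactly one space.

```shell
#!/bin/bash
# Sample in "date time entry" form; the URLs are made up for illustration.
sample=$(mktemp)
printf '%s\n' \
    '2009-04-25 11:25:00 http://example.org/a' \
    '2009-04-25 11:25:01 http://example.com/b' \
    '2009-04-25 11:25:02 http://example.org/c' > "$sample"

# awk does the matching and the field extraction in one process
with_awk=$(awk '/example\.org/ {print $3}' "$sample")

# grep does the matching, cut extracts the third space-separated field
with_grep_cut=$(grep -e 'example\.org' "$sample" | cut -d ' ' -f 3)

if [ "$with_awk" = "$with_grep_cut" ]; then
    echo "same output"
fi
rm -f "$sample"
```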

I also wanted to test how well it scales to remove duplicates from my history file before every dmenu launch, and these results have motivated me to start cut|uniq-ing my dmenu.
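One thing to keep in mind: uniq only drops duplicates that are adjacent, so an unsorted history needs a sort first, or an awk filter that remembers what it has already printed. A minimal sketch of both, with made-up helper names and sample data:

```shell
#!/bin/bash
# Hypothetical helper: distinct third fields of the file in $1, sorted.
history_entries() {
    cut -d ' ' -f 3 "$1" | sort -u
}

# Hypothetical helper: distinct third fields, keeping first-seen order.
history_entries_ordered() {
    # seen[$3]++ is 0 (false) the first time an entry shows up,
    # so the negation lets only first occurrences through.
    awk '!seen[$3]++ {print $3}' "$1"
}

sample=$(mktemp)
printf '%s\n' \
    '2009-04-25 11:25:00 http://example.org/b' \
    '2009-04-25 11:25:01 http://example.org/a' \
    '2009-04-25 11:25:02 http://example.org/b' > "$sample"

history_entries "$sample"           # a, then b
history_entries_ordered "$sample"   # b, then a
rm -f "$sample"

# Feeding dmenu would then look like (history path is an assumption):
#   history_entries "$HOME/history" | dmenu
```

The sorted variant stays with cut and sort, which were the fast tools in the numbers above. The awk variant costs more but keeps the most recently added order intact.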
