I had a somewhat similar discovery once using GNU parallel. I was trying to generate as much web traffic as possible from a single machine to load test a service I was building, and I assumed that the network I/o would be the bottleneck by a long shot, not the overhead of spawning many processes. I was disappointed by the amount of traffic generated, so I rewrote it in Ruby using the parallel gem with threads (instead of processes), and got orders of magnitude more performance.