The results are a bit slower than before because I changed the code after running the original benchmarks and haven’t re-optimised the code paths since. For the original benchmarks, I hand-optimised the most commonly used Cylon code paths, eliminating every extra branch and unneeded call. I’ll do that again when we release the multi-threaded version later in the year.
Results
This is on a 2.8GHz Pentium 4 with 1.5GB of memory.
| Pentium 4 | count | debug (s) | no debug (s) | overhead |
|---|---|---|---|---|
| factorial | 100000 | 25.109 | 20.938 | 20% |
| linear | 100000000 | 87.031 | 59.859 | 45% |
And this is on a 1GHz Celeron with 1GB of memory.
| Celeron | count | debug (s) | no debug (s) | overhead |
|---|---|---|---|---|
| factorial | 100000 | 144.348 | 95.978 | 50% |
| linear | 100000000 | 178.336 | 106.974 | 67% |
I’ve also done the benchmarks on a Core Duo 6600 with 2GB of memory.
| Core Duo | count | debug (s) | no debug (s) | overhead |
|---|---|---|---|---|
| factorial | 100000 | 14.844 | 14.297 | 4% |
| linear | 100000000 | 44.203 | 27.344 | 62% |
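The overhead column is consistent with (debug time ÷ no-debug time − 1), to within rounding; for the Pentium 4 factorial run, 25.109 / 20.938 − 1 ≈ 0.199, or 20%. As a quick check, here is a small Ruby sketch that recomputes the column from the timings in the three tables. The `RESULTS` hash and the `overhead_percent` method are hypothetical names made up for this illustration; they are not part of Cylon.

```ruby
# Timings in seconds, copied from the tables: [debug, no debug].
RESULTS = {
  "Pentium 4" => { "factorial" => [25.109, 20.938],  "linear" => [87.031, 59.859] },
  "Celeron"   => { "factorial" => [144.348, 95.978], "linear" => [178.336, 106.974] },
  "Core Duo"  => { "factorial" => [14.844, 14.297],  "linear" => [44.203, 27.344] },
}

# Hypothetical helper: extra time spent under the debugger,
# as a percentage of the time taken without it.
def overhead_percent(debug, nodebug)
  (debug / nodebug - 1.0) * 100.0
end

RESULTS.each do |machine, tests|
  tests.each do |test, (debug, nodebug)|
    puts format("%-10s %-10s %5.1f%%", machine, test, overhead_percent(debug, nodebug))
  end
end
```

Printing one decimal place makes it easier to see how close some entries sit to a rounding boundary.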
Interesting.
I’ve run the Core Duo benchmarks several times with various values of the count and got the same ratios (yes, I did check that I was actually running the debugger).
I can’t figure out what is going on with the Core Duo factorial result, or why the debugger overhead on the linear test is higher than on the Pentium 4, but those are the figures I got. I would guess that something odd in the Ruby interpreter causes this; last time I looked, it was compiled using Microsoft C++ v6. Equally, there might be something weird in my code that is throwing off the benchmarks. When I get a bit more time (when I do the multi-threaded version), I’ll recompile Ruby using the latest C++ compiler and see what I get.
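Before and after recompiling, one way to confirm which compiler and flags a given Ruby interpreter was built with is to ask `RbConfig`, which records the build configuration. This is a minimal sketch: `interpreter_build_info` is a hypothetical helper name, the exact contents of `CC` and `CFLAGS` depend on how the interpreter was built, and very old interpreters may expose this information differently.

```ruby
require "rbconfig"

# Hypothetical helper: collect what the running interpreter
# reports about its own build.
def interpreter_build_info
  {
    "version"  => RUBY_VERSION,
    "platform" => RUBY_PLATFORM,
    "compiler" => RbConfig::CONFIG["CC"],     # C compiler command used for the build
    "cflags"   => RbConfig::CONFIG["CFLAGS"], # flags passed to that compiler
  }
end

interpreter_build_info.each { |key, value| puts "#{key}: #{value}" }
```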
Conclusion
It seems to me that my Pentium 4 benchmarks are pretty much in line with what I got previously. They are a bit slower than the first benchmarks, and that’s down to me not having optimised things as far as I did initially. It’s certainly not twice as slow.
The Celeron doesn’t fare as well as the Pentium 4, for some reason: its debugger overhead is higher on both tests. However, it’s a pretty ancient machine, over six years old now.
The Core Duo results need more exploration. For the factorial test the Cylon debugger adds only 4% overhead (which I find difficult to believe, but those are indeed the results), while for the linear test it adds 62%. That’s still nothing like doubling the run time.
It would seem that there’s more to the Intel Core Duo than meets the eye.
Code
The code I used is here. It’s very simple:
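```ruby
# Recursive factorial used by the "factorial" benchmark.
def fac(n)
  n == 1 ? 1 : n * fac(n - 1)
end

count = 0
tstart = Time.now

# Uncomment exactly one loop to choose the benchmark.
# factorial: 100,000 calls of fac(50)
#0.upto(100000) { fac(50) }
# linear: 100,000,000 increments of a counter
0.upto(100000000) { count += 1 }

tend = Time.now
puts "%10.3f" % tstart.to_f # start time, seconds since the epoch
puts "%10.3f" % tend.to_f   # end time, seconds since the epoch
diff = tend - tstart        # elapsed time in seconds (Float)
puts "%10.3f" % diff
```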
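To repeat the runs with various values of the count without editing the script each time, the benchmark name and counts could be taken from the command line. This is a sketch under a few assumptions: it swaps the hand-rolled `Time` arithmetic for the standard library’s `Benchmark.realtime`, `time_benchmark` and the default counts are illustrative choices, and `count.times` runs exactly `count` iterations whereas `0.upto(n)` runs `n + 1`.

```ruby
require "benchmark"

# Recursive factorial, as in the original script.
def fac(n)
  n == 1 ? 1 : n * fac(n - 1)
end

# Hypothetical helper: time `count` iterations of one benchmark body
# and return the elapsed wall-clock time in seconds.
def time_benchmark(name, count)
  case name
  when "factorial"
    Benchmark.realtime { count.times { fac(50) } }
  when "linear"
    total = 0
    Benchmark.realtime { count.times { total += 1 } }
  else
    raise ArgumentError, "unknown benchmark: #{name}"
  end
end

if __FILE__ == $0
  name = ARGV.fetch(0, "linear")
  counts = ARGV.drop(1).map { |arg| Integer(arg) }
  counts = [100_000, 1_000_000] if counts.empty?
  counts.each do |count|
    secs = time_benchmark(name, count)
    puts format("%-10s %12d %10.3f s", name, count, secs)
  end
end
```

Saved as, say, `bench.rb` (a hypothetical file name), it would be run as `ruby bench.rb linear 1000000 10000000`, once under the debugger and once without, and the ratio of the two times compared for each count.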