The results are a bit slower than before because I changed the code after running the original benchmarks and haven’t re-optimised the code paths since. For the original benchmarks, I hand-optimised the Cylon code paths, eliminating every extra branch and unneeded call in the most commonly used paths. I’ll do that again when we release the multi-threaded version later in the year.
Results
This is on a 2.8GHz Pentium 4 with 1.5GB of memory.
| Pentium | count | debug | nodebug | overhead |
| factorial | 100000 | 25.109 | 20.938 | 20% |
| linear | 100000000 | 87.031 | 59.859 | 45% |
And this is on a 1GHz Celeron with 1GB of memory.
| Celeron | count | debug | nodebug | overhead |
| factorial | 100000 | 144.348 | 95.978 | 50% |
| linear | 100000000 | 178.336 | 106.974 | 66% |
I’ve also done the benchmarks on a Core Duo 6600 with 2GB of memory.
| Core Duo | count | debug | nodebug | overhead |
| factorial | 100000 | 14.844 | 14.297 | 4% |
| linear | 100000000 | 44.203 | 27.344 | 62% |
Interesting.
I’ve done the Core Duo one several times with various values of the count and got the same ratios (yes, I did check that I was actually running the debugger).
I can’t figure out what is going on with the Core Duo factorial result, or why the linear overhead is higher than on the Pentium 4, but those are the figures I got. I would guess that there might be something odd in the Ruby interpreter that causes this; last time I looked it was compiled using Microsoft C++ v6. Equally, there might be something weird in my code that’s throwing the benchmarks off. When I get a bit more time (when I do the multi-threaded version), I’ll recompile Ruby using the latest C++ compiler and see what I get.
Conclusion
It seems to me that my Pentium 4 benchmarks are pretty much in line with what I got previously. They are a bit slower than the first benchmarks, and that’s down to me not optimising things as far as I did initially. It’s certainly not twice as slow.
The Celeron doesn’t show as much of an improvement as the Pentium 4 for some reason. However, it’s a pretty ancient machine - over 6 years old now.
The Core Duo results need more exploration. In one case the Cylon debugger only adds 4% in overhead (which I find difficult to believe, but those are indeed the results), while for the linear test it adds 62% overhead. It’s still nothing like twice the overhead.
It would seem that there’s more to the Intel Core Duo than meets the eye.
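The overhead figures in the tables can be reproduced from the debug and nodebug timings. The sketch below assumes that overhead means the extra time of the debug run relative to the nodebug run, expressed as a percentage; the helper name `overhead_percent` is made up for illustration and is not part of the original benchmark code.

```ruby
# Hypothetical helper: how much slower the debug run is than the
# nodebug run, as a percentage rounded to one decimal place.
def overhead_percent(debug, nodebug)
  ((debug - nodebug) / nodebug * 100).round(1)
end

# Timings in seconds, copied from the three tables: [debug, nodebug].
timings = {
  "Pentium 4 factorial" => [25.109, 20.938],
  "Pentium 4 linear"    => [87.031, 59.859],
  "Celeron factorial"   => [144.348, 95.978],
  "Celeron linear"      => [178.336, 106.974],
  "Core Duo factorial"  => [14.844, 14.297],
  "Core Duo linear"     => [44.203, 27.344]
}

timings.each do |name, (debug, nodebug)|
  puts "%-20s %5.1f%%" % [name, overhead_percent(debug, nodebug)]
end
```

A debugger that doubled the run time would show up here as 100%, which is the comparison the "twice as slow" remarks rely on.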
Code
The code I used is here. It’s very simple:
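def fac(n)
  n == 1 ? 1 : n * fac(n-1)
end

count = 0
tstart = Time.new
# Factorial benchmark: uncomment this line and comment out the next one to run it.
#0.upto(100000) {fac(50)}
# Linear benchmark: a bare counting loop.
0.upto(100000000) {count += 1}
tend = Time.new
# to_f is needed here: a Time cannot be formatted with %f directly.
puts "%10.3f" % tstart.to_f
puts "%10.3f" % tend.to_f
# Subtracting two Time values gives the elapsed seconds as a Float.
diff = tend - tstart
puts "%10.3f" % diff.to_f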
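The same kind of measurement can also be sketched with `Benchmark.realtime` from Ruby's standard library, which returns the elapsed wall-clock seconds for a block. This is not the script used for the figures above, only an alternative way to take the timings; the `time_run` helper is hypothetical, and the loop counts are deliberately much smaller than in the tables so that it finishes quickly.

```ruby
require "benchmark"

def fac(n)
  n == 1 ? 1 : n * fac(n - 1)
end

# Hypothetical helper: run the block, print the elapsed seconds,
# and return them so they can be compared later.
def time_run(label)
  elapsed = Benchmark.realtime { yield }
  puts "%-10s %10.3f" % [label, elapsed]
  elapsed
end

count = 0
# Reduced iteration counts (assumption): scale these up to match the tables.
factorial_time = time_run("factorial") { 1_000.times { fac(50) } }
linear_time    = time_run("linear")    { 1_000_000.times { count += 1 } }
```

Running this once with the debugger attached and once without would give the debug and nodebug columns for each test.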