I've been considering the "c10k" problem lately: how to build a server that can service 10,000 concurrent requests. I don't think I've solved it, but I wanted to get a feel for how threads and the Unix fork construct compare, performance-wise.
Below is a table of several possible languages / implementations. I first used Perl for my benchmarks, because it is supposed to have native threads; it turns out the default build on Ubuntu 8.04 uses green (simulated) threads. I also used Python (which I don't really know) because it is an interpreted high-level language that supports both fork and native threads.
Language | Fork | Native Threads | Green Threads | Comments |
---|---|---|---|---|
Java | N | Y | N | Green threads are no longer used |
Perl | Y | N | Y | Threading model is implementation dependent |
Ruby | Y | N | Y | |
Python | Y | Y | N | Note that the GIL can degrade threaded code to effectively using only a single CPU (see the sketch after this table) |
C | Y | Y | N | Not considering third party libraries to do green threads |
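To illustrate the GIL note in the table above, here is a minimal sketch (mine, not part of the original benchmarks; the `burn` function and the work sizes are invented for illustration) showing that CPU-bound work in CPython gains nothing from extra threads:

```python
import threading
import time

def burn(n):
    # Pure-Python busy loop; the thread running it holds the GIL almost continuously.
    while n:
        n -= 1

WORK = 10_000_000

# One thread doing all the work.
start = time.time()
burn(WORK * 2)
print("single thread: %.2f sec" % (time.time() - start))

# Two threads splitting the same work. On CPython the wall-clock time is
# about the same (often slightly worse), because the GIL serializes the bytecode.
start = time.time()
threads = [threading.Thread(target=burn, args=(WORK,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("two threads:   %.2f sec" % (time.time() - start))
```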
The test programs simulate overlapping requests and responses. There is no actual socket connection to a client, but each sub-process (or thread) must generate a formatted timestamp followed by about half a KB of pseudo-XML. In the threaded version of both implementations, a mutex guards the time formatter, since non-reentrant resources like this must be protected by critical sections in multithreaded code. Here is the Perl source and the Python source.
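As a rough idea of the shape of the threaded version (this is a sketch of mine, not the actual Perl or Python source linked above; the `handle_request` name and the payload layout are invented), in Python it looks something like this:

```python
import threading
import time

time_lock = threading.Lock()   # guards the non-reentrant time formatter

def handle_request(req_id, results):
    # Format the timestamp inside the mutex, mirroring the guarded
    # critical section described above.
    with time_lock:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
    # Build roughly half a KB of pseudo-XML as the fake response body.
    body = "".join("<item n='%d'>%s</item>\n" % (i, "x" * 20) for i in range(16))
    results[req_id] = "<response time='%s'>\n%s</response>\n" % (stamp, body)

results = {}
threads = [threading.Thread(target=handle_request, args=(i, results))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("generated %d responses" % len(results))
```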
My first set of trials is on a 3 GHz (1024 KB cache) Intel Pentium D, which has 2 logical CPUs and SMP enabled in the OS.
My second set of trials is on a 1.5 GHz (512 KB cache) Intel Atom, which has hyperthreading disabled and is thus running as a single CPU.
(Note that I ran the trials multiple times, and the results were consistent with the samples shown here.)
I have summarized the results below:
Host | Implementation | Method | Elapsed Time (sec) | Relative Speed (Perl + Fork = 1, 0 = instant) |
---|---|---|---|---|
Pentium D | Perl | Fork | 19.4 | 1 |
Pentium D | Perl | Thread | 20.1 | 1.04 |
Pentium D | Python | Fork | 30.1 | 1.55 |
Pentium D | Python | Thread | 20.0 | 1.03 |
Atom | Perl | Fork | 22.4 | 1 |
Atom | Perl | Thread | 29.7 | 1.33 |
Atom | Python | Fork | 35.3 | 1.56 |
Atom | Python | Thread | 1.83 | 0.08 |
These were not exactly the results I expected: the Python thread implementation on the wimpy machine runs much faster than any other implementation on either host. I guess that's why they call it computer science, since sometimes you have to measure instead of believing dogma.
I will, however, speculate about the other results. No matter how you slice it, Python must have a pretty good thread implementation relative to Perl. On the dual-core SMP host, the Perl fork implementation seemed to edge out the other implementations. I think this may be due, in part, to the garbage collection implementations. Perl uses a reference counting scheme which, while much disparaged in the press as inadequate, keeps memory references more localized during execution. Python also counts references, but supplements them with a tracing cycle collector that periodically walks the heap's container objects. That collector is more thorough, but it has costs: all of that traversal marches much of the heap in and out of the CPU cache, destroying any locality benefit the code might have had, and all of that marking forces forked processes to copy memory pages whose application data never actually changed. I suspect that Java shares performance characteristics with the Python thread implementation. Like they say, the Lisp programmer knows the value of everything and the cost of nothing (in contrast to the cynic). What do you think?
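One way to probe the copy-on-write speculation above would be a hypothetical experiment like the following (not part of the original benchmarks): run the fork test once with the cycle collector disabled, so its traversal cannot dirty pages shared with the children, and compare against a run with `gc` left enabled.

```python
import gc
import os
import time

# Disabling the cycle collector before forking keeps its traversal from
# dirtying pages that the children share copy-on-write with the parent.
gc.disable()

def child_work():
    # Stand-in for the per-request work; _exit skips interpreter cleanup.
    time.sleep(0.01)
    os._exit(0)

start = time.time()
pids = []
for _ in range(200):
    pid = os.fork()
    if pid == 0:
        child_work()
    pids.append(pid)

for pid in pids:
    os.waitpid(pid, 0)

print("elapsed with gc disabled: %.2f sec" % (time.time() - start))
```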
Contact me: roboprog @ yahoo.com