Software Development - Programming Languages - RoboProg's

Jul 30, 2009

The Once New Thing is Getting Old.

Garbage collection: friend or foe?

Java solves some problems: C and C++ buffer / string overruns and exploits are pretty much a thing of the past for most in-house "Enterprise" applications, and compiled server code runs unchanged on most common operating systems.

Java makes some new problems: On the fly scripting development is only recently supported; Java is slow; Java is an incredible memory hog.

Sun (Oracle) and the wider Java community are making some inroads into the scripting / incremental development issue with JVM languages like Groovy and JRuby. So I guess they got me there: there are some solutions in the works for that.

It would not be fair to compare Java to C or assembler, so I won't. I will, however, compare it to Perl. Perl (5.x) seems to be much faster than Java for things that I have measured side by side in the past. So, it's possible to have an interpreted language that does at least as well as Java. Unfortunately with regard to Perl, it appears "that parrot is dead". Perl 5 gets tweaked, but Perl 6 is lost in an enormous upgrade. I also have my doubts about Perl 6 / Parrot for reasons I will discuss below.

Big, fat, Java? Again, use Perl as a comparison (sorry I don't have numbers handy -- this was test code that I wrote at a previous job). What I remember, albeit a few years and versions back, is that Java was easily twice as slow as Perl (or worse) at many string lookup and I/O activities, leading me to favor Perl over Java for some high volume work (the cases where it wasn't necessary to break down and "do it in C").

It seems to me that one of the biggest problems in Java, performance wise, is the mark and sweep garbage collector itself. The GC allows you to avoid most memory leaks, but the (high?) cost is that it destroys locality on a virtual memory system. "But I have plenty of RAM on my server!" you say? Not so much: think of your RAM as just a level above the swap partition / file, and think of the "real memory" as being the L1 and L2 caches on your chip(s). So, we are once again back to having about a megabyte, give or take a factor of two, of fast memory, with slower backing storage elsewhere. I guess I'm still preoccupied with 1985, or at least experiencing the same sort of limitations all over again.

That said, when the garbage collector walks the heap, even a more advanced collector such as a generational GC, it frequently pulls unused / uninteresting data out of RAM and into the caches. That evicts the data the program was actually working on, which causes a fair number of cache misses for the real work when it resumes. I wish I had a way to precisely quantify this, but I don't. Parrot (the Perl 6 VM) is going down the mark and sweep path, as well.

Speaking of locality, another problem in Java (and, to a lesser degree, C# / .Net, which at least has value-type structs) is local variables. Everything other than simple numbers and booleans is a reference to an object on the heap, so effectively there is no such thing as a local variable in these languages. It might be nice to have modest size buffers on the stack, as in the old days of C and Pascal.
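
To make the layout point concrete, here is a minimal sketch in Python, which has the same "everything is a reference" model. It cannot put a buffer on the stack, but the standard `struct` module can at least pack records side by side in one contiguous buffer instead of scattering them across the heap. The record layout and the helper names (`RECORD`, `pack_records`, `read_record`) are made up for illustration.

```python
import struct

# Hypothetical record layout: a 32-bit int id, a 64-bit float, an 8-byte name.
# "<" selects little-endian with no padding, so each record is 4 + 8 + 8 = 20 bytes.
RECORD = struct.Struct("<id8s")


def pack_records(rows):
    """Pack (id, value, name) tuples back to back into one contiguous bytearray."""
    buf = bytearray(RECORD.size * len(rows))
    for i, (rec_id, value, name) in enumerate(rows):
        # Names shorter than 8 bytes are padded with NUL bytes by the "8s" format.
        RECORD.pack_into(buf, i * RECORD.size, rec_id, value, name.encode("ascii"))
    return buf


def read_record(buf, index):
    """Decode one record straight out of the buffer, without touching the others."""
    rec_id, value, raw_name = RECORD.unpack_from(buf, index * RECORD.size)
    return rec_id, value, raw_name.rstrip(b"\0").decode("ascii")


rows = [(1, 2.5, "alpha"), (2, 7.25, "beta"), (3, 0.5, "gamma")]
packed = pack_records(rows)

print(len(packed))             # 60 bytes for three records
print(read_record(packed, 1))  # (2, 7.25, 'beta')
```

Reading past the end of the buffer, e.g. `read_record(packed, 3)`, raises `struct.error` rather than returning whatever happens to sit in memory, which is the kind of run-time check one would want to keep.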

So what would I like to do or have?

The short answer is: A hybrid of Perl and Turbo Pascal (sorry, Ruby).

Here are features that I think would be interesting:

  1. Allow a mix of procedural, functional and object oriented programming, using a recursive block syntax as much as possible.
  2. Allow "naked" data structures to be packed into small, contiguous memory spaces. From a caching standpoint, small = fast.
  3. Do run-time checking for buffer / string overrun problems.
  4. Attach an encoding attribute to string variables, so that unsafe values can be more readily rejected by runtime libraries. E.g. - a database library only accepts query strings that have an attribute vouching that the string is SQL safe (as generated by a conversion / escaping routine), or an HTML template library only accepts strings that are display-ready (as if filtered through an HTML / JavaScript injection checker). Do not allow concatenation of unlike string types, unless special hoops are jumped through, which would typically be in special purpose library code.
  5. Support simplified exception handling. Allow the handler to check the exception type, if it matters, but have one clause that catches any errors. I mention this, as I have been bitten several times in Java because I wrote "catch ( Exception e)" rather than "catch ( Throwable t)", and an "Error" is not an "Exception" (the two are sibling subclasses of Throwable). Now I know. Why did I ever have to learn such a thing?
  6. Support the running of both compiled / archived binaries, and program text files (compile on the fly), as well as evaluating generated blocks of code.
  7. Use reference counting to partially minimize memory leaks. When a function ends, reclaim as much memory as possible. This means the programmer will still need to write tear down code for data structures with circular references, or simply accept some memory leaks. It also means an end to a garbage collector launching its own "Quest for the Holy Grail" in the background. We can have a garbage collector, but a small, tame one, that does not stray far from home (or the current memory region).
  8. Use C++ style destructors. Java finalizers are useless ("unreliable" begins to describe them). There are things other than memory that need cleaning up when done.
  9. Use fork, rather than threads, for subprocesses. Any memory leak within a subprocess will be cleaned up by exit. It might be interesting to track which process ID allocated an object / data structure, and use an exit handler to clean up, calling destructors for the exiting process only, so that non-memory resources are properly accounted for. I guess running on "Windows" is not a high priority.
  10. Support serialization. Perhaps use something language neutral, like JSON?
  11. Support immutable data structures or objects for use as messages between concurrent processes. Erlang syntax seems unfamiliar, if not uncomfortable, to me, but steal the good concepts, I guess. Alternatively, maybe I should try to learn something about Scala, but I have already said I am disenchanted with the Java VM.
  12. Support both statically typed and dynamically typed code. Allow the developer to quickly write dirty code, or specify data types for error checking and optimization.
  13. Interfaces are overkill sometimes. Support function pointers / procedural types / subroutine references / delegates (or whatever your language calls code references).
  14. Support programming by contract. Such optional assertions give you most of the benefits of "XUnit" type testing, without having to maintain a separate file for the tests.
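
Item 4 is probably the least familiar idea on the list, so here is a rough sketch of how it might look, written in Python only because it is compact. Everything here is hypothetical: `TaggedStr`, `raw`, `trusted_html`, `escape_html` and `render` are invented names, not an existing library, and a real design would want the tag built into the language's string type rather than bolted on as a wrapper class. Only `html.escape` comes from the standard library.

```python
import html


class TaggedStr:
    """A string that carries an encoding tag such as "raw" or "html"."""

    def __init__(self, text, encoding):
        self.text = text
        self.encoding = encoding

    def __add__(self, other):
        # Refuse to concatenate plain strings or strings with a different tag.
        if not isinstance(other, TaggedStr) or other.encoding != self.encoding:
            raise TypeError("cannot concatenate strings with unlike encodings")
        return TaggedStr(self.text + other.text, self.encoding)


def raw(text):
    """Tag untrusted input, e.g. something typed into a form."""
    return TaggedStr(text, "raw")


def trusted_html(text):
    """Tag literal markup written by the programmer as display-ready."""
    return TaggedStr(text, "html")


def escape_html(value):
    """The 'special hoop': the only way to turn a raw string into an html one."""
    if not isinstance(value, TaggedStr) or value.encoding != "raw":
        raise TypeError("escape_html() expects a raw string")
    return TaggedStr(html.escape(value.text), "html")


def render(fragment):
    """Stand-in for a template library that only accepts display-ready strings."""
    if not isinstance(fragment, TaggedStr) or fragment.encoding != "html":
        raise TypeError("render() only accepts display-ready strings")
    return fragment.text


user_input = raw("<script>alert(1)</script>")
page = trusted_html("<p>") + escape_html(user_input) + trusted_html("</p>")
print(render(page))  # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Both `trusted_html("<p>") + user_input` and `render(user_input)` raise `TypeError`, so the unescaped value never reaches the output by accident. A SQL-safe tag would follow the same pattern with its own escaping routine.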

Maybe carping about the speed of any language is mostly moot nowadays, since we are all just waiting for the database anyway. Perhaps that is a topic to attack on another day.

Lest I forget to mention aspect oriented programming, I am leaving out aspect oriented programming (as one would mention an unpopular cousin in a will, so as not to have it contested). Maybe I am just too much of a Blub programmer. Or perhaps the aspect oriented Java tools solve Java problems, and it is less of an itch to be scratched in other languages where you could make ad-hoc chains of blocks / routines to be run as the need arose?

And what's so new about mark and sweep garbage collection anyway? Lisp has been doing this for a long time.
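
For contrast, the reference counting and destructor wishes (items 7 and 8 in the list) can be tried out in CPython, which uses reference counting as its main mechanism and keeps a separate collector only for cycles. The sketch below is CPython-specific and assumes version 3.4 or later; `Resource` and the `log` list are hypothetical stand-ins for a real file handle, socket or lock.

```python
import gc

log = []


class Resource:
    """Hypothetical stand-in for something other than memory that needs cleanup."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        # Runs as soon as the last reference goes away (CPython behaviour).
        log.append("released " + self.name)


def use_once():
    r = Resource("scratch")
    # When the function returns, r's reference count drops to zero.


def make_cycle():
    a = Resource("a")
    b = Resource("b")
    a.partner = b  # a and b now keep each other alive
    b.partner = a


gc.disable()               # cycle collector off; reference counting still works
use_once()
after_plain = list(log)    # ['released scratch']: reclaimed at function exit

make_cycle()
after_cycle = list(log)    # unchanged: the cycle has leaked

gc.collect()               # run the cycle collector by hand
after_collect = sorted(log)
gc.enable()

print(after_plain)
print(after_cycle)
print(after_collect)       # ['released a', 'released b', 'released scratch']
```

The plain object is torn down deterministically when the function ends, while the circular pair lingers until either the programmer breaks the cycle or a collector pass is run, which is exactly the trade-off item 7 accepts.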

Lately, I have been busy with some things at work, which I can't talk about, so I have not written much. Suffice it to say, I still get paid to work with Java in its various facets and frameworks. However, I hope to be writing more soon.

(note that there were no entries for May and June)

Copyright 2009, Robin R Anderson