Software Development - Buzzard C Library - RoboProg's

Dec 29, 2010

What Itch Does a Buzzard Scratch?

I have been very interested in alternative computer languages for the last couple of years, and in exploring concepts beyond the now traditional object oriented Java / .NET duopoly. Don't get me wrong, Java (and its evil twin from Microsoft) both do an adequate job much of the time. Most of what I do to earn a living has been Java for quite a few years, so I feel I have a pretty good idea what it is good at. That said, I think there is a growing mass of people out there, like me, who would like to have something that better supports first class functions, dynamic programming, better memory locality and deterministic behaviour. There are many things out there that do some or most of these things, but finding something both flexible and fast seems just a bit out of reach.

Opinions? Everybody's got one.

Note that much of what I have to say about these languages, good or bad, is about the implementations as much as the languages themselves. Assume that I mean the common or most official implementation.

I suppose if I had to pick a language of choice for the last decade or so, it would be Perl. Yes, it can get ugly if you are a sloppy programmer, but it lets you prototype quickly, and for file, string and long term memory buffer manipulation, it runs pretty fast as well. For multithreaded or numerical computation, not so much (score one for the Java / .NET fans). Note that by Perl, I mean Perl 5.x. Perl 6 is another matter, and I will bash on that later.

Ruby is interesting: it works similarly to Perl in many ways, but the OOP stuff is syntactically much cleaner. Unfortunately, it is a bit slow. Some of this might be due to the dynamic method / message dispatch (anybody remember Smalltalk?), but I am going to blame the garbage collector. OK, I guess I should blame the wild west dynamic message (method) model some, too.

Another interesting fringe language out there is Erlang. Erlang is a functional language. Erlang encourages a style where you make a little task, and send it on its way (as a lightweight process, rather than a full operating system thread). If it works, you get back one or more messages about the result. If it fails, you get back a message about that as well, so there's your exception handling. Tasks are cheap, and there is no global variable stomp-fest between concurrent operations.

Long ago, I used to like Turbo Pascal. Pascal was much less error prone than C/C++, and just about as fast. Unfortunately, the native string type had a maximum length of 255 (8 bit ASCII) characters, which worked well enough for desktop applications that fill out forms and barf out text lines on a report, but which was useless in a web app (as one might create with the rebranded Delphi product). It's true that later versions of TP supported C style ASCIIZ (null terminated) strings, but at that point you had most of the memory management headaches that you would have in a C program. (The astute reader may notice at this point that the styles of Pascal and Perl are almost opposite each other, and wonder how I liked both.) Oh, and Borland is dead, and I have worked mostly on unix/linux for quite some time anyway. Pascal never really died, by the way: strong typing, classes, recursive program structures, name spaces, a virtual machine to run it on -- it turned into Java!

Finally, there is C. Let's not kid ourselves: C is an ugly language, except for all those other assemblers out there, which are even worse. C is a horrible high level language, but it is an absolutely awesome substitute for assembler(s).

Dishonorable mention: C++. I made the mistake of picking up C++ after I had already learned OOP under Turbo Pascal. There were no standard libraries then, and the syntax was (is) way too error prone. For me, the nail in the coffin was Scott Meyers' "Effective C++" book. Reading between the lines, I realized that the entire book was about the dozens of things that C++ would do to you if you were not very careful. Destructors were about the only thing they got right that Java did not. (And TP had proper destructors as well.)

Which brings me to Parrot, also known as Perl 6 (OK, there is a difference between the virtual machine and the language, but again, consider that the implementation flavors the language). Parrot has a proper garbage collector, just like the JVM and .NET, rather than the reference counting mechanism that Perl 5 uses (and, yes, the GC is a replaceable component in Parrot, but most people are going to use the default). I love having a full GC for coding a rich client. I hate having a GC in a server environment.

So what's the problem?

Even with a generational garbage collector, where short lived transient objects are in a smaller sub-heap that is frequently cleaned up and reused, eventually the GC is going to need to do the full mark and sweep type of operation, unless your program never saves any data. When this happens, your program's working set of recently accessed memory is going to spill out of the CPU's cache, and into the capacitors scattered in your system's RAM chips (assume many GBs of RAM, with nothing swapped to disk).

You can't write memcached in Java. Well, you could, but there wouldn't be much point, now would there? The point of a utility like memcached is to hold as much data as it can in memory, releasing least recently used data to make room for more recent values to be saved (cached). A moment's consideration will show how poorly this will perform in a GC environment, as every piece of data held is found and marked frequently, marching most of the program's data through the CPU cache.
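For anybody who has not looked at how such a cache evicts, here is a toy sketch in C of the "least recently used" policy described above. Everything in it (`Cache`, `cache_get`, `cache_put`, the three-slot size) is made up for illustration and is not how memcached is actually laid out; a real cache would use a hash table and a linked list rather than a linear scan. The point is only that each entry carries a "last used" stamp, and the stalest entry is the one that gets overwritten.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 3   /* tiny on purpose, so eviction is easy to see */
#define KEY_MAX 32
#define VAL_MAX 64

typedef struct {
    char key[KEY_MAX];
    char val[VAL_MAX];
    unsigned long last_used;   /* 0 means the slot is empty */
} Slot;

typedef struct {
    Slot slots[CACHE_SLOTS];
    unsigned long clock;       /* logical time, bumped on every access */
} Cache;

void cache_init(Cache *c) { memset(c, 0, sizeof *c); }

/* Look a key up; a hit refreshes its "last used" stamp. */
const char *cache_get(Cache *c, const char *key) {
    assert(c && key);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        Slot *s = &c->slots[i];
        if (s->last_used && strcmp(s->key, key) == 0) {
            s->last_used = ++c->clock;
            return s->val;
        }
    }
    return NULL;
}

/* Store a value, reusing the key's slot, an empty slot, or the stalest one. */
void cache_put(Cache *c, const char *key, const char *val) {
    assert(c && key && val);
    Slot *victim = &c->slots[0];
    for (int i = 0; i < CACHE_SLOTS; i++) {
        Slot *s = &c->slots[i];
        if (s->last_used && strcmp(s->key, key) == 0) { victim = s; break; }
        if (s->last_used < victim->last_used) victim = s;
    }
    snprintf(victim->key, KEY_MAX, "%s", key);   /* truncates long keys */
    snprintf(victim->val, VAL_MAX, "%s", val);
    victim->last_used = ++c->clock;
}
```

With manual memory management, nothing here gets visited unless a request asks for it. A tracing collector would have to walk every live entry on each full collection, whether anybody wanted that data or not.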

I'll see your memory leak, and raise you a file handle leak. So what about reference counting? It's obsolete. Data structures like circular queues and doubly linked lists get forgotten and their memory lost, because their nodes keep each other's counts above zero. OK, so encapsulate them in a wrapper object with a destructor. Then, when the (wrapper) object goes out of scope, the destructor will (deterministically) be called, and can do the tear down of its own innards. The destructor might even close a file or two, as well (put that in your finalizer and smoke it).
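The wrapper trick is easy enough to sketch in plain C. This is only an illustration under my own assumptions: `Ring`, `ring_retain`, `ring_release` and the `live_nodes` counter are hypothetical names, not anything that exists in Buzzard. The nodes of the circular list are not counted at all. Only the wrapper is, and when its count hits zero it breaks the cycle, frees the nodes, and closes its file handle on the spot.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not Buzzard's API. */

typedef struct Node {
    struct Node *prev, *next;  /* neighbours point at each other: a cycle */
    int value;
} Node;

typedef struct Ring {
    int   refs;   /* only the wrapper is reference counted */
    Node *head;   /* circular doubly linked list owned by the wrapper */
    FILE *log;    /* non-memory resource released along with the ring */
} Ring;

static int live_nodes = 0;  /* instrumentation so the sketch can be checked */

Ring *ring_new(FILE *log) {
    Ring *r = malloc(sizeof *r);
    if (!r) return NULL;
    r->refs = 1;
    r->head = NULL;
    r->log  = log;
    return r;
}

/* Append a value at the tail; returns 0 on success, -1 if out of memory. */
int ring_push(Ring *r, int value) {
    Node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->value = value;
    if (!r->head) {
        n->prev = n->next = n;
        r->head = n;
    } else {
        Node *tail = r->head->prev;
        n->prev = tail;
        n->next = r->head;
        tail->next = n;
        r->head->prev = n;
    }
    live_nodes++;
    return 0;
}

Ring *ring_retain(Ring *r) {
    r->refs++;
    return r;
}

/* Drop one reference; the last release runs the "destructor" right away. */
void ring_release(Ring *r) {
    assert(r->refs > 0);
    if (--r->refs > 0) return;
    if (r->head) {
        r->head->prev->next = NULL;      /* break the cycle, then walk it */
        for (Node *n = r->head; n; ) {
            Node *next = n->next;
            free(n);
            live_nodes--;
            n = next;
        }
    }
    if (r->log) fclose(r->log);          /* the file handle goes too */
    free(r);
}
```

The tear down happens at the last `ring_release`, not whenever a collector gets around to it, which is the whole appeal.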

So now what?

So what is this Buzzard thing about? Well, it's not a parrot, it's uglier. It's going to be a high level virtual machine / runtime library (should I ever get to finish it). I am not aiming for purity, but pragmatism. I want to have something with essentially the performance of Perl 5, but with support for both functional and object oriented programming, and for both dynamically and statically typed facades. I would also like to use and abuse threads to do some efficient concurrency.

In order to support threads, I believe the best course of action will be to banish global variables from applications. This will of course really piss some people off, much the same way that some cannot stand Python's indentation. And in much the same way, good riddance! If you cannot program without global variables (or without indenting your code consistently), you likely won't be missed.

So how would you communicate between threads? By using message queues, passing serialized data between threads.
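To give a rough idea of what that could look like, here is a sketch in C with POSIX threads. None of it is Buzzard code: `MsgQueue`, `mq_send`, `mq_recv`, the fixed slot sizes and the big-endian `put_u32` / `get_u32` encoding are all assumptions made up for this example. The sender copies bytes into the queue and the receiver copies them back out, so the two threads never hold a pointer to the same data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_SLOTS 8
#define MSG_MAX 64

typedef struct { size_t len; unsigned char bytes[MSG_MAX]; } Message;

typedef struct {
    Message slots[QUEUE_SLOTS];     /* ring buffer of copied messages */
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} MsgQueue;

void mq_init(MsgQueue *q) {
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void mq_destroy(MsgQueue *q) {
    pthread_cond_destroy(&q->not_full);
    pthread_cond_destroy(&q->not_empty);
    pthread_mutex_destroy(&q->lock);
}

/* Copy len bytes into the queue, blocking while it is full. */
int mq_send(MsgQueue *q, const void *data, size_t len) {
    if (len > MSG_MAX) return -1;
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SLOTS) pthread_cond_wait(&q->not_full, &q->lock);
    Message *m = &q->slots[(q->head + q->count) % QUEUE_SLOTS];
    m->len = len;
    memcpy(m->bytes, data, len);
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Copy the oldest message out (out needs MSG_MAX bytes); blocks while empty. */
size_t mq_recv(MsgQueue *q, void *out) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0) pthread_cond_wait(&q->not_empty, &q->lock);
    Message *m = &q->slots[q->head];
    size_t len = m->len;
    memcpy(out, m->bytes, len);
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return len;
}

/* Serialize integers as 4 big-endian bytes. */
static void put_u32(unsigned char *b, uint32_t v) {
    b[0] = (unsigned char)(v >> 24); b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);  b[3] = (unsigned char)v;
}
static uint32_t get_u32(const unsigned char *b) {
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

typedef struct { MsgQueue inbox, outbox; } Channel;

/* Worker: sums every number it is sent; an empty message means "done". */
static void *worker(void *arg) {
    Channel *ch = arg;
    unsigned char buf[MSG_MAX];
    uint32_t total = 0;
    while (mq_recv(&ch->inbox, buf) != 0) total += get_u32(buf);
    put_u32(buf, total);
    mq_send(&ch->outbox, buf, 4);
    return NULL;
}

/* Hand the values to a worker thread and wait for its one reply. */
uint32_t sum_via_worker(const uint32_t *values, size_t n) {
    Channel ch;
    pthread_t tid;
    unsigned char buf[MSG_MAX];
    mq_init(&ch.inbox);
    mq_init(&ch.outbox);
    int rc = pthread_create(&tid, NULL, worker, &ch);
    assert(rc == 0);
    (void)rc;
    for (size_t i = 0; i < n; i++) {
        put_u32(buf, values[i]);
        mq_send(&ch.inbox, buf, 4);
    }
    mq_send(&ch.inbox, buf, 0);      /* zero-length message ends the input */
    mq_recv(&ch.outbox, buf);
    pthread_join(tid, NULL);
    mq_destroy(&ch.inbox);
    mq_destroy(&ch.outbox);
    return get_u32(buf);
}
```

The copying is the price of admission. In exchange, the only data both threads touch is the queue itself, and that sits behind a single lock.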

Also, I hope to have an environment where components of an application can be broken into "tasks", which have their own compact memory heap, and only pass messages. Tasks would be either synchronous or asynchronous with the code that initiated them, using another thread if asynchronous. By keeping the memory for different aspects / stages of an application more contiguous (and hopefully relatively small), I hope to exploit locality and better use the CPU caches, rather than frequently spilling out to (slow) RAM, or (the horror!) swap space.
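One simple way to get a "compact memory heap" per task is a bump allocator over a single contiguous block. What follows is a minimal sketch under that assumption, not a description of how Buzzard actually does it: `TaskHeap`, `heap_alloc` and the 16 byte alignment are invented for the example. Allocations are handed out back to back, and the whole heap is dropped in one go when the task ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-task heap: one contiguous block, handed out in order. */
typedef struct {
    unsigned char *base;   /* start of the block */
    size_t size;           /* total bytes in the block */
    size_t used;           /* bytes handed out so far */
} TaskHeap;

/* Reserve the block for a task; returns 0 on success, -1 on failure. */
int heap_init(TaskHeap *h, size_t size) {
    h->base = malloc(size);
    h->size = h->base ? size : 0;
    h->used = 0;
    return h->base ? 0 : -1;
}

/* Hand out n bytes, rounded up to a multiple of 16; NULL when full. */
void *heap_alloc(TaskHeap *h, size_t n) {
    assert(h->base != NULL);
    if (n > SIZE_MAX - 15) return NULL;       /* rounding would overflow */
    n = (n + 15) & ~(size_t)15;
    if (n > h->size - h->used) return NULL;   /* task is over its budget */
    void *p = h->base + h->used;
    h->used += n;
    return p;
}

/* End of task: everything the task allocated goes away in one call. */
void heap_destroy(TaskHeap *h) {
    free(h->base);
    h->base = NULL;
    h->size = h->used = 0;
}
```

There is no per-object free at all, which is the idea: a task takes its input message, scribbles in its own small block, sends its reply, and the block is released whole. Consecutive allocations sit next to each other, so a task's working set stays in a handful of cache lines.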

Finally, I expect to have a library that can be understood from the bottom up, with an expectation that there will be multiple "language" front ends (or C application bindings, even) from the beginning, rather than one true language and other second class citizens. That's probably crap, but it reflects a degree of experimentalism for now.

Well, that's the hope, anyway. I have precious little actually done. I would also be open to suggestions about anything that does most of these.

(yeah, it's been a while, again)

Copyright 2010, Robin R Anderson