February 22, 2007
The call stack is a communication medium between the calling function and the receiver function. If I say “add (1 2)”, I am sending two parameters to the function ‘add’. Those parameters are copied onto the call stack, then the add function immediately gets to work on them. Unfortunately, the caller blocks until the receiver returns a response message (in this case ‘3’). The function calls in a program can be arranged as a very complex call graph (that’s what compilers construct for optimization). The call stack, then, is a temporary trace of one path through that graph. A continuation allows you to freeze that trace, so you can restart it whenever you want. With continuations you can jump around in the call graph. However, the call stack and continuations still use a simple stack as the communication channel between sender and receiver functions. In my mind, it is like a linear series of communicating processes, where process A sends a message to process B, which sends a message to C, etc. Then the response messages flow backwards through response queues: C -> B -> A. I don’t know how to embed pictures, but that would probably help folks understand it.
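To make the “linear series of communicating processes” picture concrete, here is a rough sketch in Python (just an illustration I made up, using plain threads and queues rather than continuations): A forwards a request to B, B forwards to C, and the responses flow back C -> B -> A.

```python
import queue
import threading

def worker(name, requests, responses, downstream=None):
    """Receive one request, optionally forward it downstream, then reply."""
    msg = requests.get()
    if downstream is not None:
        down_req, down_resp = downstream
        down_req.put(msg + 1)     # forward, like a nested function call
        msg = down_resp.get()     # block on the downstream response queue
    responses.put(msg)            # reply to our own caller

# One request queue and one response queue per hop in the chain.
a_req, a_resp = queue.Queue(), queue.Queue()
b_req, b_resp = queue.Queue(), queue.Queue()
c_req, c_resp = queue.Queue(), queue.Queue()

threading.Thread(target=worker, args=("C", c_req, c_resp)).start()
threading.Thread(target=worker, args=("B", b_req, b_resp, (c_req, c_resp))).start()
threading.Thread(target=worker, args=("A", a_req, a_resp, (b_req, b_resp))).start()

a_req.put(1)              # "call" A
result = a_resp.get()     # block, like a normal function call returning
print(result)             # → 3: incremented once at each forwarding hop
```

Each hop blocks on its downstream response queue, which is exactly the stack-like behavior described above: the whole chain is frozen until C answers.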
I propose instead that all communication between functions be handled by a queue. This radically changes the semantics of normal programs. When a function is called, the parameters are put on a queue and the sending function does not block. Instead, the sender gets back a reference to a response queue, though the queue will probably only hold one message (would more than one be useful?). If the sender tries to fetch the response value, that expression will block until the receiver is finished. Why does this matter? Because if you decouple the sender and receiver with a message queue, then you can run both concurrently. That is, a programming language that tweaks the semantics of function calls can instantly become a massively concurrent language. This sounds like something Haskell should have already done. Note that lazy evaluation is different from concurrent evaluation. A lazy evaluation can be aborted: if no one needs the result, don’t bother doing it. A concurrent evaluation will get done, you just don’t know when. I’d prefer to mix the two in a concurrent language. Again, side-effects ruin everything, but Haskell has a work-around with monads. I’ll have to ask around to see if someone has done this.
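A minimal sketch of the proposal in Python, assuming the semantics described above: “calling” a function enqueues the arguments and immediately returns a one-slot response queue, and the sender only blocks when it fetches the result. The names (`async_call`, etc.) are my own invention.

```python
import queue
import threading

def async_call(fn, *args):
    """Start fn(*args) concurrently; return a queue that will hold the result."""
    response = queue.Queue(maxsize=1)   # the one-message response queue
    def run():
        response.put(fn(*args))
    threading.Thread(target=run).start()
    return response                     # the sender does not block here

def add(a, b):
    return a + b

pending = async_call(add, 1, 2)   # returns immediately with a response queue
# ... the caller is free to do other work concurrently here ...
result = pending.get()            # blocks only now, until add finishes
print(result)                     # → 3
```

This is essentially a future/promise: the response queue decouples the call from the fetch, so any number of calls can be in flight at once.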
November 2, 2006
A while back I hacked up a quick and dirty concurrent queue data structure. There was a bizarre timing bug that resulted in random exceptions. The code looked OK to me, but concurrency bugs are sneaky. That’s why it’s important to use a modeling tool to statically check that your program doesn’t have any sneaky concurrency bugs in there. There are two major tools: TLA+ and SPIN. There are some other research tools out there, but these two might be the most mature. I tinkered with both a long time ago, but I don’t know enough to make a sound judgment (though I did find a review). From a non-technical standpoint, SPIN is appealing because it appears to have a larger user base and more documentation (books & papers). My goal is to embed the specifications into my code using XML comments. Then a simple preprocessor will extract the specs and run them through the model checker. That way I can see the spec while I write my code. So it’s important to choose a spec language that can support this development model. Managing separate spec and code files will become tedious. It would be easier if Spec# and Cω were further along, then Microsoft would do the integration for me.
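The extraction step could be as simple as a script like the following Python sketch. The `<spec>` tag, the embedded Promela snippet, and the overall layout are all my own assumptions about how this might look, not an existing tool.

```python
import re

# A toy C# fragment with a spec embedded in an XML doc comment.
CSHARP_SOURCE = '''
/// <spec>
/// active proctype producer() { /* Promela model of the queue */ }
/// </spec>
public class ConcurrentQueue<T> { }
'''

def extract_specs(source):
    """Return the spec bodies embedded in /// <spec> ... </spec> comments."""
    specs = []
    for match in re.finditer(r"<spec>(.*?)</spec>", source, re.DOTALL):
        body = match.group(1)
        # Strip the leading '///' comment markers from each line.
        lines = [re.sub(r"^\s*///\s?", "", ln) for ln in body.splitlines()]
        specs.append("\n".join(lines).strip())
    return specs

for spec in extract_specs(CSHARP_SOURCE):
    print(spec)   # each spec would be written to a .pml file and run through spin
```

The point is just that the spec lives next to the code it describes, and a dumb preprocessor can recover it without any real parsing of the host language.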
October 31, 2006
Erlang recently added support for native threads in their runtime. To test performance, someone posted a simple ping-pong benchmark on a mailing list. It spawns N processes, each of which (1) pings all the other processes, (2) replies to every ping with a pong message, and (3) sends a completion message to a receiver process. I duplicated this demo using the .NET CCR library. The guy who posted the benchmark ran it on a dual-processor machine with 2.4GHz Xeons. Running 700 processes on a single thread took 27 seconds. The same benchmark with 4 Erlang schedulers (i.e., 4 OS threads) took 9.3 seconds. I ran the equivalent benchmark on my pathetic Pentium M (1.6GHz) laptop. Remember, I’ve only got 1 processor. It took 2.6 seconds. The CCR version ran 10X faster than Erlang’s single-threaded version, even though Erlang was running on a much faster machine (maybe 50% faster). There are 2*700*700 messages = 980,000 messages (+ 1400 init and complete messages). I’m awfully impressed by these very rough numbers. What other kinds of tests should I run?
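For readers who want the message flow spelled out, here is a scaled-down sketch of the benchmark’s structure in Python (the original was Erlang vs. the .NET CCR; this is purely illustrative, with a small N and no timing):

```python
import queue
import threading

N = 10  # the original benchmark used 700 processes
mailboxes = [queue.Queue() for _ in range(N)]
done = queue.Queue()   # the "receiver process" collecting completions

def process(me):
    # (1) ping every other process
    for other in range(N):
        if other != me:
            mailboxes[other].put(("ping", me))
    # (2) reply to every ping; also count the pongs coming back
    pings_seen = pongs_seen = 0
    while pings_seen < N - 1 or pongs_seen < N - 1:
        kind, sender = mailboxes[me].get()
        if kind == "ping":
            pings_seen += 1
            mailboxes[sender].put(("pong", me))
        else:
            pongs_seen += 1
    # (3) report completion
    done.put(me)

threads = [threading.Thread(target=process, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_messages = 2 * N * (N - 1) + N   # pings + pongs + completion messages
print(done.qsize(), total_messages)
```

With N processes this exchanges 2·N·(N−1) ping/pong messages plus N completions, which matches the rough 2*700*700 ≈ 980,000 figure above.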
Admittedly, Erlang is a far nicer language to program in than .NET & CCR. It is very tedious to construct the dataflow layout with the existing API. I can simplify this greatly with a higher-level API. However, it won’t be as cool as Erlang. Another difference is that Erlang processes run as soon as you spawn them. On the CCR (AFAIK), it’s entirely a reactive dataflow model, so you need to write your programs differently to map them onto the CCR’s model. But the result is really the same: one could (I think) automatically translate Erlang programs to .NET CCR. So far I’m very excited about the CCR for general concurrency programming.
[edit: I missed a pong reply in my previous post. I updated the numbers in this post.]
October 30, 2006
Buried inside Microsoft’s Robotics Studio is the Concurrency and Coordination Runtime (CCR). This is the runtime layer for the dataflow programming model for their robotics toolkit. Basically, it’s a message-passing runtime that uses their own simplified threadpool design. I’m going to write some tests to compare it with Erlang’s performance, i.e. process creation time and message-passing throughput. Ideally, I’d like to include concurrency specifications in the comments of my C# program, then extract them and run them in a model checker. That way I can validate the concurrency aspect of my programs. I’d also like to write my programs in F#, which would provide stronger type-checking and pattern matching (very useful for message passing). I hope it works, or I’ll be forced to roll my own.
[added] Each process is really a delegate, so process creation time is equal to object creation time. .NET can probably create more objects per second than Erlang’s runtime. I did a very simple ping-pong test (two processes sending messages back and forth). It can send 800K messages per second. This is a very rough test. I’ll duplicate some Erlang benchmarks and report those.
October 25, 2006
I’d like to implement Erlang in .NET as a high-level abstraction above the gritty threading library. The problem is that each .NET thread needs a stack larger than 128K, whereas Erlang processes are very small (maybe 300 bytes). Therefore, .NET can create 4000 threads on my little machine, compared to 30,000 Erlang processes. The alternative is to use the Threadpool library to handle all concurrency. Erlang uses message-passing to communicate between processes. In .NET I can implement send and receive as delegates that I put back in the threadpool, effectively implementing a yield so other processes can run. It’s still cooperative multithreading, which has the standard problem that if a process goes berserk there’s no way to preempt it. As for the queues, these can be implemented to minimize synchronization, and the messages are just pointers to objects, so that’s going to be very fast. If it works, it should be a more convenient API for concurrency than working with threads and monitors directly.
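A sketch of the scheduling scheme above, in Python for brevity: each “process” is a chunk of resumable work (here a generator), send is just an enqueue, and receive yields back to the scheduler instead of blocking an OS thread. All names are invented for illustration; the real thing would re-post delegates to the .NET ThreadPool.

```python
import queue
from collections import deque

runnable = deque()   # processes that are ready to run
results = []

def spawn(gen):
    runnable.append(gen)

def scheduler():
    """Run each process until it yields, then put it at the back of the
    queue: a cooperative yield, like re-posting a delegate to a threadpool."""
    while runnable:
        proc = runnable.popleft()
        try:
            next(proc)              # run until the next yield point
            runnable.append(proc)   # still alive: schedule it again
        except StopIteration:
            pass                    # process finished

def producer(mailbox, n):
    for i in range(n):
        mailbox.put(i)   # send: just an enqueue, never blocks
        yield            # give other processes a chance to run

def consumer(mailbox, n):
    received = 0
    while received < n:
        if mailbox.empty():
            yield        # nothing to receive yet: yield instead of blocking
        else:
            results.append(mailbox.get())
            received += 1

box = queue.Queue()
spawn(producer(box, 5))
spawn(consumer(box, 5))
scheduler()
print(results)   # → [0, 1, 2, 3, 4]
```

This also shows the berserk-process problem directly: a process that never yields would starve everything else, since nothing can preempt it.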