Sunday, January 13, 2008

Concurrent control structures and Java closures

For background, see Neal Gafter's A Definition of Closures.

A concurrent control structure acts like a control structure (synchronous - closures passed to it do not escape, non-local return makes sense), but it may execute in multiple threads.

This is fundamentally hard to do safely in the Java thread model, but that's a problem for the person writing the control structure, not the user. Except that synchronization must be handled explicitly by the user, so it's not entirely transparent when concurrency is happening.

Non-local control flow (non-local return, break or continue (even goto?)) must be handled "correctly" in a concurrent setting too. If one concurrent thread executes a non-local return (as it should be allowed to, since it is a synchroneous control structure), the control structure should exit gracefully, including stopping its other threads, and allow the non-local return to proceed from the originating thread. If more than one thread tries to do a non-local return, the control structure must ensure that only one of them takes effect.

Its clear that a non-local return from one thread to another will need to end the returning thread abruptly. It also needs to pass this information to its creator, so that the creator can clean up properly, and it will eventually reach the thread that the closure came from, so that the non-local return can be executed.

Currently, when a thread does something non-local that either doesn't exist any more, or not on the current thread, a UnmathedNonlocalTransfer exception is thrown.
This allows noticing a non-local return, but not propagating it to its appropriate thread.

Idea: When a non-local control transfer fails, create a synthetic closure (one that doesn't correspond to a closure literal in the program) that only contains the execution of the non-local return itself. It will have type {=> Unreachable}.
This is passed in the UnmatchedNonlocalTranfer, so that the exception not only reports that an operation failed, but also provides the operation so that it can be retried.
Anybody reacting to the thread's death can get this closure and reactivate the non-local return. Only in the correct thread, the one with the stack frame actually being returned from, will it work and perform the non-local return. In all other threads it will just cause that thread to die too with a new UnmatchedNonlocalTransfer. Caveat Invocator.

Non-local returns - two returns

The BGGA Closure proposal for Java has two types of return.

- One type is the normal return from methods.
- The other is a return from closures.

The destinction is needed because closures is used to implement control structures, and need to act as both expressions and statements in the control structure. Statements passed as a closure to a control structure, are allowed to return from the method enclosing the control structure invocation. The normal return is used for this. Closures returning a value to their invoker instead put an expression at the end of the closure literal and implicitly return this value (like, e.g., Perl syntax).

I think it was Tennent who said something lke: "You should have either zero, one, or an infinite amount of any feature". Not two.

If you have a closure within a closure, there is no simple way to make the inner closure return from the outer one. Closure returns can only appear in one place (at the end). Methods and closures are needlessly different.

I would suggest a more general solution. Now, the big question is, can I come up with one, or is this just unconstructive whining.

Java has solved nested scope problems before, when they introduced nested classes, and from the begining with breaking from nested loops. The solution was always to refer to anything but the closest enclosing instance by name.
You refer to an outer instance as "OuterClass.this". You break from an outer loop as "break outerLoopLabel".

What would the corresponding returns be like?

The simple version is that an unqualfied "return" means return from the innermost closure, and "MethodName.return" returns from the method. That still needs something for nested closures. Let's give the outer closure a name, and write "closureName.return" to return from it. Syntax for naming could be putting a label before a closure literal, i.e.,
closureName:{int x => ... {int y => ... closureName.return 42; ... }...}

However, this fails to properly hide the closure nature in control structures. Part of the effort on control structures is trying to make the code you write inside the closure the same as the code you would write in a language defined control structure (like "for" and "while"). A "return" in there would refer to the enclosing method.

If we let "return" mean return from the (only) enclosing method, we still need to name closures to allow non-local closure return, e.g., as labels above. But that means you will need to name the innermost closure too and return from them using a named return. Not pretty either. It makes it more inconvenient to parameterize control structures with expressions. Ofcourse, we can add the existing local return syntax (expression-at-the-end) as a shorthand for returning from the closest enclosing closure.

All in all, I think I would probably prefer a special syntax for control structures, rather than let them complicate general closures.

Friday, January 4, 2008

Are Closures Overkill?

I'm of course referring to the Java language closure proposals.

It should be easy. I like Java. I like closures. QED! Why wouldn't I like closures in Java? I hadn't given it much thought before I heard Joshua Bloch's speech, and then I had to admit that I agreed with him.

Java is first and foremost a language designed for readability. Not speed of writing, but of reading and comprehending. Any Java programmer should be able to read, and in time understand, any Java program (at least where the algorithms themselves aren't too hairy).

User-defined control structures seems like the most commonly used argument for closures.

So it got me thinking. What's the minimally necessary features to allow people to write their own control structures, like, e.g.:
myCollection.eachWhere(int i: i > 4) { print(i); }

or
    public int firstAboveFour(MyCollection<integer> coll) {
// non-local return
myCollection.foreach(int i) {
if (i > 4) { return i; }
}
}


We need to pass unevaluated program parts to a method. That sounds like a job for call-by-name parameter passing semantics. Of course I wasn't the first to think so. Still, it's a good thought, so I'll hang onto it.

One of the problems with the BGGA closure proposal is the distinction between local and non-local returns. They need to have closures that can do non-local returns (i.e., returns from the method surrounding the closure literal) and local returns (returns a value from the closure itself). Since closures are similar to methods, the most immediate would be to use return to return from the closure.
But as a control structure, it would be most obvious to use return to return from the method containing the control structure. A conundrum.

The "solution" is to have take a page from the book of Perl and allow the body of a closure to contain statements followed by a single expression, and the value of that expression is returned locally. That also allows for a simple syntax for single-expression closures, e.g.: {x=>x*2}. That also means that local returns can only occur at the end. I'm sure they'll have found something prettier soon, if not already, but it's worth stepping back and considering what the problem being solved is.

Local returns are necessary in most uses of closures, and when are closures used in user-defined control structures where we need to abstract over an expression. In the traditional control structure "for(int i = 0; i < n; i++) { ... }" we have one declaration, two expressions and one statement(-block). If we want to create similar control structure, we need to abstract over both expressions and statements. If we use closures for both, closures corresponding to expressions make local returns and closures corresponding to statements makes no local return, but can do non-local ones. Only because we try to use the same feature to model both do we need two different types of returns.

All this only apply to modeling control statements using closures. Other uses of closures might find a use for non-local returns too (like implementing C-like setjump/longjump), but I doubt it will occur much in readable code.

Call-by-name and the Java Language

Now consider using call-by-name parameter passing for both expressions on statements. A method can be declared to take some of its parameters as call-by-name, and also some special parameters are allowed to be unevaluated statements.

This is exactly what we need to build control structures!

When we use build-in control structures, we parameterize them by expressions and
statements, not by closures. The expressions and statements are evaluated during the execution of the control structure.

Closures are a completely different beast, which just happens to be so powerful that you can emulate control structures using them (and just about everything else too, it's that big a cannon).

Another positive thing about call-by-name: The call-by-name arguments are evaluated whenever the variable is referenced, so you can't store the unevaluated expression or statement, nor can you return them. In other words, they are denotable, but neither expressible nor storable. That means that the unevaluated code cannot escape the method call.

That's a problem with closures. They can close over variables (and now also non-local returns), but they can survive longer than the scope of the variable or lifetime of the method to return from. By having special, unevaluated, parameters that cannot survive the call, this problem goes away too. No more UnmatchedNonlocalTransfer exceptions and no need to move local variables to the heap or make them thread safe.

So, since expression have values, we can pass those where a value (locally-)returning closure would be needed, and we can pass statements where a computation is needed, which can include (non-local) returns. But what I need to do some computation to get the value, more than what can be done in a single expression? I.e., where we would use a closure with both statements and final expression? Just do what you would do for any build-in control structure: make a helper method and call that as the expression.

We still need to implement the declaration of the control structure. Since the variables in the above examples are declared and used in the scope where the control structure is used, it doesn't make sense to pass it as call-by-name (what would that mean for a declaration anyway?). Instead we can use another traditional parameter passing method that exactly fits out need: call-by-reference. We declare the variable with a scope that includes the later call-by-name parameters, and then pass it by reference to the control structure implementation, where it can be changed as necessary.

A call-by-reference variable becomes a normal variable that is just an alias for the one that is passed. As a normal variable, all you can do is to read or change its value. Again, you cannot make the aliased variable escape the call any more than any other local variable.

Syntax of the Beast


I'll even suggest a syntax for my constructs above. First for declaring a call-by-name or call-by-reference parameter:
  • Declaration of an expression parameter: {type identifier} (i.e., as wrapping a normal parameter in curly brackets).
  • Declaration of a statement parameter: {identifier}. Just a name, statements have no type.
  • Declaration of reference parameter: ref type identifier. As if stolen from C#. Or perhaps some existing keyword or symbol instead of "ref" to avoid problems.

For calling, we try to stay as close to control structure syntax as possible:
  • Passing an expression: simply write the expression. It is allowed to put semi-colons between expression parameters and their neighboring parameters instead of commas. Perhaps it should even be required, to avoid problems with "comma-expressions".
  • Passing a statement: simply write it as a statement block (i.e., in curly braces). If it is the last parameter, it can be written outside the parentheses instead.
  • Passing a variable by reference: Either ref lvalue or a full declaration (optionally including an initializer). The scope of that declaration is the remainder of the parameters of the same method call. It's allowed to use a colon instead of a comma after the reference parameter, but not required. If it is just before an expression parameter, you can use either colon or semicolon. Perhaps only allow the declaration, if we don't want to introduce call-by-reference generally.

With these "simple" constructions we can write the following method:

class MyCollection<T> {
// ...
public void eachWhere(&T elem,
{boolean test},
{body}) {
T myElement = myFirst();
while(myElement != null) {
elem = myElement;
// maybe consider using test() and body()
// to signify that computation happens
if(test) { body; }
myElement = myNext(myElement);
}
}
}


and call it as
  MyCollection<integer> c = ...; 
c.eachWhere(Integer v: v > 4) { return v; }


Are Closures Overkill?


For creating control structures, closures are not just overkill, they are not even the best match at all. They are powerful hammers, and you can implement pretty much anything with them.

Is call-by-name/call-by-reference the "right amount of kill"? For user-defined control structures, they are precisely what is needed to match the syntax and expressibility of built-in control structures. That should count for something.

It can't do higher order programming (it is at most second order, depending on how you count). It won't help with creating listeners more succinctly (not storable, so you can't add them to anything). But those are different problems.