I'm of course referring to the Java language closure proposals.
It should be easy. I like Java. I like closures. QED! Why wouldn't I like closures in Java? I hadn't given it much thought before I heard Joshua Bloch's speech, and then I had to admit that I agreed with him.
Java is first and foremost a language designed for readability. Not speed of writing, but of reading and comprehending. Any Java programmer should be able to read, and in time understand, any Java program (at least where the algorithms themselves aren't too hairy).
User-defined control structures seem to be the most commonly used argument for closures.
So it got me thinking. What are the minimal features needed to let people write their own control structures, like, e.g.:
myCollection.eachWhere(int i: i > 4) { print(i); }
or
public int firstAboveFour(MyCollection<Integer> coll) {
    // non-local return
    coll.foreach(int i) {
        if (i > 4) { return i; }
    }
}
We need to pass unevaluated program parts to a method. That sounds like a job for call-by-name parameter passing semantics. Of course I wasn't the first to think so. Still, it's a good thought, so I'll hang onto it.
One of the problems with the BGGA closure proposal is the distinction between local and non-local returns. They need closures that can do both non-local returns (i.e., returns from the method surrounding the closure literal) and local returns (returning a value from the closure itself). Since closures are similar to methods, the most immediate choice would be to use "return" to return from the closure. But in a control structure, the most obvious choice would be to use "return" to return from the method containing the control structure. A conundrum.
The "solution" is to take a page from the book of Perl and allow the body of a closure to contain statements followed by a single expression, where the value of that expression is returned locally. That also allows for a simple syntax for single-expression closures, e.g.: {x=>x*2}. It also means that local returns can only occur at the end. I'm sure they'll have found something prettier soon, if not already, but it's worth stepping back and considering what problem is being solved.
Local returns are necessary in most uses of closures, and also when closures are used in user-defined control structures where we need to abstract over an expression. In the traditional control structure "for(int i = 0; i < n; i++) { ... }" we have one declaration, two expressions and one statement(-block). If we want to create a similar control structure, we need to abstract over both expressions and statements. If we use closures for both, closures corresponding to expressions make local returns, and closures corresponding to statements make no local returns but can do non-local ones. We need two different kinds of return only because we try to use the same feature to model both.
All this only applies to modeling control statements using closures. Other uses of closures might find a use for non-local returns too (like implementing C-like setjmp/longjmp), but I doubt it will occur much in readable code.
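To make the conundrum concrete, here is a rough sketch in Java as it later shipped with lambdas (Java 8), where "return" inside a lambda is always a local return. The helper forEachElement and the class name ReturnDemo are made up for illustration, and the one-element array is just one way to smuggle a result out. It is an emulation of the firstAboveFour example, not the BGGA syntax.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class ReturnDemo {
    // Hypothetical stand-in for a user-defined "foreach" control structure.
    static void forEachElement(List<Integer> items, Consumer<Integer> body) {
        for (Integer item : items) {
            body.accept(item);
        }
    }

    // Returns the first element above four, or -1 if there is none.
    static int firstAboveFour(List<Integer> items) {
        int[] result = {-1};    // holder: the lambda cannot return from this method
        forEachElement(items, i -> {
            if (i > 4 && result[0] == -1) {
                result[0] = i;
                return;         // local return: leaves the lambda, not firstAboveFour
            }
        });
        return result[0];       // the "real" return has to happen out here
    }

    public static void main(String[] args) {
        System.out.println(firstAboveFour(Arrays.asList(1, 7, 3, 9))); // prints 7
    }
}
```

Note that the loop keeps running after the first match. With only local returns, stopping early needs extra machinery such as a flag or an exception.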
Call-by-name and the Java Language
Now consider using call-by-name parameter passing for both expressions and statements. A method can be declared to take some of its parameters as call-by-name, and some special parameters are also allowed to be unevaluated statements.
This is exactly what we need to build control structures!
When we use built-in control structures, we parameterize them by expressions and statements, not by closures. The expressions and statements are evaluated during the execution of the control structure.
Closures are a completely different beast, which just happens to be so powerful that you can emulate control structures using them (and just about everything else too, it's that big a cannon).
Another positive thing about call-by-name: The call-by-name arguments are evaluated whenever the variable is referenced, so you can't store the unevaluated expression or statement, nor can you return them. In other words, they are denotable, but neither expressible nor storable. That means that the unevaluated code cannot escape the method call.
That's a problem with closures. They can close over variables (and now also non-local returns), but they can survive longer than the scope of the variable or lifetime of the method to return from. By having special, unevaluated, parameters that cannot survive the call, this problem goes away too. No more UnmatchedNonlocalTransfer exceptions and no need to move local variables to the heap or make them thread safe.
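Java has no call-by-name, but the re-evaluation on every reference can be approximated with thunks. The following is a minimal sketch, assuming BooleanSupplier and Runnable as stand-ins for an expression parameter and a statement parameter. The names ThunkDemo, repeatWhile and countTo are invented for the example.

```java
import java.util.function.BooleanSupplier;

final class ThunkDemo {
    // Hypothetical user-defined loop. Both parameters are thunks that are
    // evaluated each time they are referenced, never up front.
    static void repeatWhile(BooleanSupplier test, Runnable body) {
        while (test.getAsBoolean()) {   // "test" is evaluated anew on every pass
            body.run();                 // "body" is executed, not stored
        }
    }

    static int countTo(int limit) {
        // Captured locals must be effectively final, so the counter
        // lives in a one-element array on the heap.
        int[] i = {0};
        repeatWhile(() -> i[0] < limit, () -> i[0]++);
        return i[0];
    }

    public static void main(String[] args) {
        System.out.println(countTo(3)); // prints 3
    }
}
```

The emulation shows both halves of the argument above. Nothing stops repeatWhile from saving "test" in a field and calling it after the method has returned, and the counter had to be moved to the heap to be captured at all. True call-by-name parameters would rule out the first and make the second unnecessary.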
So, since expressions have values, we can pass those where a value (locally-)returning closure would be needed, and we can pass statements where a computation is needed, which can include (non-local) returns. But what if I need to do some computation to get the value, more than what can be done in a single expression? I.e., where we would use a closure with both statements and a final expression? Just do what you would do for any built-in control structure: make a helper method and call that as the expression.
We still need to implement the declaration of the control structure. Since the variables in the above examples are declared and used in the scope where the control structure is used, it doesn't make sense to pass them as call-by-name (what would that mean for a declaration anyway?). Instead we can use another traditional parameter passing method that exactly fits our need: call-by-reference. We declare the variable with a scope that includes the later call-by-name parameters, and then pass it by reference to the control structure implementation, where it can be changed as necessary.
A call-by-reference variable becomes a normal variable that is just an alias for the one that is passed. As a normal variable, all you can do is to read or change its value. Again, you cannot make the aliased variable escape the call any more than any other local variable.
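Java passes everything by value, so aliasing of this kind has to be faked with a mutable holder object. Here is a small sketch of that workaround. Ref, RefDemo and swap are illustrative names, not part of any library or of the proposal.

```java
// Hypothetical one-slot holder standing in for a by-reference variable.
final class Ref<T> {
    T value;
    Ref(T value) { this.value = value; }
}

final class RefDemo {
    // The callee reads and writes the caller's "variables" through the holders.
    static <T> void swap(Ref<T> a, Ref<T> b) {
        T tmp = a.value;
        a.value = b.value;
        b.value = tmp;
    }

    public static void main(String[] args) {
        Ref<String> x = new Ref<>("left");
        Ref<String> y = new Ref<>("right");
        swap(x, y);
        System.out.println(x.value + " " + y.value); // prints "right left"
    }
}
```

Unlike a real reference parameter, the holder is an ordinary object, so the callee could store it and keep writing to it after the call. The scheme described above would not allow that.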
Syntax of the Beast
I'll even suggest a syntax for my constructs above. First for declaring a call-by-name or call-by-reference parameter:
- Declaration of an expression parameter: {type identifier} (i.e., as wrapping a normal parameter in curly brackets).
- Declaration of a statement parameter: {identifier}. Just a name, statements have no type.
- Declaration of reference parameter: ref type identifier. As if stolen from C#. Or perhaps some existing keyword or symbol instead of "ref" to avoid problems.
For calling, we try to stay as close to control structure syntax as possible:
- Passing an expression: simply write the expression. It is allowed to put semi-colons between expression parameters and their neighboring parameters instead of commas. Perhaps it should even be required, to avoid problems with "comma-expressions".
- Passing a statement: simply write it as a statement block (i.e., in curly braces). If it is the last parameter, it can be written outside the parentheses instead.
- Passing a variable by reference: Either ref lvalue or a full declaration (optionally including an initializer). The scope of that declaration is the remainder of the parameters of the same method call. It's allowed to use a colon instead of a comma after the reference parameter, but not required. If it is just before an expression parameter, you can use either colon or semicolon. Perhaps only allow the declaration, if we don't want to introduce call-by-reference generally.
With these "simple" constructions we can write the following method:
class MyCollection<T> {
    // ...
    public void eachWhere(&T elem,
                          {boolean test},
                          {body}) {
        T myElement = myFirst();
        while (myElement != null) {
            elem = myElement;
            // maybe consider using test() and body()
            // to signify that computation happens
            if (test) { body; }
            myElement = myNext(myElement);
        }
    }
}
and call it as
MyCollection<Integer> c = ...;
c.eachWhere(Integer v: v > 4) { return v; }
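For comparison, this is roughly how close today's Java can get to eachWhere. It is a sketch under several assumptions: the collection is backed by a plain List instead of myFirst()/myNext(), the reference parameter is a hypothetical Cell holder, and the expression and statement parameters are a BooleanSupplier and a Runnable. Since the body cannot do a non-local return, the example collects the matches instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical holder standing in for the by-reference loop variable.
final class Cell<T> {
    T value;
}

final class MyCollection<T> {
    private final List<T> items;   // assumption: simple list-backed storage

    MyCollection(List<T> items) { this.items = items; }

    // elem plays the reference parameter, test the expression parameter,
    // body the statement parameter.
    void eachWhere(Cell<T> elem, BooleanSupplier test, Runnable body) {
        for (T item : items) {
            elem.value = item;          // update the caller's variable
            if (test.getAsBoolean()) {  // re-evaluate the test for this element
                body.run();
            }
        }
    }
}

final class EachWhereDemo {
    static List<Integer> aboveFour(MyCollection<Integer> c) {
        List<Integer> out = new ArrayList<>();
        Cell<Integer> v = new Cell<>();
        c.eachWhere(v, () -> v.value > 4, () -> out.add(v.value));
        return out;
    }

    public static void main(String[] args) {
        MyCollection<Integer> c = new MyCollection<>(Arrays.asList(1, 7, 3, 9));
        System.out.println(aboveFour(c)); // prints [7, 9]
    }
}
```

The call site is noticeably noisier than c.eachWhere(Integer v: v > 4) { ... }, which is the gap the proposed syntax is meant to close.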
Are Closures Overkill?
For creating control structures, closures are not just overkill, they are not even the best match at all. They are powerful hammers, and you can implement pretty much anything with them.
Is call-by-name/call-by-reference the "right amount of kill"? For user-defined control structures, they are precisely what is needed to match the syntax and expressibility of built-in control structures. That should count for something.
It can't do higher-order programming (it is at most second order, depending on how you count). It won't help with creating listeners more succinctly (the parameters are not storable, so you can't add them to anything). But those are different problems.