We’ve now come up with the first formal definition of Sapphire in ANTLR 3. There’ll be several (possibly many) more iterations before we are ready to integrate it into the DLR – but that’s no problem since both ANTLR 3.1 and the DLR itself are still under development (though both are now in beta).
While we’ve referenced Ruby a lot here, I want to make it clear that Sapphire isn’t a fork of Ruby – it’s a completely new language, starting from a clean code base and aimed at the DLR and .NET (and Mono). We don’t have any compatibility baggage or any historical code base to maintain. Sapphire will be a fast, clean, efficient and safe dynamic language. It will be accessible both to those who know Ruby and to those from a more traditional C/Java/Delphi background.
We’re using two new tools to build Sapphire: ANTLR 3 and Microsoft’s DLR. In fact, the emergence of these tools over the last few months has moved Sapphire from “wouldn’t it be nice if ...” to “It’s now doable. So let’s do it!”. We’ve been kicking the idea of Sapphire around for a couple of years now but without either of these tools, it just would not be possible in a reasonable timeframe to generate a new dynamic language like Sapphire.
Using ANTLR 3 and the DLR, there is a clear set of steps which will result in Sapphire. These are:
Define a base ANTLR3 lexer
Build a C# ‘sub-lexer’ that handles the nasty bits of a Ruby-like syntax. This is actually one of the most difficult bits in the whole business. Without a decent sub-lexer, you’re not going to get very far. We’ve pretty thoroughly debugged the current RiS sub-lexer over the last year or so of selling RiS, so we’re pretty sure it’s solid.
Define an ANTLR 3 parser
Define an ANTLR 3 ‘tree grammar’ that connects the parser to the DLR (this is pure magic in my view! ANTLR 3 fits like a glove onto the DLR here)
Build the DLR ‘generators’ that actually create the IL code that will be executed by the DLR when you run a Sapphire program.
The key points here are the last two - the tree grammar and the generators. These are ‘declarative’ in nature. That is, you write down what you want to happen – walk the AST and emit some code – and the ANTLR tree grammar and the magic of the DLR handle the rest. Now, I’m simplifying quite a lot here – inheritance, encapsulation and scoping are still pretty knotty problems. And I haven’t even touched on closures. But the point is that with ANTLR 3 handling the ‘front end’ and the DLR handling the ‘back end’, the remaining work is manageable. Not easy, I’ll grant that – but eminently doable.
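To make the ‘walk the AST and emit some code’ idea a bit more concrete, here is a minimal sketch in plain Ruby. It is only a toy: it uses neither ANTLR nor the DLR, the AST is just nested arrays, and the names EMIT_RULES and emit_code are made up for this illustration. The output is a list of stack-machine instructions standing in for the IL a real generator would produce.

```ruby
# Toy AST: nested arrays of the form [node_type, *children].
# This is NOT the ANTLR or DLR API; every name here is hypothetical.

# One emit rule per node type: the "declarative" part. Each rule says
# what to emit for a node; the walker below decides when to apply it.
EMIT_RULES = {
  num: ->(node, _emit) { ["push #{node[1]}"] },
  add: ->(node, emit)  { emit.call(node[1]) + emit.call(node[2]) + ["add"] },
  mul: ->(node, emit)  { emit.call(node[1]) + emit.call(node[2]) + ["mul"] }
}.freeze

# Walk the tree depth-first and return a flat list of instructions.
def emit_code(node)
  rule = EMIT_RULES.fetch(node[0]) do
    raise ArgumentError, "unknown node type: #{node[0]}"
  end
  rule.call(node, method(:emit_code))
end

# AST for 1 + 2 * 3, with the multiplication nested under the addition.
sample_ast = [:add, [:num, 1], [:mul, [:num, 2], [:num, 3]]]
puts emit_code(sample_ast)
```

Adding a new node type in this sketch means adding one entry to the table, which is roughly the shape of work the tree grammar and generators are meant to reduce things to.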
So what have we got so far? Well, the ANTLR 3 lexer and sub-lexer are more-or-less done (I took them from our main Ruby In Steel product and modified them). The Sapphire ANTLR 3 parser exists and we can produce ASTs from it to test our syntax.
As an example of the source we feed it, here is a var block in which the names and expected types of variables may be optionally asserted:
var @a, @b :string; @a
@c :int;
@b :int
end
Here’s a flavour of what the ANTLR parser grammar looks like (we’ll publish it in full a bit later on).
callDot
: dotoperation arrayref block? -> ^(CALL[$start, "call_7"] dotoperation arrayref block?)
| dotoperation call_args block? -> ^(CALL[$start, "call_8"] dotoperation call_args block?)
| dotoperation block? -> ^(CALL[$start, "call_9"] dotoperation block?)
;
The point is that this is ‘clean’ – and very much cleaner than Matz’s original yacc syntax. Translating that into ANTLR took me a long time and some very late nights indeed at one point in the original RiS development.
Currently, ANTLR 3 tree grammars are in the next version of ANTLR (3.1 – not yet released). And the DLR is still in beta and I don’t want to commit to anything definitively until the DLR is fully released. So still some way to go, but the foundations are there.
What we’ve removed from Ruby
The first stage of the design has been to determine what we don’t want in Sapphire. Starting from the original Ruby ANTLR 3 specification we’re using in Ruby In Steel, so far we’ve removed:
BEGIN ... END sections (these occur at the start and end of Ruby programs).
=begin and =end block comment delimiters. We’ve replaced them with standard C/Java block comments /* ... */
Modified if, while etc. statements (where the condition follows the statement, as in Ruby’s x = 1 if y). These are actually easy to implement, but don’t add much to the usefulness of a language. They can also be quite unclear in many circumstances.
No unless or until. It’s clearer to use a negated if or while.
No commands (method calls written without brackets). To invoke a method, you must use (...). Again, this is for clarity. A ‘method’ with no brackets is a field accessor.
Curly brackets are only used for hashes; square brackets are only used for arrays. Currently, the only way to delimit a block is to use do ... end. Again, for clarity (mainly) and safety.
Operator precedence. In Ruby there are about a dozen different levels of operator precedence. This is far too many. I can only remember the ‘usual’ add/multiply precedence. All the rest I use brackets for. So we’ve kept the add/multiply precedence so that 1 + 2 * 3 gives 7 as you would expect and not 9, but all the rest (|, &, <<, etc.) have the same (lower) precedence.
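The simplified precedence scheme can be sketched as a tiny expression evaluator in plain Ruby. This is an illustration only, not the Sapphire grammar: the operator set, the PRECEDENCE table and the helper names are assumptions, and so is the choice to evaluate operators of equal precedence from left to right.

```ruby
# Hypothetical three-tier table: multiply, then add, then everything else.
PRECEDENCE = {
  "*" => 3, "/" => 3,
  "+" => 2, "-" => 2,
  "|" => 1, "&" => 1, "<<" => 1, ">>" => 1
}.freeze

# Split the source into integer literals, operators and brackets.
def tokenize(source)
  source.scan(/\d+|<<|>>|[-+*\/|&()]/)
end

# A primary is either an integer literal or a bracketed expression.
def parse_primary(tokens)
  token = tokens.shift
  if token == "("
    value = parse_expression(tokens, 1)
    tokens.shift # discard the closing ")"
    value
  else
    Integer(token)
  end
end

# Precedence climbing: keep consuming operators whose level is at
# least min_prec, evaluating as we go.
def parse_expression(tokens, min_prec)
  left = parse_primary(tokens)
  while (prec = PRECEDENCE[tokens.first]) && prec >= min_prec
    op = tokens.shift
    # prec + 1 makes equal-precedence operators group left to right
    # (an assumption of this sketch).
    right = parse_expression(tokens, prec + 1)
    left = left.public_send(op, right)
  end
  left
end

def evaluate(source)
  parse_expression(tokenize(source), 1)
end

puts evaluate("1 + 2 * 3")   # multiply still binds tighter than add
puts evaluate("4 | 1 & 2")   # | and & share one level in this sketch
```

Under these assumptions 4 | 1 & 2 is read as (4 | 1) & 2, whereas Ruby itself gives & a higher precedence than | and reads it as 4 | (1 & 2). Brackets remove the ambiguity either way.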
What we’ve added
So far we’ve added get and set keywords. These work like attr_reader and attr_writer in Ruby and are syntactically similar to Ruby def. Much clearer in my view.
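For readers who don’t know Ruby, this is what attr_reader and attr_writer do there. The sketch below is plain Ruby rather than Sapphire (the get and set syntax isn’t shown in this post), and the Point and ExplicitPoint classes are invented for the example. Both classes end up with the same two accessor methods.

```ruby
# Plain Ruby, for comparison only; this is not Sapphire syntax.
class Point
  attr_reader :x   # generates a reader method named x
  attr_writer :y   # generates a writer method named y=

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# The same accessors written out by hand with def.
class ExplicitPoint
  def initialize(x, y)
    @x = x
    @y = y
  end

  # Reader: returns the value of @x.
  def x
    @x
  end

  # Writer: called when you write point.y = value.
  def y=(value)
    @y = value
  end
end

point = Point.new(1, 2)
puts point.x
point.y = 5
```

Note that x is read-only and y is write-only in both classes: there is no x= and no y method unless you ask for them.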
We’ve also added a var declaration section (which is optional). When used, it allows you to assign types to variables. Whether these are enforced by the DLR runtime is a compiler option.
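One way to picture ‘optional types, optionally enforced’ is the following plain Ruby sketch. It says nothing about how Sapphire or the DLR will actually do it: the TypedVars class, its enforce flag and the use of Ruby classes as type names are all assumptions made for illustration.

```ruby
# Hypothetical sketch: none of these names come from Sapphire or the DLR.
class TypedVars
  # declared_types maps a variable name to the class it is expected to
  # hold; variables that are not listed are left untyped.
  def initialize(declared_types, enforce:)
    @declared_types = declared_types
    @enforce = enforce
    @values = {}
  end

  # Store a value, checking it against the declared type only when
  # enforcement is switched on.
  def set(name, value)
    expected = @declared_types[name]
    if @enforce && expected && !value.is_a?(expected)
      raise TypeError, "#{name} expects #{expected}, got #{value.class}"
    end
    @values[name] = value
  end

  def get(name)
    @values[name]
  end
end

strict = TypedVars.new({ a: String, c: Integer }, enforce: true)
strict.set(:a, "hello")   # matches the declared type
strict.set(:z, 3.14)      # undeclared, so never checked

relaxed = TypedVars.new({ a: String }, enforce: false)
relaxed.set(:a, 42)       # declared as String, but checks are off
```

With enforcement on, strict.set(:c, "oops") would raise a TypeError in this sketch; with it off, the declarations act only as documentation.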
Blocks are first class objects. You can assign a block to a variable and invoke it like a method. Similar to Smalltalk blocks, in fact.
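For comparison, here is roughly how you get the same effect in plain Ruby, where a block only becomes an object once it has been captured as a Proc. This is not Sapphire syntax, and the capture and apply_twice helpers are invented for the example.

```ruby
# Plain Ruby, for comparison only; this is not Sapphire syntax.

# Hypothetical helper: turns the block passed to it into a Proc object.
def capture(&block)
  block
end

# Hypothetical helper: takes a block object as an ordinary argument.
def apply_twice(block, value)
  block.call(block.call(value))
end

# Assign a do ... end block to a variable.
doubler = capture do |n|
  n * 2
end

puts doubler.call(21)         # invoke the stored block like a method
puts apply_twice(doubler, 5)  # pass it around like any other value
```

In Ruby the extra step through &block (or proc or lambda) is what makes the block storable; the point above is that Sapphire lets you assign the block directly.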
I’ll elaborate on the get and set syntax and the optional typing system next week.