Multiline nested comments in Xtext

Sep 6, 2011

When you declare your grammar in Xtext, you can specify a few terminals that may appear anywhere in your model file (as opposed to normal rules, which can appear only where the grammar places them). You do this by declaring those terminals as “hidden”.

For example,

grammar com.wirywolf.Unrealscript hidden(WS, ML_COMMENT, SL_COMMENT)

will allow the terminal rules WS, ML_COMMENT and SL_COMMENT to appear anywhere in your model. The point is to mark blocks of text that the parser doesn’t have to worry about while parsing your DSL model.
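To make the idea concrete, here is a small Python model (not Xtext itself) of what declaring terminals as hidden means: the lexer still produces those tokens, but they are filtered out before the parser rules look at the stream. The token kinds ID and SEMI, the function name parser_view and the sample token list are made up for illustration.

```python
# Terminals declared in hidden(...) in the grammar header.
HIDDEN = {"WS", "ML_COMMENT", "SL_COMMENT"}

def parser_view(tokens):
    """Return only the tokens the parser rules get to see.

    `tokens` is a list of (kind, text) pairs, as a lexer might produce them.
    """
    return [tok for tok in tokens if tok[0] not in HIDDEN]

# Hypothetical lexer output for:  var /* note */ x;
tokens = [
    ("ID", "var"),
    ("WS", " "),
    ("ML_COMMENT", "/* note */"),
    ("WS", " "),
    ("ID", "x"),
    ("SEMI", ";"),
]

print(parser_view(tokens))
# [('ID', 'var'), ('ID', 'x'), ('SEMI', ';')]
```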

Xtext also provides a special -> (“until”) token that consumes everything from one token up to and including the next occurrence of another. For example, the standard org.eclipse.xtext.common.Terminals grammar defines ML_COMMENT as:

terminal ML_COMMENT: '/*' -> '*/';

so everything between a /* and the next */ is ignored by the parser.
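The -> operator behaves roughly like “find the next occurrence”. The Python function below is a simplified model of a rule of the form start -> end (match_until is my own name, and the sketch ignores how the generated lexer actually implements this). It also shows why the default ML_COMMENT doesn’t nest: it stops at the first */ it finds.

```python
def match_until(text, pos, start, end):
    """Model of the Xtext rule  start -> end.

    If `text` has `start` at `pos`, return the index just past the first
    `end` that follows it; otherwise (or if `end` never shows up) return None.
    """
    if not text.startswith(start, pos):
        return None
    j = text.find(end, pos + len(start))
    if j == -1:
        return None
    return j + len(end)

src = "/* a /* b */ c */"
stop = match_until(src, 0, "/*", "*/")
print(repr(src[:stop]))  # '/* a /* b */'
```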

Now, Unrealscript files contain special blocks of text, placed there by the developers at Epic to help their engine find its way around the code (or something like that; they haven’t really explained what the blocks are for, they’ve just told UDK users to ignore them). These blocks can appear anywhere in an Unrealscript source file and have the following syntax:

cpptext {
    // bunch of declarations, definitions etc.
}

My first attempt to implement this was as follows:

grammar com.wirywolf.Unrealscript hidden(WS, CPP_TEXT, ML_COMMENT, SL_COMMENT)

terminal CPP_TEXT:
    'cpptext' .* '{' -> '}';

but then I discovered that the code inside a cpptext block can also contain braces, in which case my rule would terminate prematurely at the first closing brace:

cpptext {
    function SomeFunction() { } <- Rule ends here
} <- Unexpected input '}', expecting EOF
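Modelled the same way in Python (again a sketch of the rule’s logic rather than the code Xtext generates, assuming the .* stops at the first opening brace; match_first_attempt is a made-up name), the first attempt finds the opening brace and then stops at the very next closing brace, whichever block it belongs to:

```python
def match_first_attempt(text, pos=0):
    """Model of:  'cpptext' .* '{' -> '}'

    Returns the index just past the first '}' after the opening brace,
    or None if the text doesn't match.
    """
    if not text.startswith("cpptext", pos):
        return None
    open_brace = text.find("{", pos + len("cpptext"))
    if open_brace == -1:
        return None
    close_brace = text.find("}", open_brace + 1)
    if close_brace == -1:
        return None
    return close_brace + 1

block = "cpptext {\n    function SomeFunction() { }\n}"
end = match_first_attempt(block)
print(repr(block[end:]))  # '\n}'  <- left over, hence the parse error
```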

After bumbling around with various regular expressions for a while, I finally came up with a solution that impressed me: it’s the one I should have thought of intuitively in the first place, and the Xtext developers had already added support for it.

The solution is:

terminal CPP_TEXT:
    'cpptext' -> '{' -> ('{' -> '}')* -> '}';

which can literally be read out loud as “Start with ‘cpptext’ and consume everything till the first opening brace. After that, consume everything till the next brace. If it’s an opening brace, consume everything till the next closing brace and repeat. If it’s a closing brace, you’re done.” Mind. blown.
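For completeness, here is the same kind of Python model for the final rule, following the read-aloud version above step by step. It’s a sketch of the logic as I read it, not of the lexer Xtext generates, and match_cpptext is a hypothetical name.

```python
def match_cpptext(text, pos=0):
    """Model of:  'cpptext' -> '{' -> ('{' -> '}')* -> '}'

    Returns the index just past the block's closing brace, or None.
    """
    if not text.startswith("cpptext", pos):
        return None
    # 'cpptext' -> '{' : skip to the first opening brace.
    i = text.find("{", pos + len("cpptext"))
    if i == -1:
        return None
    i += 1
    while i < len(text):
        if text[i] == "{":
            # ('{' -> '}') : an inner block, skip to the next closing brace.
            j = text.find("}", i + 1)
            if j == -1:
                return None
            i = j + 1
        elif text[i] == "}":
            # -> '}' : the closing brace of the cpptext block itself.
            return i + 1
        else:
            i += 1
    return None

block = "cpptext {\n    function SomeFunction() { }\n}"
print(match_cpptext(block) == len(block))  # True

deeper = "cpptext { a { b { c } d } e }"
print(repr(deeper[:match_cpptext(deeper)]))  # 'cpptext { a { b { c } d }'
```

The second example shows a limitation worth knowing about: in this model each ('{' -> '}') group swallows one inner block up to the next closing brace, which covers one level of nesting (like the function bodies above) but ends the match early for a block nested two levels deep. Since the Xtext rule isn’t recursive either, I wouldn’t expect it to handle arbitrarily deep nesting.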