Backend Guide ¶

Contents

Backend Guide
- Agda Backend
- Java Backend
  - CUP
  - ANTLRv4
- Pygments Backend
  - Usage
  - Caveats

Agda Backend ¶

The Agda Backend (option --agda, since 2.8.3) invokes the Haskell backend and adds Agda-bindings for the generated parser, printer, and abstract syntax. The bindings target the GHC backend of Agda, in version 2.5.4 or higher. Example run:

bnfc --agda -m -d Calc.cf
make

The following files are created by the Agda backend, in addition to the files created by the Haskell backend:

AST.agda:

Agda data types bound to the corresponding types for abstract syntax trees in Abs.hs.

Agda bindings for the pretty-printing functions in Print.hs.

Parser.agda: Agda bindings for the parser functions generated by Par.y.

IOLib.agda: Agda bindings for the Haskell IO monad and basic input/output functions.

Main.agda: A test program invoking the parser, akin to Test.hs. Uses do notation, which is supported by Agda >=2.5.4.

The Agda backend targets plain Agda with just the built-in types and functions; no extra libraries required (also not the standard library).

Java Backend ¶

Generates abstract syntax, parser and printer as Java code. Main option: --java.

CUP ¶

By default --java generates input for the CUP parser generator, since 2.8.2 CUP version v11b.

Note

CUP can only generate parsers with a single entry point. If multiple entry points are given using the entrypoint directive, only the first one will be used. Otherwise, the first category defined in the grammar file will be used as the entry point for the grammar. If you need multiple entrypoints, use ANTLRv4.

ANTLRv4 ¶

ANTLRv4 is a parser generator for Java.

With the --antlr option (since 2.8.2) BNFC generates an ANTLRv4 parser and lexer.

All categories can be entrypoints with ANTLR: the entrypoint directive is thus ignored.

Make sure that your system’s Java classpath variable points to an ANTLRv4 jar (download here)

You can use the ANTLR parser generator as follows:

bnfc --java --antlr -m Calc.cf
make

ANTLRv4 returns by default a parse tree, which enables you to make use of the analysis facilities that ANTLR offers. You can of course still get the usual AST built with the abstract syntax classes generated by BNFC.

From the Calc/Test.java, generated as a result of the previous commands:

public Calc.Absyn.Exp parse() throws Exception
{
    /* The default parser is the first-defined entry point. */
    CalcParser.ExpContext pc = p.exp();
    // some code in between
    Calc.Absyn.Exp ast = pc.result;

The pc object is a ParserRuleContext object returned by ANTLR.

It can be used for further analysis through the ANTLR API.

The usual abstract syntax tree returned by BNFC is in the result field of any ParserRuleContext returned by the available parse functions (here exp()).

Pygments Backend ¶

Pygments is not really a compiler front-end tool, like lex and yacc, but a widely used syntax highlighter (used for syntax highlighting on github among others).

With the --pygments option, BNFC generates a new python lexer to be used with pygments.

Usage ¶

There is two ways to add a lexer to pygments:

Fork the pygments codebase and add your lexer in pygments/lexers/
Install your lexer as a pygments plugin using setuptools

In addition to the lexer itself, BNFC will generate an minimal installation script setup.py for the second option so you can start using the highlighter right away without fiddling with pygments code.

Here is an example (assuming you’ve put the Calc grammar in the current directory):

virtualenv myenv    # If you don't use virtualenv, skip the first two steps
source myenv/bin/activate
bnfc --pygments Calc.cf
python setup.py install
echo "1 + 2 - 3 * 4" | pygmentize -l calc

You should see something like:

Here is the LBNF grammar highlighted with the pygments lexer generated from it:

Caveats ¶

The generated lexer has very few highlighting categories. In particular, all keywords are highlighted the same way, all symbols are highlighted the same way and it doesn’t use context (so, for instance, it cannot differentiate the same identifier used as a function definition and a local variable…)

Pygments makes it possible to register file extensions associated with a lexer. BNFC adds the grammar name as a file extension. So if the grammar file is named Calc.cf, the lexer will be associated to the file extension .calc. To associate other file extensions to a generated lexer, you need to modify (or subclass) the lexer.