Beluga_parser.Parser_combinatorGeneral purpose parser combinator library.
This is a hand-made parser combinator system, inspired by Haskell's parsec library. The basic idea in parser combinators is to use higher-order functions to manipulate parsing functions to build up more complex parsers. The most basic type we can give for a parsing function would be something like:
type 'a parser = char list -> char list * 'a optionchar list represents the input stream. We must return a new stream, perhaps with some characters removed since we parsed them. The result of the parse is parametric. Since parsing may fail, we use a type constructor to represent this, namely option.
However this design has many shortcomings.
char list requires the entire input to be buffered in memory. This is wasteful. Instead, we should progressively read input as we need it. Therefore in our implementation we use a lazily-evaluated persistent sequence.option to represent failure doesn't give us a means to specify what the error is. Instead we use ('a, exn) result in order to also return some information in case of failure.So our parsing function type now looks like:
type 'a parser = state -> state * ('a, exn) resultwhere state contains a Located_token.t Seq.t for the input as well as extra stuff for handling backtracking. This is an instance of the state monad as defined in Support.State.STATE.
The naive implementation of the alt alternation combinator is to say that alt p1 p2 first runs p1, and if it fails, runs p2. This implementation has the major drawback of allowing unlimited backtracking. This is undesirable because it results in terrible error messages. What we would like is a way to control the backtracking behaviour of parsers on a more fine-grained level.
Instead, the library below is non-backtracking, so it introduces the trying combinator to selectively enable backtracking. trying p runs p, and if p fails having even consumed input, then trying p can still be backtracked out of. The alt combinator is implemented like this:
p1. If it succeeds, return its result without trying p2.p1 failed without consuming any input, then run p2.p1 failed under a trying, then run p2.p1 was fatal, so return it.The rationale for this is that it allows the parser writer to commit to a parse tree once certain conditions have been met. For example, in Beluga, after a case keyword, we know for sure that we're parsing a case expression. Therefore, if we fail afterwards to parse the scrutinee of the case (a synthesizable expression), we should not backtrack out of the parser for case expressions.
This parser combinator library is only suited for top-down data-independent parsing. That is, tokens are parsed sequentially from left to right, and no auxiliary data is accumulated during parsing. This means that left-recursive grammars and data-dependent grammars are not supported without either rewriting the grammar, or disambiguating and rewriting parsed ASTs at a later stage.
It is best to implement a parser with a pre-defined context-free grammar in a language specification document. This grammar may need rewriting to eliminate left recursion, assign precedences and associativities using recursive descent parsing.
module type PARSER_STATE = sig ... endAbstract parser state definition.
module type LOCATED = sig ... endModule type definition for elements having location annotations.
module Make_persistent_state (Token : LOCATED) : sig ... endFunctor creating an instance of PARSER_STATE backed by an immutable data structure.
module type PARSER = sig ... endmodule Make
(State : PARSER_STATE with type location = Beluga_syntax.Location.t) :
PARSER
with type state = State.state
and type token = State.token
and type location = State.location