retsac

Retsac

coverage build license

Text lexer and parser implemented in pure TypeScript with no external dependencies.

This can be used to fast prototype your own programming language compiler/translator frontend, or parse your domain specific language.

Installation

yarn add retsac

Features

The Lexer, turns a text string to a token list.
- Regex support. See examples below.
- Built-in util functions makes it super easy to process the input.
- Support custom error handling functions to prevent interruptions during the process.
- Support custom functions to yield tokens from the input string.
The Parser, co-work with the lexer and produce an AST (Abstract Syntax Tree).
- By default retsac provides an ELR(Expectational LR) parser.
  - Support meta characters like +*? when defining a grammar rule, just like in Regex.
  - Support conflict detection (for reduce-shift conflicts and reduce-reduce conflicts), try to auto resolve conflicts by peeking the rest of input, you can also set grammar rule's priority or self-associativity to auto generate resolvers. Besides, we provide a code generator as the low-level API to manually resolve conflict, .
  - Query children nodes by using $('name') to avoid accessing them using ugly index like children[0]. You can also rename nodes if you want to query nodes with same type using different names.
  - Optional data reducer to make it possible to get a result value when the parse is done.
  - Optional traverser to make it easy to invoke a top-down traverse after the AST is build.
  - Expect lexer to yield specific token type and/or content to parse the input more smartly.
  - Try to re-lex the input if parsing failed. You can rollback global state when re-lex, or commit existing changes to prevent re-lex.
- The AST can be serialized to a JSON object to co-work with other tools (e.g. compiler backend libs).
Provide multi-level APIs to make this easy to use and highly customizable.

Resources

Examples

JSON Parser

In this example, we use AdvancedBuilder to define grammar rules using +*?, define top-down traversers using traverser, and query nodes in grammar rules using $.

All conflicts are auto resolved.

Click to Expand

const lexer = new Lexer.Builder()
  .ignore(Lexer.whitespaces) // ignore blank characters
  .define({
    string: Lexer.stringLiteral(`"`), // double quote string literal
    number: /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/,
  })
  .define(Lexer.wordType("true", "false", "null")) // type's name is the literal value
  .anonymous(Lexer.exact(..."[]{},:")) // single char borders
  .build();

export const parser = new ELR.AdvancedBuilder<any>()
  .entry("value")
  .define(
    { value: `string | number | true | false | null` },
    // especially, for string use `eval` to make `\\x` become `\x`
    ELR.traverser(({ children }) => eval(children![0].text!))
  )
  .define(
    { value: `object | array` },
    ELR.traverser(({ children }) => children![0].traverse())
  )
  .define(
    { array: `'[' (value (',' value)*)? ']'` },
    ELR.traverser(({ $ }) => $(`value`).map((v) => v.traverse()))
  )
  .define(
    { object: `'{' (object_item (',' object_item)*)? '}'` },
    ELR.traverser(({ $ }) => {
      // every object_item's traverse result is an object, we need to merge them
      const result: { [key: string]: any } = {};
      $(`object_item`).forEach((item) => {
        Object.assign(result, item.traverse());
      });
      return result;
    })
  )
  .define(
    { object_item: `string ':' value` },
    // return an object
    ELR.traverser(({ $ }) => {
      const result: { [key: string]: any } = {};
      result[$(`string`)[0].text!.slice(1, -1)] = $(`value`)[0].traverse();
      return result;
    })
  )
  .build(lexer, { checkAll: true });

Calculator

In this example, we use reducer to define bottom-up data reducers, so we can get result when the AST is built.

There are conflicts introduced by those grammar rules, we use high-level resolver APIs priority/leftSA to resolve them.

Click to Expand

const lexer = new Lexer.Builder()
  .ignore(Lexer.whitespaces) // ignore blank characters
  .define({
    number: /^[0-9]+(?:\.[0-9]+)?/,
  })
  .anonymous(Lexer.exact(..."+-*/()"))
  .build();

export const parser = new ELR.ParserBuilder<number>()
  .entry("exp")
  .define(
    { exp: "number" },
    ELR.reducer(({ matched }) => Number(matched[0].text))
  )
  .define(
    { exp: `'-' exp` },
    ELR.reducer<number>(({ values }) => -values[1]!)
  )
  .define(
    { exp: `'(' exp ')'` },
    ELR.reducer(({ values }) => values[1])
  )
  .define(
    { exp: `exp '+' exp` },
    ELR.reducer<number>(({ values }) => values[0]! + values[2]!)
  )
  .define(
    { exp: `exp '-' exp` },
    ELR.reducer<number>(({ values }) => values[0]! - values[2]!)
  )
  .define(
    { exp: `exp '*' exp` },
    ELR.reducer<number>(({ values }) => values[0]! * values[2]!)
  )
  .define(
    { exp: `exp '/' exp` },
    ELR.reducer<number>(({ values }) => values[0]! / values[2]!)
  )
  .priority(
    { exp: `'-' exp` }, // highest priority
    [{ exp: `exp '*' exp` }, { exp: `exp '/' exp` }],
    [{ exp: `exp '+' exp` }, { exp: `exp '-' exp` }] // lowest priority
  )
  .leftSA(
    // left-self-associative, e.g. 1 - 2 - 3 = (1 - 2) - 3 instead of 1 - (2 - 3)
    { exp: `exp '*' exp` },
    { exp: `exp '/' exp` },
    { exp: `exp '+' exp` },
    { exp: `exp '-' exp` }
  )
  .build(lexer, { checkAll: true });

Function Definition

This example shows you how to define a simple fn_def grammar rule if you want to build a programming language compiler.

Click to Expand

const lexer = new Lexer.Builder()
  .ignore(Lexer.whitespaces) // ignore blank chars
  .define(Lexer.wordType("pub", "fn", "return", "let")) // keywords
  .define({
    integer: /^([1-9][0-9]*|0)/,
    identifier: /^[a-zA-Z_]\w*/,
  })
  .anonymous(Lexer.exact(..."+-*/():{};=,")) // single char operator
  .build();

export const parser = new ELR.AdvancedBuilder()
  .define({
    // use `@` to rename a node, this is effective only when using `$` to query nodes
    fn_def: `
      pub fn identifier@funcName '(' (param (',' param)*)? ')' ':' identifier@retType '{'
        stmt*
      '}'
    `,
  })
  .define({ param: `identifier ':' identifier` })
  .define({ stmt: `assign_stmt | ret_stmt` }, ELR.commit()) // commit to prevent re-lex, optimize performance
  .define({ assign_stmt: `let identifier ':' identifier '=' exp ';'` })
  .define({ ret_stmt: `return exp ';'` })
  .define({ exp: `integer | identifier` })
  .define({ exp: `exp '+' exp` })
  .entry("fn_def")
  .leftSA({ exp: `exp '+' exp` })
  .build(lexer, { generateResolvers: "builder", checkAll: true });

Contribute

All issues and pull requests are highly welcomed.

retsac

Retsac

Installation

Features

Resources

Examples

JSON Parser

Calculator

Function Definition

Contribute

CHANGELOG

Settings

Member Visibility

Theme