Text lexer and parser implemented in pure TypeScript with no external dependencies.
This can be used to fast prototype your own programming language compiler/translator frontend, or parse your domain specific language.
yarn add retsac
+*? when defining a grammar rule, just like in Regex.$('name') to avoid accessing them using ugly index like children[0]. You can also rename nodes if you want to query nodes with same type using different names.In this example, we use AdvancedBuilder to define grammar rules using +*?, define top-down traversers using traverser, and query nodes in grammar rules using $.
All conflicts are auto resolved.
const lexer = new Lexer.Builder()
.ignore(Lexer.whitespaces) // ignore blank characters
.define({
string: Lexer.stringLiteral(`"`), // double quote string literal
number: /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/,
})
.define(Lexer.wordType("true", "false", "null")) // type's name is the literal value
.anonymous(Lexer.exact(..."[]{},:")) // single char borders
.build();
export const parser = new ELR.AdvancedBuilder<any>()
.entry("value")
.define(
{ value: `string | number | true | false | null` },
// especially, for string use `eval` to make `\\x` become `\x`
ELR.traverser(({ children }) => eval(children![0].text!))
)
.define(
{ value: `object | array` },
ELR.traverser(({ children }) => children![0].traverse())
)
.define(
{ array: `'[' (value (',' value)*)? ']'` },
ELR.traverser(({ $ }) => $(`value`).map((v) => v.traverse()))
)
.define(
{ object: `'{' (object_item (',' object_item)*)? '}'` },
ELR.traverser(({ $ }) => {
// every object_item's traverse result is an object, we need to merge them
const result: { [key: string]: any } = {};
$(`object_item`).forEach((item) => {
Object.assign(result, item.traverse());
});
return result;
})
)
.define(
{ object_item: `string ':' value` },
// return an object
ELR.traverser(({ $ }) => {
const result: { [key: string]: any } = {};
result[$(`string`)[0].text!.slice(1, -1)] = $(`value`)[0].traverse();
return result;
})
)
.build(lexer, { checkAll: true });
In this example, we use reducer to define bottom-up data reducers, so we can get result when the AST is built.
There are conflicts introduced by those grammar rules, we use high-level resolver APIs priority/leftSA to resolve them.
const lexer = new Lexer.Builder()
.ignore(Lexer.whitespaces) // ignore blank characters
.define({
number: /^[0-9]+(?:\.[0-9]+)?/,
})
.anonymous(Lexer.exact(..."+-*/()"))
.build();
export const parser = new ELR.ParserBuilder<number>()
.entry("exp")
.define(
{ exp: "number" },
ELR.reducer(({ matched }) => Number(matched[0].text))
)
.define(
{ exp: `'-' exp` },
ELR.reducer<number>(({ values }) => -values[1]!)
)
.define(
{ exp: `'(' exp ')'` },
ELR.reducer(({ values }) => values[1])
)
.define(
{ exp: `exp '+' exp` },
ELR.reducer<number>(({ values }) => values[0]! + values[2]!)
)
.define(
{ exp: `exp '-' exp` },
ELR.reducer<number>(({ values }) => values[0]! - values[2]!)
)
.define(
{ exp: `exp '*' exp` },
ELR.reducer<number>(({ values }) => values[0]! * values[2]!)
)
.define(
{ exp: `exp '/' exp` },
ELR.reducer<number>(({ values }) => values[0]! / values[2]!)
)
.priority(
{ exp: `'-' exp` }, // highest priority
[{ exp: `exp '*' exp` }, { exp: `exp '/' exp` }],
[{ exp: `exp '+' exp` }, { exp: `exp '-' exp` }] // lowest priority
)
.leftSA(
// left-self-associative, e.g. 1 - 2 - 3 = (1 - 2) - 3 instead of 1 - (2 - 3)
{ exp: `exp '*' exp` },
{ exp: `exp '/' exp` },
{ exp: `exp '+' exp` },
{ exp: `exp '-' exp` }
)
.build(lexer, { checkAll: true });
This example shows you how to define a simple fn_def grammar rule if you want to build a programming language compiler.
const lexer = new Lexer.Builder()
.ignore(Lexer.whitespaces) // ignore blank chars
.define(Lexer.wordType("pub", "fn", "return", "let")) // keywords
.define({
integer: /^([1-9][0-9]*|0)/,
identifier: /^[a-zA-Z_]\w*/,
})
.anonymous(Lexer.exact(..."+-*/():{};=,")) // single char operator
.build();
export const parser = new ELR.AdvancedBuilder()
.define({
// use `@` to rename a node, this is effective only when using `$` to query nodes
fn_def: `
pub fn identifier@funcName '(' (param (',' param)*)? ')' ':' identifier@retType '{'
stmt*
'}'
`,
})
.define({ param: `identifier ':' identifier` })
.define({ stmt: `assign_stmt | ret_stmt` }, ELR.commit()) // commit to prevent re-lex, optimize performance
.define({ assign_stmt: `let identifier ':' identifier '=' exp ';'` })
.define({ ret_stmt: `return exp ';'` })
.define({ exp: `integer | identifier` })
.define({ exp: `exp '+' exp` })
.entry("fn_def")
.leftSA({ exp: `exp '+' exp` })
.build(lexer, { generateResolvers: "builder", checkAll: true });
All issues and pull requests are highly welcomed.
Generated using TypeDoc