Separating the basic parsing infrastructure from the concrete Julia parser (or BYOT, Bring Your Own Tokenizer) #536

Open
@KristofferC

Description

There is a lot of nice infrastructure for writing custom parsers in JuliaSyntax.jl, with an example in e.g. https://github.com/JuliaLang/JuliaSyntax.jl/blob/main/prototypes/simple_parser.jl. The issue is that the tokenizer is hardcoded:

```julia
# Lexer, transforming the input bytes into a token stream
lexer::Tokenize.Lexer{IOBuffer}
```

and there is a bit of a mix of generic parsing functionality and Julia-specific things like:

```julia
flags::RawFlags
```

An idea could be to try to extract the language-agnostic parts of the parser into a separate module/package and make the Julia parser an implementation on top of it. Someone who wants to use the infrastructure for a different language could then write their own lexer but still make use of all the other parsing utilities here.
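To make the split concrete, here is a minimal sketch of what the language-agnostic package could define, assuming a dispatch-based lexer interface. Every name in it (`AbstractLexer`, `next_token`, the predicates) is hypothetical, not existing JuliaSyntax API:

```julia
# Hypothetical sketch -- none of these names exist in JuliaSyntax today.
# The language-agnostic package would own an abstract lexer interface:

abstract type AbstractLexer end

# Required: produce the next raw token from the input.
function next_token end

# Required predicates that the generic stream machinery needs in order to
# handle trivia, end-of-input and error recovery without knowing the language.
function is_whitespace end
function is_eof end
function is_error end

# The stream type would then be parameterized on the lexer instead of
# hard-coding Tokenize.Lexer{IOBuffer}:
#
#     struct ParseStream{L <: AbstractLexer}
#         lexer::L
#         # ...
#     end
#
# and the Julia parser would become just one implementation:
#
#     struct JuliaLexer <: AbstractLexer
#         lexer::Tokenize.Lexer{IOBuffer}
#     end
```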

This would, for example, require defining an interface for what a custom lexer (and its tokens) should support, as sketched above, and replacing hard-coded constructions like:

```julia
push!(stream.tokens, SyntaxToken(SyntaxHead(K"TOMBSTONE", EMPTY_FLAGS),
                                 K"TOMBSTONE", t.preceding_whitespace,
                                 t.next_byte))
```

with generic versions like `tombstone(TokenType)` instead of `K"TOMBSTONE"`, etc.
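A hedged sketch of what that could look like, assuming the generic code is written against a token-kind type parameter `T`. The `tombstone` function is a hypothetical name; `Kind` and the `K"..."` macro are JuliaSyntax's existing kinds:

```julia
using JuliaSyntax: Kind, @K_str

# Hypothetical interface function: each language's token-kind type declares
# its own sentinel value rather than the generic code hard-coding K"TOMBSTONE".
function tombstone end

# The Julia parser's implementation maps it to the existing kind:
tombstone(::Type{Kind}) = K"TOMBSTONE"

# The snippet above could then be written generically, with T the stream's
# token-kind type parameter:
#
#     push!(stream.tokens, SyntaxToken(SyntaxHead(tombstone(T), EMPTY_FLAGS),
#                                      tombstone(T), t.preceding_whitespace,
#                                      t.next_byte))
```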

With the exception of the tokenizer, I am not familiar with the code base, so I don't know the level of effort or how feasible this is, but I thought I would float the idea. Our use case is to use JuliaSyntax.jl to parse another language while still having access to e.g. the good source-location tracking that the data structures here were designed to provide.
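As a purely illustrative toy (not JuliaSyntax API), here is what the source-location side of that use case amounts to: a custom lexer for another language only needs to hand back per-token byte ranges, and generic tree-building infrastructure can do the rest:

```julia
# Toy example, not JuliaSyntax API: a lexer for a made-up language that keeps
# the per-token byte ranges needed for accurate source locations.

struct RawToken
    kind::Symbol      # language-specific token kind
    first_byte::Int   # byte range into the source text
    last_byte::Int
end

mutable struct WordLexer
    src::String
    pos::Int
end
WordLexer(src::String) = WordLexer(src, 1)

# Classify a character; a real lexer for the target language goes here.
charkind(c::Char) = isletter(c) ? :word :
                    isdigit(c)  ? :number :
                    isspace(c)  ? :whitespace : :error

function next_token(l::WordLexer)
    l.pos > ncodeunits(l.src) && return RawToken(:eof, l.pos, l.pos)
    start = l.pos
    k = charkind(l.src[l.pos])
    while l.pos <= ncodeunits(l.src) && charkind(l.src[l.pos]) == k
        l.pos = nextind(l.src, l.pos)
    end
    return RawToken(k, start, l.pos - 1)
end

# Every token carries its byte range, so a tree built on top can map nodes
# back to the source -- exactly what we want to keep from JuliaSyntax.
lex = WordLexer("abc 123")
while (t = next_token(lex)).kind != :eof
    println(t.kind, " => bytes ", t.first_byte, ":", t.last_byte)
end
```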
