Description
JuliaSyntax.jl has a lot of nice infrastructure for writing custom parsers, and there is an example of that in e.g. https://github.com/JuliaLang/JuliaSyntax.jl/blob/main/prototypes/simple_parser.jl. The issue is that the tokenizer is hardcoded (JuliaSyntax.jl/src/parse_stream.jl, lines 345 to 346 in 86bc433), and generic parsing functionality is somewhat mixed with Julia-specific code (for example JuliaSyntax.jl/src/parse_stream.jl, line 115 in 86bc433).
An idea could be to try to extract the language-agnostic parts of the parser into a separate module/package and make the Julia parser an implementation on top of it. Someone who wants to use the infrastructure for a different language could then write their own lexer but still use all the other parsing utilities in here.
This would, for example, require defining an interface for what a custom lexer (and token) should support, and replacing hard-coded checks like those in JuliaSyntax.jl/src/parse_stream.jl (lines 1264 to 1266 in 86bc433) with generic versions, like `tombstone(TokenType)` instead of `K"TOMBSTONE"` etc.
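To make the idea a bit more concrete, here is a rough sketch of what such an interface could look like. Everything in it is hypothetical: `ToyToken`, `ToyKind`, `toy_lex`, `is_trivia`, `byte_range`, `significant_tokens` and the `tombstone(TokenType)` function are made up for illustration and none of them exist in JuliaSyntax.jl today. It only shows the shape of the split, not how the real `ParseStream` would have to change.

```julia
# Hypothetical sketch: none of these names are part of JuliaSyntax.jl.

# --- Generic interface a language-agnostic parse stream could rely on ---
function tombstone end   # tombstone(::Type{T}) -> kind marking deleted nodes
function kind end        # kind(tok) -> the token's kind
function is_trivia end   # is_trivia(tok) -> true for whitespace/comments
function byte_range end  # byte_range(tok) -> byte span in the source text

# --- A toy language implementing the interface ---
@enum ToyKind TOY_TOMBSTONE TOY_WHITESPACE TOY_IDENT TOY_NUMBER TOY_ERROR TOY_EOF

struct ToyToken
    kind::ToyKind
    first_byte::Int
    last_byte::Int
end

tombstone(::Type{ToyToken}) = TOY_TOMBSTONE
kind(t::ToyToken) = t.kind
is_trivia(t::ToyToken) = t.kind == TOY_WHITESPACE
byte_range(t::ToyToken) = t.first_byte:t.last_byte

# Classify a single character into the kind of token it belongs to.
classify(c::Char) = isspace(c)  ? TOY_WHITESPACE :
                    isletter(c) ? TOY_IDENT :
                    isdigit(c)  ? TOY_NUMBER : TOY_ERROR

# A deliberately tiny lexer: groups runs of same-class characters into tokens.
function toy_lex(src::AbstractString)
    toks = ToyToken[]
    i = firstindex(src)
    n = lastindex(src)
    while i <= n
        class = classify(src[i])
        j = i
        while true
            nj = nextind(src, j)
            (nj <= n && classify(src[nj]) == class) || break
            j = nj
        end
        # nextind(src, j) - 1 is the last byte of the character at index j
        push!(toks, ToyToken(class, i, nextind(src, j) - 1))
        i = nextind(src, j)
    end
    # Empty end-of-file token positioned just past the last byte
    push!(toks, ToyToken(TOY_EOF, ncodeunits(src) + 1, ncodeunits(src)))
    return toks
end

# --- Generic code, written only against the interface above ---
significant_tokens(toks::Vector{T}) where {T} =
    [t for t in toks if !is_trivia(t) && kind(t) != tombstone(T)]
```

The point is that `significant_tokens` never mentions a concrete kind; it asks the token type for its tombstone, so the same code would work for Julia tokens or for any other language that implements these few functions.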
With the exception of the tokenizer, I am not familiar with the code base, so I don't know the level of effort or how feasible this is, but I thought I would float the idea. Our use case is to use JuliaSyntax.jl to parse another language and still have access to e.g. good source code locations, which the data structures in here were designed to provide.