Separating the basic parsing infrastructure from the concrete Julia parser (or BYOT, Bring Your Own Tokenizer) #536

Open
@KristofferC

Description

There is a lot of nice infrastructure for writing custom parsers in JuliaSyntax.jl, with an example in e.g. https://github.com/JuliaLang/JuliaSyntax.jl/blob/main/prototypes/simple_parser.jl. The issue is that the tokenizer is hardcoded:

```julia
# Lexer, transforming the input bytes into a token stream
lexer::Tokenize.Lexer{IOBuffer}
```

and there is a bit of a mix of generic parsing functionality and Julia-specific things like:

```julia
flags::RawFlags
```

An idea could be to try to extract the language-agnostic parts of the parser into a separate module/package and make the Julia parser an implementation on top of it. Someone who wants to use the infrastructure for a different language could then write their own lexer but still make use of all the other parsing utilities here.
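To make the split concrete, here is a minimal sketch of what the language-agnostic package could define, assuming a dispatch-based lexer interface. Every name in it (`AbstractLexer`, `next_token`, the predicates) is hypothetical, not existing JuliaSyntax API:

```julia
# Hypothetical sketch -- none of these names exist in JuliaSyntax today.
# The language-agnostic package would own an abstract lexer interface:

abstract type AbstractLexer end

# Required: produce the next raw token from the input.
function next_token end

# Required predicates that the generic stream machinery needs in order to
# handle trivia, end-of-input and error recovery without knowing the language.
function is_whitespace end
function is_eof end
function is_error end

# The stream type would then be parameterized on the lexer instead of
# hard-coding Tokenize.Lexer{IOBuffer}:
#
#     struct ParseStream{L <: AbstractLexer}
#         lexer::L
#         # ...
#     end
#
# and the Julia parser would become just one implementation:
#
#     struct JuliaLexer <: AbstractLexer
#         lexer::Tokenize.Lexer{IOBuffer}
#     end
```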

This would, for example, require defining an interface for what a custom lexer (and its tokens) should support, as sketched above, and replacing hard-coded constructions like:

```julia
push!(stream.tokens, SyntaxToken(SyntaxHead(K"TOMBSTONE", EMPTY_FLAGS),
                                 K"TOMBSTONE", t.preceding_whitespace,
                                 t.next_byte))
```

with generic versions like `tombstone(TokenType)` instead of `K"TOMBSTONE"`, etc.
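A hedged sketch of what that could look like, assuming the generic code is written against a token-kind type parameter `T`. The `tombstone` function is a hypothetical name; `Kind` and the `K"..."` macro are JuliaSyntax's existing kinds:

```julia
using JuliaSyntax: Kind, @K_str

# Hypothetical interface function: each language's token-kind type declares
# its own sentinel value rather than the generic code hard-coding K"TOMBSTONE".
function tombstone end

# The Julia parser's implementation maps it to the existing kind:
tombstone(::Type{Kind}) = K"TOMBSTONE"

# The snippet above could then be written generically, with T the stream's
# token-kind type parameter:
#
#     push!(stream.tokens, SyntaxToken(SyntaxHead(tombstone(T), EMPTY_FLAGS),
#                                      tombstone(T), t.preceding_whitespace,
#                                      t.next_byte))
```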

With the exception of the tokenizer, I am not familiar with the code base, so I don't know the level of effort or how feasible this is, but I thought I would float the idea. Our use case is to use JuliaSyntax.jl to parse another language while still having access to e.g. the good source-location tracking that the data structures here were designed to provide.
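As a purely illustrative toy (not JuliaSyntax API), here is what the source-location side of that use case amounts to: a custom lexer for another language only needs to hand back per-token byte ranges, and generic tree-building infrastructure can do the rest:

```julia
# Toy example, not JuliaSyntax API: a lexer for a made-up language that keeps
# the per-token byte ranges needed for accurate source locations.

struct RawToken
    kind::Symbol      # language-specific token kind
    first_byte::Int   # byte range into the source text
    last_byte::Int
end

mutable struct WordLexer
    src::String
    pos::Int
end
WordLexer(src::String) = WordLexer(src, 1)

# Classify a character; a real lexer for the target language goes here.
charkind(c::Char) = isletter(c) ? :word :
                    isdigit(c)  ? :number :
                    isspace(c)  ? :whitespace : :error

function next_token(l::WordLexer)
    l.pos > ncodeunits(l.src) && return RawToken(:eof, l.pos, l.pos)
    start = l.pos
    k = charkind(l.src[l.pos])
    while l.pos <= ncodeunits(l.src) && charkind(l.src[l.pos]) == k
        l.pos = nextind(l.src, l.pos)
    end
    return RawToken(k, start, l.pos - 1)
end

# Every token carries its byte range, so a tree built on top can map nodes
# back to the source -- exactly what we want to keep from JuliaSyntax.
lex = WordLexer("abc 123")
while (t = next_token(lex)).kind != :eof
    println(t.kind, " => bytes ", t.first_byte, ":", t.last_byte)
end
```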
