Clarified imports/includes #8
-
In general I think that rules for combining grammars are very useful, and improvements in this area would be very valuable. For example, we would like to write separate grammars for ANSI SQL and for the various SQL dialects, with some form of inheritance among them; however, this seems hard to do with ANTLR 4. I think that importing/combining grammars is an advanced feature, of interest mainly to advanced users, so I would err on the side of making it more powerful, even at the cost of making it a little harder to learn.
Reading this, one consideration that comes to mind is: would things be simpler if we did not have combined grammars?
For lexers, the only thing that comes to mind is that, if I understand correctly, we are considering injecting all the imported lexer rules, consecutively, at a certain point of the importing lexer. I think that would be appropriate for lexer rules that were designed to be imported, but it may not always work for lexer rules that were intended to be used on their own and are now being reused. An example of a grammar conceived to be reused: a grammar defining common mathematical symbols. For rules that were not designed to be imported, I may want to intertwine the imported rules with the rules in the importing grammar. To support that, we could perhaps allow the user to define blocks or sections of rules and import each one as a block. Something like:
One could then specify in the importing grammar where to import a block of lexer rules. This could be optional, so that it is relevant only for advanced users and the others can safely ignore it. Also, what about lexer modes?
-
Things would definitely be simpler for ANTLR developers if we didn't support combined grammars, but not for beginners... My proposal is as follows:
A parser grammar could start like this:
whereas a lexer grammar would not allow the above, but would support the following:
The above indeed doesn't cater for reuse of grammars not designed for reuse.
-
These rules seem fine, but I'm not an expert on this. I'm writing some "lint"-type scripts for these rules and will check them against grammars-v4. Already I see that https://github.com/antlr/grammars-v4/tree/master/oncrpc violates your first rule ("don't support import/include in combined grammars"), but the grammar does seem to "work." (The grammar is tested with only one test input, which covers only 62% of the parser rules in the grammar.) For Antlr5, the grammar would have to be split. Split grammars are required to write "target agnostic grammars", but are otherwise not necessary. Note that we don't have any "official" tools to split and combine grammars, but Trash does this easily, preserving off-channel content. If there are new import/include rules on grammars, Antlr4 grammars that did work would need changes when converted to Antlr5. For example, "import LexBasic;" would need to be changed to "include LexBasic;".
-
What if, following the lexer mode pattern, entire sections of the main document were delegated to sub-grammars?
Something like:
```
.../...
javascript: SCRIPT_TAG_OPEN delegate JavaScriptParser sourceElements SCRIPT_TAG_CLOSE;
.../...
```
In the above, the HTMLParser would consume SCRIPT_TAG_OPEN, accumulate all input up to SCRIPT_TAG_CLOSE, and delegate the lexing and parsing of the accumulated input to a JavaScriptParser, invoking the sourceElements rule.
This would involve introducing a new node type, i.e. DelegateNode, that would hold a reference to the autonomous JavaScript parse tree...
(just thinking out loud here...)
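To make the delegation idea concrete, here is a minimal Python sketch, not ANTLR code. It assumes the host parser can locate the opening and closing tokens as plain strings; `parse_with_delegate` and `fake_js_parser` are hypothetical names invented for illustration, and only `DelegateNode` comes from the idea above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Union


@dataclass
class DelegateNode:
    """Holds the autonomous parse tree produced by a delegate parser."""
    grammar: str     # name of the delegate grammar, e.g. "JavaScriptParser"
    start_rule: str  # rule invoked on the delegate, e.g. "sourceElements"
    tree: Any        # whatever the delegate parser returned


def parse_with_delegate(text: str, open_tag: str, close_tag: str,
                        delegate: Callable[[str], Any],
                        grammar: str, start_rule: str) -> List[Union[str, DelegateNode]]:
    """Split text into host chunks and DelegateNodes (hypothetical helper)."""
    nodes: List[Union[str, DelegateNode]] = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break
        body_start = start + len(open_tag)
        end = text.find(close_tag, body_start)
        if end == -1:
            break  # unterminated section: leave the rest to the host parser
        if start > pos:
            nodes.append(text[pos:start])
        # Accumulate the input up to the closing tag and hand it to the delegate.
        nodes.append(DelegateNode(grammar, start_rule, delegate(text[body_start:end])))
        pos = end + len(close_tag)
    if pos < len(text):
        nodes.append(text[pos:])
    return nodes


def fake_js_parser(source: str) -> tuple:
    """Stand-in for a real JavaScript parser; returns a trivial 'tree'."""
    return ("sourceElements", source.strip())


html = "<p>hi</p><script>var x = 1;</script><p>bye</p>"
result = parse_with_delegate(html, "<script>", "</script>", fake_js_parser,
                             "JavaScriptParser", "sourceElements")
print(result)
```

A real implementation would work on tokens rather than raw strings and would have to decide how delegate errors and source positions map back to the host document.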
-
Grammars can range from self-contained "combined" grammars, "lexer" grammars and "parser" grammars (which require a "lexer" grammar), to more complex grammars that import other grammars. The latter are extremely useful for high-level languages: LINQ in C#, JSX in JS, JS and CSS in HTML...
The current mechanism for wiring grammars together relies on the `import` statement. Its behavior is not well documented. Experience shows the following:
Behavior 1 is highly desirable. It operates in an intuitive way. However, the tool could raise a warning in case of collision, and not raise it if the overriding rule is annotated (for example using an `@override` decorator).
Behavior 2 is fine for parser grammars, but severely problematic for lexer ones. This is because top-level parser rules do not have precedence, whereas the precedence of lexer rules is determined by their sequence in the grammar.
This makes it very difficult to re-use grammar fragments across lexers, because a well-formed lexer grammar typically ends with a whitespace rule followed by a catch-all one. Hence, imported lexer rules often end up never being matched.
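The shadowing effect can be reproduced with a small Python simulation. This is only a sketch, not ANTLR itself: it models the usual lexer policy (longest match wins, ties go to the rule defined first) and assumes, as the discussion of behavior 2 implies, that imported rules end up after the importing grammar's own rules. The rule names `ID`, `WS`, `ANY` and `SELECT` are made up for illustration.

```python
import re


def next_token(rules, text, pos=0):
    """Return (rule_name, lexeme): longest match wins, ties go to the earliest rule."""
    best = None
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        # Strict '>' keeps the earlier rule when two rules match the same length.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


# Importing lexer: ends with a whitespace rule followed by a catch-all rule.
main_rules = [("ID", r"[a-z]+"), ("WS", r"\s+"), ("ANY", r".")]
# Reusable fragment that was meant to contribute a keyword.
imported_rules = [("SELECT", r"select")]

# Assumption: import places the imported rules at the end.
appended = main_rules + imported_rules
print(next_token(appended, "select"))   # ('ID', 'select')

# If the same rules came first, the keyword would be recognized.
prepended = imported_rules + main_rules
print(next_token(prepended, "select"))  # ('SELECT', 'select')
```

In the appended order, `SELECT` ties with `ID` on length and loses on position, so it is never produced, which is the problem described above.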
To avoid this, we could adopt the following:
An `include` statement that can be placed anywhere in the grammar, thus imperatively specifying where the lexer rules should be placed.
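As a rough illustration of what positional inclusion would buy, the Python sketch below splices an included rule list in at the point where the `include` marker appears, then applies a longest-match, first-rule-wins lexer policy. Everything here is hypothetical: the `("include", name)` marker, the `expand_includes` helper and the `Keywords` fragment are invented for the example and do not reflect any actual ANTLR syntax or API.

```python
import re


def expand_includes(rules, library):
    """Replace each ("include", name) marker with the named rule list, in place."""
    expanded = []
    for entry in rules:
        if entry[0] == "include":
            expanded.extend(library[entry[1]])
        else:
            expanded.append(entry)
    return expanded


def first_longest_match(rules, text):
    """Longest match at the start of text; ties go to the earliest rule."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and m.end() > 0 and (best is None or m.end() > len(best[1])):
            best = (name, m.group())
    return best


# Hypothetical reusable fragment.
library = {"Keywords": [("SELECT", r"select")]}

# The marker sits before ID, so the keyword rule takes precedence over it.
main = [("include", "Keywords"), ("ID", r"[a-z]+"), ("WS", r"\s+"), ("ANY", r".")]

rules = expand_includes(main, library)
print([name for name, _ in rules])             # ['SELECT', 'ID', 'WS', 'ANY']
print(first_longest_match(rules, "select"))    # ('SELECT', 'select')
print(first_longest_match(rules, "selected"))  # ('ID', 'selected')
```

Because the position is chosen by the grammar author, the whitespace and catch-all rules can stay at the end of the importing lexer.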