Parser size testing (arrows) #144

@ChrHorn

I was doing some testing to bring down the parser size. I tried deleting some of the operators and noticed a significant difference when changing the arrow operators.

Case 1 (master, f1baa5f)

  arrow: `
    <-- --> <-->
    ← → ↔ ↚ ↛ ↞ ↠ ↢ ↣ ↦ ↤ ↮ ⇎ ⇍ ⇏ ⇐ ⇒ ⇔ ⇴ ⇶ ⇷ ⇸ ⇹ ⇺ ⇻ ⇼ ⇽ ⇾ ⇿ ⟵ ⟶ ⟷ ⟹ ⟺ ⟻ ⟼ ⟽ ⟾ ⟿
    ⤀ ⤁ ⤂ ⤃ ⤄ ⤅ ⤆ ⤇ ⤌ ⤍ ⤎ ⤏ ⤐ ⤑ ⤔ ⤕ ⤖ ⤗ ⤘ ⤝ ⤞ ⤟ ⤠ ⥄ ⥅ ⥆ ⥇ ⥈ ⥊ ⥋ ⥎ ⥐ ⥒ ⥓ ⥖ ⥗ ⥚ ⥛ ⥞
    ⥟ ⥢ ⥤ ⥦ ⥧ ⥨ ⥩ ⥪ ⥫ ⥬ ⥭ ⥰ ⧴ ⬱ ⬰ ⬲ ⬳ ⬴ ⬵ ⬶ ⬷ ⬸ ⬹ ⬺ ⬻ ⬼ ⬽ ⬾ ⬿ ⭀ ⭁ ⭂ ⭃ ⥷ ⭄ ⥺ ⭇ ⭈ ⭉
    ⭊ ⭋ ⭌ ← → ⇜ ⇝ ↜ ↝ ↩ ↪ ↫ ↬ ↼ ↽ ⇀ ⇁ ⇄ ⇆ ⇇ ⇉ ⇋ ⇌ ⇚ ⇛ ⇠ ⇢ ↷ ↶ ↺ ↻
  `,
❯ du -sh src/parser.c
49M     src/parser.c

❯ cat src/parser.c | rg "#define.*STATE"
#define STATE_COUNT 19881
#define LARGE_STATE_COUNT 9618
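
For anyone who wants to script this comparison instead of running `rg` by hand, here is a minimal sketch in Node. `parseStateCounts` is a hypothetical helper name, not something tree-sitter provides. It only assumes that the generated `src/parser.c` contains `#define ... STATE_COUNT <number>` lines like the two shown above.

```javascript
// Hypothetical helper (not part of tree-sitter): pull the state-count
// macros out of the text of a generated src/parser.c.
function parseStateCounts(parserSource) {
  const counts = {};
  // Matches lines such as "#define STATE_COUNT 19881".
  const pattern = /^#define\s+(\w*STATE_COUNT)\s+(\d+)\s*$/gm;
  for (const match of parserSource.matchAll(pattern)) {
    counts[match[1]] = Number(match[2]);
  }
  return counts;
}

// Example input: the two lines reported for Case 1.
const sample = "#define STATE_COUNT 19881\n#define LARGE_STATE_COUNT 9618\n";
console.log(parseStateCounts(sample));
```

To run it against a real build, the sample string could be replaced with `require("node:fs").readFileSync("src/parser.c", "utf8")`.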

Case 2 (https://github.com/ChrHorn/tree-sitter-julia/commit/e66d1bf1a73e4e42e86a70830e0d02c2016cc92d)

Deleted most of the arrow operators.

  arrow: `
   <-- --> <-->
   ← → ↔
  `,

No visible change in state counts or parser size.

❯ du -sh src/parser.c
49M     src/parser.c

❯ cat src/parser.c | rg "#define.*STATE"
#define STATE_COUNT 19881
#define LARGE_STATE_COUNT 9618

Case 3 (https://github.com/ChrHorn/tree-sitter-julia/commit/e64b8fcfd7fcc78fdfeacd54b145ab265367799f)

Note that the only difference from Case 2 is one deleted arrow operator (↔).

  arrow: `
    <-- --> <-->
    ← →
  `,

This leads to a significant reduction in state counts and parser size.

❯ du -sh src/parser.c
32M     src/parser.c

❯ cat src/parser.c | rg "#define.*STATE"
#define STATE_COUNT 12760
#define LARGE_STATE_COUNT 5869
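
To put the Case 1 and Case 3 numbers side by side, the relative reduction can be computed with a few lines of JavaScript. This is just arithmetic on the figures reported in this issue. `percentReduction` is a name made up for illustration.

```javascript
// Relative reduction from `before` to `after`, as a percentage.
function percentReduction(before, after) {
  return ((before - after) / before) * 100;
}

// Figures copied from Case 1 (before) and Case 3 (after).
const measurements = {
  STATE_COUNT: [19881, 12760],
  LARGE_STATE_COUNT: [9618, 5869],
  "parser.c size as reported by du -sh (M)": [49, 32],
};

for (const [name, [before, after]] of Object.entries(measurements)) {
  const reduction = percentReduction(before, after).toFixed(1);
  console.log(`${name}: ${reduction}% smaller`);
}
```

Each of the three figures drops by a bit more than a third when the single operator is removed.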

Not really sure what's going on. I don't think it's Unicode; for example,

  arrow: `
    ⥟ ⥢ ⥤ ⥦ ⥧ ⥨ ⥩ ⥪ ⥫ ⥬ ⥭ ⥰ ⧴ ⬱ ⬰ ⬲ ⬳ ⬴ ⬵ ⬶ ⬷ ⬸ ⬹ ⬺ ⬻ ⬼ ⬽ ⬾ ⬿ ⭀ ⭁ ⭂ ⭃ ⥷ ⭄ ⥺ ⭇ ⭈ ⭉
  `,

also results in a smaller parser. I have only noticed this behavior when changing the arrow operators. The change is always binary: either the smaller parser or the current larger one, with nothing in between.
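
When narrowing this down, it may help to see exactly which code points each operator in a list consists of, and whether any operator is listed more than once (the Case 1 list above looks like it repeats `← →`). The sketch below is only a diagnostic aid under that assumption. `describeOperators` is a hypothetical helper, and it says nothing about how tree-sitter itself processes the list.

```javascript
// Hypothetical helper: split an operator list like the `arrow` template
// string on whitespace and describe each operator by its code points.
function describeOperators(operatorList) {
  const tokens = operatorList.trim().split(/\s+/);
  const seen = new Set();
  return tokens.map((token) => {
    const duplicate = seen.has(token); // true if this operator appeared earlier
    seen.add(token);
    // Array.from iterates a string by code point, not by UTF-16 unit.
    const codePoints = Array.from(token, (ch) =>
      "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
    );
    return { token, codePoints, duplicate };
  });
}

// The Case 2 list from this issue.
const case2 = `
   <-- --> <-->
   ← → ↔
`;
for (const entry of describeOperators(case2)) {
  const note = entry.duplicate ? " (duplicate)" : "";
  console.log(`${entry.token}  ${entry.codePoints.join(" ")}${note}`);
}
```

Running the same function over the Case 1 and Case 3 lists gives a quick way to diff the exact operator sets between builds.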

@savq are you able to reproduce this on your end? Any idea what causes it?
