Skip to content

Markdown writer does not create bracketed span when Span has empty Attrs #10802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rnwst opened this issue Apr 24, 2025 · 8 comments
Open

Markdown writer does not create bracketed span when Span has empty Attrs #10802

rnwst opened this issue Apr 24, 2025 · 8 comments
Labels

Comments

@rnwst
Copy link
Contributor

rnwst commented Apr 24, 2025

Explain the problem.
test.md:

[bracketed span]{}

command:

pandoc test.md -f markdown -t markdown

output:

bracketed span

expected output:

[bracketed span]{}

When test.md is modified as follows:

[bracketed span]{.class}

(i.e. the Attr is non-empty), then the output is as expected:

[bracketed span]{.class}

It should also be noted that in case of an empty Attr, the Markdown parsing works as expected:

$ pandoc test.md -f markdown -t native
[ Para
    [ Span
        ( "" , [] , [] ) [ Str "bracketed" , Space , Str "span" ]
    ]
]

Therefore, this is an issue with the Markdown writer.

Pandoc version?
3.6.4

@rnwst rnwst added the bug label Apr 24, 2025
rnwst added a commit to rnwst/pandoc-lua-crossrefs that referenced this issue Apr 24, 2025
@jgm
Copy link
Owner

jgm commented Apr 24, 2025

I think there may have been a reason for this. If we emitted an explicit span here, we'd get ugly results from some LaTeX input; LaTeX grouping {...} is parsed as a Span with empty attributes. Not sure if that was the thinking, but I'm pretty sure there was a reason of some kind...

Why is it important to emit a span with empty attributes?

rnwst added a commit to rnwst/pandoc-lua-crossrefs that referenced this issue Apr 24, 2025
Turns out the workaround for jgm/pandoc#10802 was incomplete and falls
down when there are nested Inlines, where the lower level has at least
one equation with an empty Attr. Remove temporary class after all
equation Attrs have been parsed instead.
@rnwst
Copy link
Contributor Author

rnwst commented Apr 24, 2025

I came across this behaviour when writing a pandoc filter in which I convert parts of the AST back to Markdown, manipulate the Markdown, and then parse the Markdown to go back to the AST (see above commits for further details).

This behaviour of the Markdown writer described above resulted in a bug, because during this round-trip conversion from AST to Markdown and back to AST, Spans with empty Attrs were being removed from the AST. When I wrote the code, I expected this round-trip conversion to preserve the AST, provided only AST elements are used which are supported in Markdown.

I understand that it is not reasonable to expect round-trip conversions for any format to result in no changes, even if all document elements for a given format have corresponding AST representations, since there are many ways the same AST structure can be represented in any given format. However, for AST round-trip conversions, I think it would be a worthwhile aim to make round trip conversions exact, provided the intermediate format has support for all AST elements used.

@jgm
Copy link
Owner

jgm commented Apr 24, 2025

I came across this behaviour when writing a pandoc filter in which I convert parts of the AST back to Markdown, manipulate the Markdown, and then parse the Markdown to go back to the AST (see above commits for further details).

I didn't have time to look at the commits, but my first question is why? Why not manipulate the AST directly?

@rnwst
Copy link
Contributor Author

rnwst commented Apr 24, 2025

I came across this behaviour when writing a pandoc filter in which I convert parts of the AST back to Markdown, manipulate the Markdown, and then parse the Markdown to go back to the AST (see above commits for further details).

I didn't have time to look at the commits, but my first question is why? Why not manipulate the AST directly?

In the mentioned filter, I'm adding support for DisplayMath Attrs to pandoc's Markdown:

$$
E=mc^2
$${#id .unnumbered key=val}

In the AST, I'm taking all the Inlines that follow the DisplayMath, write them back to Markdown, prepend [] to the Markdown string, and then proceed to parse it as a bracketed span to obtain the Attr. (Letting pandoc do the parsing of the Attr saves me from having to write a complicated regex which likely wouldn't parse Attrs the same way pandoc does.) The DisplayMath element is then wrapped in a Span with the obtained Attr.

(In my earlier comment I didn't want to provide so many details about this use case, since it is rather specific and I thought it might distract from the broader discussion of whether AST round-trip conversions should be exact).

@jgm
Copy link
Owner

jgm commented Apr 25, 2025

So in this case you don't actually need an empty span. You could use [x] instead for the same purpose.

@rnwst
Copy link
Contributor Author

rnwst commented Apr 25, 2025

The empty Span is not the problem, that part is working fine. The problem is that non-empty Spans with an empty Attr (which are located sometime after the DisplayMath element, but in the same Inlines List) are replaced with the Spans' contents during the round-trip conversion AST -> Markdown -> AST, thereby altering the AST in an unexpected way. I have implemented a workaround for this (I assign a temporary class to all Spans so that there are no empty Attrs, and then remove that class from all Spans in a later filter function once all DisplayMath Attrs have been parsed -- not very elegant, but it works).

However, I feel like we are deviating from the original question, which is whether it is reasonable to expect AST round-trip conversions (AST -> some format -> AST) to be exact, provided the intermediate format has full support for all AST elements used. As mentioned, to me this seems like a worthwhile aim. @jgm, do you have any thoughts on that?

Regarding your original comment, to find out why the Markdown writer behaves this way, perhaps we could change the behaviour to see what tests fail?

@jgm
Copy link
Owner

jgm commented Apr 25, 2025

Ah, well then my question is why you are using spans with empty attributes? I'm struggling to see why one would want to write such a thing in Markdown.

About the other question: yes, ideally it would round-trip, but as I said above, I think there may be a reason for omitting the spans with empty attributes. We could try seeing what tests fail as you suggest.

@rnwst
Copy link
Contributor Author

rnwst commented Apr 25, 2025

Regarding your question, it was easier to wrap all DisplayMath elements in Spans, even if they have no Attr, because then a subsequent filter function that numbers equations always receives a Span, rather than a Span or a Math depending on whether there are attributes.

Thank you for your comments regarding the round-tripping. I'll have a go at making this change to see what tests will fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants