Markdown writer does not create bracketed span when Span has empty Attrs #10802

rnwst · 2025-04-24T13:33:11Z

Explain the problem.
test.md:

[bracketed span]{}

command:

pandoc test.md -f markdown -t markdown

output:

bracketed span

expected output:

[bracketed span]{}

When test.md is modified as follows:

[bracketed span]{.class}

(i.e. the Attr is non-empty), then the output is as expected:

[bracketed span]{.class}

It should also be noted that in case of an empty Attr, the Markdown parsing works as expected:

$ pandoc test.md -f markdown -t native
[ Para
    [ Span
        ( "" , [] , [] ) [ Str "bracketed" , Space , Str "span" ]
    ]
]

Therefore, this is an issue with the Markdown writer.

Pandoc version?
3.6.4

The text was updated successfully, but these errors were encountered:

Implement workaround for jgm/pandoc#10802.

jgm · 2025-04-24T17:12:13Z

I think there may have been a reason for this. If we emitted an explicit span here, we'd get ugly results from some LaTeX input; LaTeX grouping {...} is parsed as a Span with empty attributes. Not sure if that was the thinking, but I'm pretty sure there was a reason of some kind...

Why is it important to emit a span with empty attributes?

Turns out the workaround for jgm/pandoc#10802 was incomplete and falls down when there are nested Inlines, where the lower level has at least one equation with an empty Attr. Remove temporary class after all equation Attrs have been parsed instead.

rnwst · 2025-04-24T20:12:02Z

I came across this behaviour when writing a pandoc filter in which I convert parts of the AST back to Markdown, manipulate the Markdown, and then parse the Markdown to go back to the AST (see above commits for further details).

This behaviour of the Markdown writer described above resulted in a bug, because during this round-trip conversion from AST to Markdown and back to AST, Spans with empty Attrs were being removed from the AST. When I wrote the code, I expected this round-trip conversion to preserve the AST, provided only AST elements are used which are supported in Markdown.

I understand that it is not reasonable to expect round-trip conversions for any format to result in no changes, even if all document elements for a given format have corresponding AST representations, since there are many ways the same AST structure can be represented in any given format. However, for AST round-trip conversions, I think it would be a worthwhile aim to make round trip conversions exact, provided the intermediate format has support for all AST elements used.

jgm · 2025-04-24T20:14:42Z

I came across this behaviour when writing a pandoc filter in which I convert parts of the AST back to Markdown, manipulate the Markdown, and then parse the Markdown to go back to the AST (see above commits for further details).

I didn't have time to look at the commits, but my first question is why? Why not manipulate the AST directly?

rnwst · 2025-04-24T20:43:32Z

I came across this behaviour when writing a pandoc filter in which I convert parts of the AST back to Markdown, manipulate the Markdown, and then parse the Markdown to go back to the AST (see above commits for further details).

I didn't have time to look at the commits, but my first question is why? Why not manipulate the AST directly?

In the mentioned filter, I'm adding support for DisplayMath Attrs to pandoc's Markdown:

$$
E=mc^2
$${#id .unnumbered key=val}

In the AST, I'm taking all the Inlines that follow the DisplayMath, write them back to Markdown, prepend [] to the Markdown string, and then proceed to parse it as a bracketed span to obtain the Attr. (Letting pandoc do the parsing of the Attr saves me from having to write a complicated regex which likely wouldn't parse Attrs the same way pandoc does.) The DisplayMath element is then wrapped in a Span with the obtained Attr.

(In my earlier comment I didn't want to provide so many details about this use case, since it is rather specific and I thought it might distract from the broader discussion of whether AST round-trip conversions should be exact).

jgm · 2025-04-25T01:05:45Z

So in this case you don't actually need an empty span. You could use [x] instead for the same purpose.

rnwst · 2025-04-25T10:03:05Z

The empty Span is not the problem, that part is working fine. The problem is that non-empty Spans with an empty Attr (which are located sometime after the DisplayMath element, but in the same Inlines List) are replaced with the Spans' contents during the round-trip conversion AST -> Markdown -> AST, thereby altering the AST in an unexpected way. I have implemented a workaround for this (I assign a temporary class to all Spans so that there are no empty Attrs, and then remove that class from all Spans in a later filter function once all DisplayMath Attrs have been parsed -- not very elegant, but it works).

However, I feel like we are deviating from the original question, which is whether it is reasonable to expect AST round-trip conversions (AST -> some format -> AST) to be exact, provided the intermediate format has full support for all AST elements used. As mentioned, to me this seems like a worthwhile aim. @jgm, do you have any thoughts on that?

Regarding your original comment, to find out why the Markdown writer behaves this way, perhaps we could change the behaviour to see what tests fail?

jgm · 2025-04-25T15:10:18Z

Ah, well then my question is why you are using spans with empty attributes? I'm struggling to see why one would want to write such a thing in Markdown.

About the other question: yes, ideally it would round-trip, but as I said above, I think there may be a reason for omitting the spans with empty attributes. We could try seeing what tests fail as you suggest.

rnwst · 2025-04-25T16:05:46Z

Regarding your question, it was easier to wrap all DisplayMath elements in Spans, even if they have no Attr, because then a subsequent filter function that numbers equations always receives a Span, rather than a Span or a Math depending on whether there are attributes.

Thank you for your comments regarding the round-tripping. I'll have a go at making this change to see what tests will fail.

rnwst added the bug label Apr 24, 2025

rnwst added a commit to rnwst/pandoc-lua-crossrefs that referenced this issue Apr 24, 2025

Fix yet another bug in parse_equation_attr

84e1a5c

Implement workaround for jgm/pandoc#10802.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Markdown writer does not create bracketed span when Span has empty Attrs #10802

Markdown writer does not create bracketed span when Span has empty Attrs #10802

rnwst commented Apr 24, 2025

jgm commented Apr 24, 2025

Uh oh!

rnwst commented Apr 24, 2025

Uh oh!

jgm commented Apr 24, 2025

Uh oh!

rnwst commented Apr 24, 2025

Uh oh!

jgm commented Apr 25, 2025

Uh oh!

rnwst commented Apr 25, 2025

Uh oh!

jgm commented Apr 25, 2025

Uh oh!

rnwst commented Apr 25, 2025

Uh oh!

Uh oh!

Markdown writer does not create bracketed span when Span has empty Attrs #10802

Markdown writer does not create bracketed span when Span has empty Attrs #10802

Comments

rnwst commented Apr 24, 2025

jgm commented Apr 24, 2025

Uh oh!

rnwst commented Apr 24, 2025

Uh oh!

jgm commented Apr 24, 2025

Uh oh!

rnwst commented Apr 24, 2025

Uh oh!

jgm commented Apr 25, 2025

Uh oh!

rnwst commented Apr 25, 2025

Uh oh!

jgm commented Apr 25, 2025

Uh oh!

rnwst commented Apr 25, 2025

Uh oh!