Commit 6113aad

Don't precompute placeholder replacements in raw HTML post-processor
Previously, the raw HTML post-processor would precompute all possible replacements for placeholders in a string, based on the HTML stash. It would then apply a regular expression substitution using these replacements. Finally, if the text changed, it would recurse and do all of that again. This was inefficient because the replacements were recomputed on every recursion, and because only a few of them would actually be used.

This change moves the recursion into the regular expression substitution, so that:

1. the regular expression does minimal work on the text, instead of re-scanning text already scanned in previous frames;
2. more importantly, replacements are no longer computed ahead of time (let alone several times); they are fetched from the HTML stash only as placeholders are found in the text.

The substitution function relies on the ordering of the regular expression's groups: we make sure to match `<p>PLACEHOLDER</p>` before `PLACEHOLDER`. The presence of a wrapping `p` tag determines whether the substitution result is wrapped again; block-level HTML is never rewrapped.

Issue-1507: #1507
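The difference between the two strategies described in the commit message can be sketched outside Python-Markdown. In this minimal sketch, the placeholder format `\x02html:N\x03` and the functions `restore_eager` and `restore_lazy` are hypothetical; `<p>` wrapping and block-level detection are left out, and each function records which stash indexes it fetches so the amount of work can be compared.

```python
import re

# Hypothetical placeholder format for this sketch (not Python-Markdown's
# actual util.HTML_PLACEHOLDER).
PLACEHOLDER = "\x02html:%s\x03"
PATTERN = re.compile(PLACEHOLDER % r"([0-9]+)")


def restore_eager(text: str, stash: list[str], lookups: list[int]) -> str:
    """Old strategy: build every replacement up front, substitute, then
    re-run the whole process until the text stops changing."""
    replacements = {}
    for i, html in enumerate(stash):
        lookups.append(i)  # every stash entry is fetched on every pass
        replacements[PLACEHOLDER % i] = html
    processed = PATTERN.sub(
        lambda m: replacements.get(m.group(0), m.group(0)), text
    )
    if processed == text:
        return processed
    return restore_eager(processed, stash, lookups)


def restore_lazy(text: str, stash: list[str], lookups: list[int]) -> str:
    """New strategy: fetch a stash entry only when its placeholder is found,
    and recurse into the fetched HTML from inside the substitution."""

    def substitute(m: re.Match) -> str:
        index = int(m.group(1))
        if index >= len(stash):
            return m.group(0)  # unknown placeholder: leave it untouched
        lookups.append(index)
        return PATTERN.sub(substitute, stash[index])

    return PATTERN.sub(substitute, text)
```

With `stash = ["<b>" + PLACEHOLDER % 1 + "</b>", "<i>x</i>", "<hr>", "<hr>"]` and a text containing only placeholder 0, both functions return the same string, but the eager version fetches all four entries on each of its three passes (12 lookups) while the lazy version fetches only entries 0 and 1.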
1 parent 85a9160 commit 6113aad

1 file changed (+14 -25 lines)

markdown/postprocessors.py (+14 -25)
```diff
@@ -73,37 +73,26 @@ class RawHtmlPostprocessor(Postprocessor):
 
     def run(self, text: str) -> str:
         """ Iterate over html stash and restore html. """
-        replacements = OrderedDict()
-        for i in range(self.md.htmlStash.html_counter):
-            html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[i])
-            if self.isblocklevel(html):
-                replacements["<p>{}</p>".format(
-                    self.md.htmlStash.get_placeholder(i))] = html
-            replacements[self.md.htmlStash.get_placeholder(i)] = html
-
         def substitute_match(m: re.Match[str]) -> str:
-            key = m.group(0)
-
-            if key not in replacements:
-                if key[3:-4] in replacements:
-                    return f'<p>{ replacements[key[3:-4]] }</p>'
-                else:
-                    return key
-
-            return replacements[key]
-
-        if replacements:
+            if key := m.group(1):
+                wrapped = True
+            else:
+                key = m.group(2)
+                wrapped = False
+            if (key := int(key)) >= len(self.md.htmlStash.rawHtmlBlocks):
+                return m.group(0)
+            html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[key])
+            if self.isblocklevel(html) or not wrapped:
+                return pattern.sub(substitute_match, html)
+            return pattern.sub(substitute_match, f"<p>{html}</p>")
+
+        if self.md.htmlStash.html_counter:
             base_placeholder = util.HTML_PLACEHOLDER % r'([0-9]+)'
             pattern = re.compile(f'<p>{ base_placeholder }</p>|{ base_placeholder }')
-            processed_text = pattern.sub(substitute_match, text)
+            return pattern.sub(substitute_match, text)
         else:
             return text
 
-        if processed_text == text:
-            return processed_text
-        else:
-            return self.run(processed_text)
-
     def isblocklevel(self, html: str) -> bool:
         """ Check is block of HTML is block-level. """
         m = self.BLOCK_LEVEL_REGEX.match(html)
```
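The wrap decision made by the new `substitute_match` can be replayed on a toy stash. This sketch mirrors its group handling and its out-of-range guard; the placeholder format, `BLOCK_TAGS`, `is_block_level` and `restore` are hypothetical simplifications, with `is_block_level` standing in for `RawHtmlPostprocessor.isblocklevel`, whose full rules are not shown in this diff.

```python
import re

# Hypothetical placeholder format (not Python-Markdown's util.HTML_PLACEHOLDER).
PLACEHOLDER = "\x02html:%s\x03"
BASE = PLACEHOLDER % r"([0-9]+)"
# The <p>-wrapped alternative is listed first, so a wrapped placeholder fills
# group 1; a bare placeholder can only match the second alternative (group 2).
PATTERN = re.compile(f"<p>{BASE}</p>|{BASE}")

# Simplified stand-in for isblocklevel: a small, hypothetical tag set.
BLOCK_TAGS = {"div", "p", "pre", "table", "hr", "ul", "ol"}
TAG = re.compile(r"^</?([A-Za-z0-9]+)")


def is_block_level(html: str) -> bool:
    m = TAG.match(html)
    return bool(m) and m.group(1).lower() in BLOCK_TAGS


def restore(text: str, stash: list[str]) -> str:
    def substitute_match(m: re.Match) -> str:
        if m.group(1) is not None:
            key, wrapped = m.group(1), True
        else:
            key, wrapped = m.group(2), False
        index = int(key)
        if index >= len(stash):
            return m.group(0)  # out-of-range placeholder: keep the matched text
        html = stash[index]
        # Block-level HTML is never rewrapped; inline HTML gets a <p> back
        # only if its placeholder was wrapped in the first place.
        if is_block_level(html) or not wrapped:
            return PATTERN.sub(substitute_match, html)
        return PATTERN.sub(substitute_match, f"<p>{html}</p>")

    return PATTERN.sub(substitute_match, text)
```

For example, with `stash = ["<div>block</div>", "<b>inline</b>"]`, `restore("<p>\x02html:0\x03</p>", stash)` yields `<div>block</div>` (block-level, so the `<p>` is dropped), while the same wrapper around placeholder 1 yields `<p><b>inline</b></p>`.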