Releases: jgm/pandoc
pandoc 2.17.1
-
Support `pagedjs-cli` as pdf engine (#7838, Albert Krewinkel). PagedJS is a polyfill that supports the W3C Paged Media standards. https://www.pagedjs.org/
-
CommonMark reader: fix source position after YAML metadata (#7863).
-
LaTeX reader:
- Remove retokenizing in `rawLaTeXParser`.
- Ensure that `\raggedright` doesn’t gobble an argument (#7757).
- Improve `descItem`. For some reason we were skipping arbitrary blocks before `\item`. This is now changed to “skip whitespace and comments.”
- Improve handling of `\newif`. Adding a pair of braces around the second argument of `\def` prevents LaTeX from an emergency stop on input like the following (#6096): `\newif\ifepub \epubtrue \ifepub hi \fi`
-
-
Docx reader: Parse both Zotero citation and bibliography as `FieldInfo` (#7840).
-
LaTeX writer:
-
Markdown writer: handle explicit column widths with pipe tables (#7847). If a table has explicit column width information and the content extends beyond the `--columns` width, we need to adjust the widths of the pipe separators to encode this width information.
-
Docx writer: Separate tables even with RawBlocks between (#7224, Michael Hoffmann). Adjacent docx tables need to be separated by an empty paragraph. If there’s a RawBlock between tables which renders to nothing, be sure to still insert the empty paragraph so that they will not collapse together.
-
Man writer: use custom font V for inline code (#7506). The V font is defined conditionally, so that it renders like CB in output formats that support that, and like B in those that don’t (e.g. the terminal). Aliases also defined for VI, VB, VBI.
-
Asciidoc writer: Support checklists in asciidoctor writer (#7832, Nikolai Korobeinikov, ricnorr). The checklist syntax (similar to `task_list` in markdown) seems to be an asciidoctor-only addition.
-
HTML writer:
-
Custom writer: preserve order of element attributes (#7489, Albert Krewinkel). Attribute key-value pairs are marshaled as AttributeList, i.e., as a userdata type that behaves both like a list and a map. This makes it possible to preserve the order of key-value pairs.
-
Switch to hslua-2.1 (Albert Krewinkel). This allows for some code simplification and improves stability.
-
Don’t read files outside of user data directory (Even Brenden). If a file path does not exist relative to the working directory, and it does exist relative to the user data directory but resolves to a location outside of the user data directory, do not read it. This applies to `readDataFile` and `readMetadataFile` in PandocMonad and, by extension, any module that uses these by passing them relative paths.
-
Text.Pandoc.Class.`makeCanonical`: Correctly handle consecutive “..”s at the beginning of a path (Even Brenden). Prior to this commit, `../../file` would evaluate to `file`, when it should be unchanged.
-
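The `makeCanonical` fix can be illustrated with a minimal Python sketch. This is not pandoc's Haskell code: the function name `make_canonical` is hypothetical, and the sketch assumes relative, `/`-separated paths only. It shows why leading `..` segments must be kept rather than cancelled.

```python
def make_canonical(path):
    """Collapse '.' and '..' segments of a relative path without touching the filesystem.

    Hypothetical illustration only; assumes '/'-separated relative paths.
    """
    parts = []  # canonical segments collected so far
    for seg in path.split("/"):
        if seg in ("", "."):
            continue  # skip empty and current-directory segments
        if seg == ".." and parts and parts[-1] != "..":
            parts.pop()  # '..' cancels the preceding ordinary segment
        else:
            # ordinary segment, or a leading '..' that has nothing to cancel
            parts.append(seg)
    return "/".join(parts)


print(make_canonical("../../file"))
print(make_canonical("a/../b"))
```

The key design point is the `parts[-1] != ".."` check: a `..` may only remove a real directory name, never another `..`.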
Search for metadata files in `$DATADIR/metadata` (#7851, Even Brenden). If files specified with `--metadata-file` are not found in the working directory, look in `$DATADIR/metadata` (#5876).
-
Text.Pandoc.Class: export `readMetadataFile` [API change] (#5876).
-
Text.Pandoc.Error: export new `PandocCouldNotFindMetadataFileError` constructor for `PandocError` [API change] (#5876).
-
Avoid putting a frame around speaker notes in beamer (#7857). If speaker notes (a Div with class ‘notes’) occur right after a section heading, but above slide level, the resulting `\note{..}` command should not be wrapped in a frame, as that will cause a spurious blank slide.
-
CSS in HTML template: adjust #TOC and h1 on mobile (#7835, Mauro Bieg).
-
Text.Pandoc.Readers.LaTeX.Parsing: don’t export `totoks`. Make the first param of `tokenize` a SourcePos instead of SourceName, and use it instead of `totoks`.
-
Text.Pandoc.Shared: Modify `stringify` so it ignores `[Citation]` inside `Cite` (#7855). Otherwise we’ll sometimes get two copies of things, one from the `citationPrefix` or `citationSuffix` and another from the embedded fallback text. When there is no fallback text, we’ll get no content. However, just relying on the result of running `query` on the embedded `Citation`s isn’t really an alternative; this would result in a jumble of text rather than anything structured.
-
Omit `--enable-doc` in the cabal haddock invocation in `tools/build-and-upload-api-docs.sh`.
-
Text.Pandoc.App.Opt: fix logic bug in `fullDefaultsPath`. Previously we would (also) search the default user data directory for a defaults file, even if a different user data directory was specified using `--data-dir`. This was a mistake; if `--data-dir` is used, the default user data directory should not be searched.
-
Text.Pandoc.Shared: `defaultUserDataDir` behavior change (#7842). If the XDG data directory is not defined (e.g. because it’s not supported in the OS or HOME isn’t defined), we return the empty string instead of raising an exception.
-
Update command tests to distinguish stderr and test exit status.
-
MANUAL: add that speaker notes can be used with beamer (#7856).
-
Update `build-and-upload-api-docs.sh`.
-
Document `--trace` option. Document `no-check-certificate` in defaults files. Document ‘sandbox’ option for defaults files (#7873).
-
Fix pattern syntax in sample readability custom reader.
-
doc/custom-readers.lua: add example for “readable HTML.”
-
Fix message in man page about where code can be found.
-
`manfilter.lua`: remove extra indent in table cells with code blocks.
-
Fix lua-filters documentation for table column widths (#7864).
-
epub.doc: Update links to KindleGen (#7846, Benson Muite, Mauro Bieg). KindleGen has been deprecated and we need to link to archived versions.
-
Use tables in defaults files documentation, so each default option is paired with the corresponding command-line option (Carsten Allefeld).
-
Use skylighting 0.12.2.
-
Add pandoc-lua-marshal to Nix shell (#7849, Even Brenden).
2.17.0.1
-
Require pandoc-lua-marshal 0.1.3.1 (#7831, Albert Krewinkel). Fixes a problem with `List.includes` and `List.find` that caused a Lua stack overflow and subsequent program crash.
-
HTML template: load header-includes before math (#7833, Kolen Cheung). MathJax expects its configuration to come before the MathJax script is loaded. This change of order allows one to configure MathJax via header-includes, which now loads before the MathJax script. Cf. #2750.
-
When reading defaults file, stop at a line `...`. This line signals the end of a YAML document. This restores the behavior we got with HsYaml. yaml complains about content past this line. See #4627 (comment)
-
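As a rough illustration of the rule for defaults files, the Python sketch below keeps only the lines that precede the first `...` end-of-document marker. The helper name `yaml_document_lines` is hypothetical, and pandoc's actual Haskell implementation may differ in detail (for example, in how trailing whitespace is treated).

```python
def yaml_document_lines(lines):
    """Return the lines up to, but not including, the first '...' marker line.

    Hypothetical helper; treating trailing whitespace as insignificant
    is an assumption of this sketch.
    """
    kept = []
    for line in lines:
        if line.rstrip() == "...":
            break  # end of the YAML document: ignore everything after it
        kept.append(line)
    return kept


defaults = ["from: markdown", "to: html", "...", "not YAML any more"]
print(yaml_document_lines(defaults))
```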
Text.Pandoc.Citeproc: allow `notes-after-punctuation` to work with numerical styles that use superscripts (e.g. american-medical-association.csl), as well as with note styles. The default setting of `notes-after-punctuation` is true for note styles and false otherwise. This restores a behavior of pandoc-citeproc that wasn’t properly carried over to Citeproc (#7826, cf. jgm/pandoc-citeproc#384).
-
Use commonmark-pandoc 0.2.1.2 (#7769).
-
Add FAQ on images in ipynb containers (#7749, Kolen Cheung).
pandoc 2.17
-
Support `markua` as an output format (#1871, Tim Wisotzki and Saumel Lemmenmeier). Markua is a markdown variant used by Leanpub.
-
Add text wrapping for HTML output (#7764). Previously the HTML writer was exceptional in not being sensitive to the `--wrap` option. With this change `--wrap` now works for HTML. The default (as with other formats) is automatic wrapping. Note that the contents of `script`, `textarea`, and `pre` tags are always laid out with the `flush` combinator, so that unwanted spaces won’t be introduced if these occur in an indented context in a template.
-
Don’t read sources until in/out format are verified (#7797).
-
Issue error with `--list-extensions` for invalid formats (#7797).
-
Make `--citeproc` recognize `.yml` as well as `.yaml` extensions as YAML bibliography files (#7707, Jörn Krenzer).
-
Use latest version of KaTeX with `--katex`.
-
Fix parsing of footnotes in `--metadata-file` (#7813). Previously non-inline footnotes were not being parsed.
-
ODT reader:
- Parse list-header as a list item (Tuong Nguyen Manh).
-
Commonmark reader:
- Put sourcepos attribute on header, not enclosing div with `-f commonmark+sourcepos` (#7769).
-
Markdown reader:
- Don’t allow `^` at beginning of link or image label (#7723). This is reserved for footnotes. Fixes regression from 0a93acf.
- Fix parsing of “bare locators” after author-in-text citations. Previously `@item [p. 12; @item2]` was incorrectly parsed as three citations rather than two. This is now fixed by ensuring that `prefix` doesn’t gobble any semicolons.
- Revert changes to `inlinesInBalancedBrackets` (commit fa83246), which caused regressions.
- Improve detection of pipe table line widths (#7713). Fixed calculation of maximum column widths in pipe tables. It is now based on the length of the markdown line, rather than a “stringified” version of the parsed line. This should be more predictable for users. In addition, we take into account double-wide characters such as emojis.
-
Custom (Lua) readers:
- First argument is now a list of sources instead of the concatenated text (Albert Krewinkel). The list structure can easily be converted to a string by applying `tostring`, but it is also possible to access the elements (each with a `text` and `name`). A small example is added to the custom reader documentation, showcasing its use in a reader that creates a syntax-highlighted code block for each source code file passed as input. Existing readers will still work through a fallback mechanism, issuing a deprecation notice.
-
Org reader:
- Parse official org-cite citations (#7329). We also support the older org-ref style as a fallback. We no longer support the “markdown style” or “Berkeley style” citations.
- Support alphabetical (fancy) lists (Lucas Viana). When the `fancy_lists` extension is enabled, alphabetical list markers are allowed, mimicking the behaviour of Org Mode when `org-list-allow-alphabetical` is enabled.
- Support counter cookies in lists (Lucas Viana). Such cookies are used to override the item counter in ordered lists. In org it is possible to set the counter at any list item, but since Pandoc AST does not support this, we restrict the usage to setting an offset for the entire ordered list, by using the cookie in the first list item.
- Allow trailing spaces after key/value pairs in directives (Albert Krewinkel). Ensures that spaces at the end of attribute directives like `#+ATTR_HTML: :width 100%` (note the trailing spaces) are accepted.
-
LaTeX reader:
- Omit visible content for `\label{...}`. Previously we included the text of the label in square brackets, but this is undesirable in many cases. See discussion in #813 (comment).
- Improve references (#813). Resolve references to theorem environments. Remove the Span caused by “label” in figure, table, and theorem environments; this had an id that duplicated the environments’ id.
- Fix semantics of `\ref`. We were including the ams environment type in addition to the number. This is proper behavior for `\cref` but not for `\ref`. To support `\cref` we need to store the environment label separately.
- Add babel mappings for Gujarati (gu) and Oriya (or) (#7815).
- Fix typo `panjabi` -> `punjabi` in babel mappings (#7814).
-
HTML reader:
- Parse attributes on links and images (#6970).
-
Docx reader:
- Handle multiple pic elements inside a drawing (#7786).
- Change `elemToParPart` to return `[ParPart]` instead of `ParPart`. Also remove `NullParPart` constructor, as it is no longer needed. This will allow us to handle elements that contain multiple ParParts, e.g. `w:drawing` elements with multiple `pic:pic`.
-
DocBook reader:
-
Markdown writer:
- Add new exported function `writeMarkua` from Text.Pandoc.Writers.Markdown [API change] (#1871, Tim Wisotzki and Saumel Lemmenmeier).
- Fix indentation issue in footnotes (#7801).
- Avoid extra space before citation suffix if it already starts with a space.
- Ensure semicolon between the locator and the next citation when an author-in-text citation has a locator and following citations.
- Improve escaping for `#` (#7726).
-
Custom (Lua) writers:
-
Allow variables to be set via second return value of `Doc` (#6731, Albert Krewinkel). New template variables can be added by giving variable-value pairs as a second return value of the global function `Doc`. Example:

    function Doc (body, meta, vars)
      vars.date = vars.date or os.date '%B %e, %Y'
      return body, vars
    end
-
Provide global `PANDOC_WRITER_OPTIONS` (#6731, Albert Krewinkel).
-
Assign default Pandoc object to global `PANDOC_DOCUMENT` (Albert Krewinkel). The default Pandoc object is now non-strict, i.e., only the parts of the document that are accessed will be marshaled to Lua. A special type is no longer necessary. This change also makes it possible to use the global variable with library functions such as `pandoc.utils.references`, or to inspect the document contents with `walk()`.
-
-
LaTeX writer:
- Fix typo `panjabi` -> `punjabi` in babel mappings (#7814).
-
MediaWiki writer:
- Remove redundant display text for wiki links (Jesse Hathaway).
-
Docx writer:
- Handle bullets correctly in lists by not reusing numIds (#7689, Michael Hoffmann). This fixes a bug in which a Div in a list item would receive bullets on its contained paragraphs.
-
Org writer:
- Fix list items starting with a code block or other non-paragraph content (#7810).
- Avoid blank lines after tight sublists (#7810).
- Fix extra blank line inserted after empty list item (#7810).
- Don’t add blank line before lists (#7810).
- Support starting number cookies (Lucas Viana). This is necessary for lists that start at a number other than 1.
- Support the new org-cite syntax (#7329).
-
Haddock writer:
- Avoid blank lines after tight sublists (#7810).
-
Ipynb writer:
- Ensure deterministic order of keys.
- Handle cell output with raw block of markdown (#7563, Kolen Cheung). Write RawBlock of markdown in code-cell output. This is designed to fit the behavior of #7561, which makes the ipynb reader parse code-cell output with mime “text/markdown” to a RawBlock of markdown. This commit makes the ipynb writer write this RawBlock of markdown back inside a code-cell output with the same mime, preserving this information in round-trip.
- In choosing between multiple output options, always favor those marked with the output format over images (Kolen Cheung). Previously, both the `fmt == f` case and Image had a rank of 1.
-
Ipynb reader & writer: properly handle cell “id” (#7728). This is passed through if it exists (in Nb4); otherwise the writer will add a random one so that all cells have an “id”.
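The pass-through-or-generate rule for cell ids can be sketched in Python as follows. This is only an illustration of the behaviour described above: the function name `ensure_cell_ids` and the id format (eight hex digits from a UUID) are assumptions, not what pandoc's Haskell writer actually produces.

```python
import uuid


def ensure_cell_ids(cells):
    """Return copies of the cells in which every cell carries an 'id'."""
    result = []
    for cell in cells:
        cell = dict(cell)  # shallow copy so the caller's data is not mutated
        if not cell.get("id"):
            # assumed id format, chosen for this sketch only
            cell["id"] = uuid.uuid4().hex[:8]
        result.append(cell)
    return result


notebook_cells = [
    {"cell_type": "code", "id": "abc123", "source": "1 + 1"},
    {"cell_type": "markdown", "source": "# Title"},
]
for cell in ensure_cell_ids(notebook_cells):
    print(cell["cell_type"], cell["id"])
```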
-
Ms writer:
- Properly encode strings for PDF contents (#7731).
-
JATS writer:
- Keep quotes in element-citations (Albert Krewinkel). Fixed a bug that led to quote characters being lost in element-citations.
-
RTF writer:
- Properly handle images in data URIs (#7771).
-
Commonmark writer:
- Allow ‘)’ delimiters on ordered lists.
-
RST writer:
- Avoid extra blank line after empty list item (#7810).
-
HTML writer:
- Make line breaks more consistent. With `--wrap=none`, we now output line breaks between block-level elements. Previously they were omitted entirely, so the whole document was on one line, unless there were literal line breaks in pre sections. This makes the HTML writer’s behavior more consistent with that of other writers. Also, regardless of wrap settings, put newline after `<dd>` and after block-level elements in the footnotes section. And add a line break between an `img` tag and the associated `figcaption`.
- reveal.js: Make sure images with `r-stretch` are not in p tags. They must be direct children of the section. There was previously code to make this work with the older class name `stretch`, but the name has changed in reveal.js.
- reveal.js: don’t add `r-fit-text` class to section. It must go on the header only.
-
AsciiDoc writer:
- Improve detection of intraword emphasis (#7803).
-
OpenDocument writer:
- Fix vertical alignment bug with...
pandoc 2.16.2
-
Add interface for custom readers written in Lua (#7669). Users can now do `-f myreader.lua` and pandoc will treat the script `myreader.lua` as a custom reader, which parses an input string to a pandoc AST, using the pandoc module defined for Lua filters. A sample custom reader can be found in `data/creole.lua`. Also see documentation in `doc/custom-readers.md`.
-
New module Text.Pandoc.Readers.Custom, exporting `readCustom` [API change].
-
Allow `plain` to be used in raw attribute syntax.
-
Accept empty `--metadata-file` (#7675). This was a regression from 2.15 behavior.
-
Markdown reader: Improve `inlinesInBalancedBrackets`. This is just a small improvement in terms of performance, but it’s simpler and more direct code. Also, we avoid parsing interparagraph spaces in balanced brackets, as the original did.
-
BibTeX reader: Properly handle commented lines in BibTeX/BibLaTeX (#7668).
-
RST reader: handle class attribute for custom roles (#7699, willj-dev). Previously the class attribute was ignored, and the name of the role used as the class.
-
DocBook reader:
- Add `<titleabbr>` support (Rowan Rodrik van der Molen).
- Support for `<indexterm>` (#7607, Rowan Rodrik van der Molen).
-
LaTeX reader:
-
JATS reader: Capture `alt-text` in figures (#7703, Aner Lucero).
-
MediaWiki writer: use HTML spans for anchors when header has id (#7697). We need to generate a span when the header’s ID doesn’t match the one MediaWiki would generate automatically. Note that MediaWiki’s generation scheme is different from pandoc’s (it uses uppercase letters, and `_` instead of `-`, for example). This means that in going from markdown to mediawiki, we’ll now get spans before almost every heading, unless explicit identifiers are used that correspond to the ones MediaWiki auto-generates. This is uglier output but it’s necessary for internal links to work properly.
-
Markdown writer: don’t create autolinks when this loses information (#7692). Previously we sometimes lost attributes when rendering links as autolinks.
-
Text.Pandoc.Readers.Metadata: allow multiple YAML documents when parsing YAML for `yamlBsToRefs`. Some people use `---` as the end delimiter in YAML bibliography files, which causes the `yaml` library to emit an error unless we explicitly allow multiple YAML documents (and just consider the first).
-
JATS writer:
- Ensure figures are wrapped with `<p>` in list items (Albert Krewinkel). This prevents the generation of invalid output.
- Add URL to element citation entries (Albert Krewinkel). The URL of a reference, if present, is added in tag `<uri>` to element-citation entries.
-
HTML writer: Don’t create invalid `data-` attribute for empty attribute key (#7546).
-
LaTeX writer:
- Babel mappings: use `ancientgreek` for `grc`.
- With `-t latex-smart`, don’t generate `\ldots` from ellipsis (#7674). Instead just use unicode ellipsis.
-
JATS template: fix `equal-contrib` attribute (Albert Krewinkel). The standard requires the value to be either `yes` or `no`, but it was set to `true` for authors who contributed equally.
-
reveal.js template: Add `disableLayout` variable (Christophe Dervieux).
-
Text.Pandoc.Error: sort errors in `handleError` by exit code (Albert Krewinkel).
-
Text.Pandoc.Writers.Shared: Improve toLegacyTable (#7683, Christian Despres).
-
Lua subsystem:
-
Include lpeg module (#7649, Albert Krewinkel). Compiles the `lpeg` library (Parsing Expression Grammars For Lua) into the program. Package maintainers may choose to rely on package dependencies to make lpeg available, in which case they can compile with the constraint `lpeg +rely-on-shared-lpeg-library`. `lpeg` and `re` are always made available in global variables, without the need for a `require`.
-
Set `lpeg` and `re` as globals; allow shared lib access via `require`. The `lpeg` and `re` modules are loaded into globals of the respective name, but they are not necessarily registered as loaded packages. This ensures that
- the built-in library versions are preferred when setting the globals,
- a shared library is used if pandoc has been compiled without `lpeg`, and
- the `require` mechanism can be used to load the shared library if available, falling back to the internal version if possible and necessary.
-
Fix argument order in constructor `pandoc.Cite` (Albert Krewinkel). This restores the old behavior; argument order had been switched accidentally in pandoc 2.15.
-
Add Pushable instance for `ReaderOptions` (Albert Krewinkel).
-
Allow passing custom reader options to `pandoc.read` as an optional third argument (#7656, Albert Krewinkel). The object can either be a table or a ReaderOptions value like `PANDOC_READER_OPTIONS`. Creating new ReaderOptions objects is possible through the new constructor `pandoc.ReaderOptions`.
-
Display Pandoc values using their native Haskell representation (Albert Krewinkel).
-
Require latest hslua (2.0.1) (#7661, #7657, Albert Krewinkel). This fixes issues with
- misleading error messages when a required function parameter is omitted;
- absent properties still being listed in the output of `pairs`; and
- alias accessing leading to errors instead of returning `nil`, e.g. with `(pandoc.Str '').identifier`.
-
Add missing space in “package not found” message (#7658, Albert Krewinkel).
-
-
Update build files (#7696, Fabián Heredia Montiel). Drop old windows 32-bit constraints. Update cabal `tested-with` field to correspond to `ci.yml` matrix.
-
Remove unneeded package dependencies from benchmark target.
-
Require ghc >= 8.6, base >= 4.12. This allows us to get rid of the old custom prelude and some crufty cpp. But the primary reason for this is that conduit has bumped its base lower bound to 4.12, making it impossible for us to support lower base versions.
-
Require Cabal 2.4. Use wildcards to ensure that all pptx tests are included (#7677).
-
Update `bash_completion.tpl` (S.P.H.).
-
Add `data/creole.lua` as sample custom reader.
-
Add `doc/custom-readers.md` and `doc/custom-writers.md`.
-
`doc/lua-filters.md`: add section on global modules, including lpeg (Albert Krewinkel).
-
`MANUAL.txt`: update table of exit codes and corresponding errors (Albert Krewinkel).
-
Use latest texmath.
pandoc 2.16.1
-
Docx reader: don’t let first line indents trigger block quotes (#7655). This fixes a regression introduced in pandoc 2.15.
-
Docx writer: use `getTimestamp` for modification times in reference.docx (#7654). This ensures that when `SOURCE_DATE_EPOCH` is set, the modification times of files taken from the reference.docx will be set deterministically, allowing for reproducible builds.
-
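The idea behind honouring `SOURCE_DATE_EPOCH` can be sketched in a few lines of Python. This is a generic illustration of the reproducible-builds convention, not a port of pandoc's `getTimestamp`; the function name `get_timestamp` and the fallback for malformed values are assumptions of the sketch.

```python
import os
import time


def get_timestamp():
    """Return SOURCE_DATE_EPOCH as an integer if set, else the current time."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None and epoch.strip().isdigit():
        return int(epoch)  # deterministic: the same value on every build
    # assumption: fall back to the wall clock when the variable is unset or malformed
    return int(time.time())


os.environ["SOURCE_DATE_EPOCH"] = "1640995200"
print(get_timestamp())
```

With the variable set, every file written with this timestamp gets the same modification time, so two builds from the same input can be byte-identical.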
Lua subsystem (Albert Krewinkel):
- Load module `pandoc.path` on startup (#7524). Previously the module always had to be loaded via `require 'pandoc.path'`.
- Fix typo in SoftBreak constructor.
- Re-add `content` property to Strikeout elements. Fixes a regression introduced in 2.15.
- Be more forgiving when retrieving the Image `caption` property. Fixes a regression introduced in 2.15.
- Display Attr values using their native Haskell representation.
- Allow omitting the 2nd parameter in pandoc.Code constructor. Fixes a regression introduced in 2.15 which required users to always specify an Attr value when constructing a Code element.
- Allow comparing and showing Citation values. Comparisons of Citation values are performed in Haskell; values are equal if they represent the same Haskell value. Converting a Citation value to a string now yields its native Haskell string representation.
- Restore List behavior of MetaList (#7650). Fixes a regression introduced in 2.16 which had MetaList elements lose the `pandoc.List` properties.
- Restore `content` property on Header elements.
- Ensure Block elements have all expected properties.
- Ensure Inline elements have all expected properties.
-
Allow tasty-bench 0.3.x.
pandoc 2.16
-
Switch back from HsYAML to yaml for parsing YAML metadata (#6084). HsYAML is around 20 times slower in parsing large YAML bibliographies. In addition, HsYAML is not being actively maintained. This sets us back in our attempts to free ourselves from C dependencies (#4535). But I don’t see a good alternative until a faster pure Haskell parser is available. Notes:
- We’ve removed the FromYAML instances for all types that had them, since this is a HsYAML-specific typeclass [API change]. (The yaml package just uses From/ToJSON instead of having a dedicated From/ToYAML class.)
- Unlike HsYAML (in the configuration we were using), yaml parses ‘Y’, ‘N’, ‘Yes’, ‘No’, ‘On’, ‘Off’ as boolean values. Users may need to quote these when they are meant to be interpreted as strings. Similarly, ‘null’ is parsed as a YAML null value (and will be treated as an empty string by pandoc rather than the string ‘null’). Quoting it will force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves in a more useful way when the YAML metadata contains escaping errors: instead of just failing silently and falling back to some other interpretation of the section, it raises a YAML parsing error.
-
Markdown writer: Ensure that special values are quoted in YAML metadata. These include “Y”, “yes”, “on”, and “off”, which are now (with yaml library) considered boolean values, as well as “null”.
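The quoting rule can be pictured with a short Python sketch. The set of special words is taken from the entries above; the case-insensitive comparison and the helper name `quote_if_special` are assumptions of this sketch and are not taken from pandoc's Haskell source.

```python
# Scalars the notes above list as being read as booleans or null by the yaml library.
SPECIAL_SCALARS = {"y", "n", "yes", "no", "on", "off", "null"}


def quote_if_special(value):
    """Wrap a plain scalar in double quotes if YAML would not read it as a string."""
    if value.lower() in SPECIAL_SCALARS:  # assumption: match regardless of case
        return '"' + value + '"'
    return value


for word in ["Yes", "null", "hello"]:
    print(word, "->", quote_if_special(word))
```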
-
Change JSON encodings of some types.
- For LineEnding use lowercase constructors, e.g. `crlf`, `native`.
- For HTMLSlideVariant use lowercase constructors.
- For ReaderOptions use e.g. `default-image-extension` instead of `readerDefaultImageExtension` for field names.
- For Extension, use e.g. `tex_math_dollars` instead of `Ext_tex_math_dollars` as constructor.
- For Extensions, use an array of Extensions, instead of an object wrapping the tag `Extensions` and an integer. (The integer representation is not supposed to be part of the public API.)
- For Opt, use field names like `tab-stop` instead of `optTabStop`.
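The field-name change follows a visible pattern in the two examples given: drop the record prefix and turn the remaining camelCase into kebab-case. The Python sketch below reproduces that pattern for illustration; the helper `json_field_name` is hypothetical, and pandoc's actual encoding of other fields (for example ones containing acronyms) may not match it.

```python
import re


def json_field_name(haskell_name, prefix):
    """Drop a record prefix and convert the rest from camelCase to kebab-case."""
    if not haskell_name.startswith(prefix):
        raise ValueError("expected %r to start with %r" % (haskell_name, prefix))
    rest = haskell_name[len(prefix):]
    # insert a hyphen before every capital letter except the first, then lowercase
    return re.sub(r"(?<!^)(?=[A-Z])", "-", rest).lower()


print(json_field_name("optTabStop", "opt"))
print(json_field_name("readerDefaultImageExtension", "reader"))
```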
-
Docx writer:
- Add IDs to native_numbering test (Tristan Stenner).
- Move “:” out of the caption bookmark (Tristan Stenner). This is needed so that native references to the figure are included as “As seen in Figure X, it is…” instead of “As seen in [Figure: X, it is…”
-
Lua (Albert Krewinkel, except as noted):
-
Use hslua module abstraction where possible.
-
Fix placement of tests for Block elements in pandoc module tests
-
Increase strictness when getting attribute keys
-
Re-add `t` and `tag` property to Attr values. Removal of these properties from Attr values was a regression.
-
Fix `pandoc.utils.stringify` regression. The `pandoc.utils.stringify` function returned empty strings when called with a string argument.
-
Fix a copy/paste bug in Lua marshalling code (John MacFarlane, #7639). This caused links to be changed to figures when Lua filters changed link properties.
-
Re-add `content` property to Link elements (#7647). This was a regression introduced in version 2.15.
-
Generate constants in module pandoc programmatically.
-
Marshal SimpleTable, ListAttributes, Citation, and Block values as userdata objects. Properties of Block values are marshalled lazily, which generally improves performance considerably. Script users may also notice the following differences:
- Block element properties can no longer be accessed by numerical indexing of the `.c` field. The `.c` property now serves as an alias for `.content`, so some filters that used this undocumented method for property access may continue to work, while others will need to be updated to use proper property names.
- The marshalled Block elements now have a `show` method, and a `__tostring` metamethod. Both return the Haskell string representation of the element.
- Block values now have the Lua type `userdata` instead of `table`.
-
-
Add a short guide to pandoc’s sources (Albert Krewinkel).
-
Fix epub files in epub reader tests, so that they are valid according to epubcheck (#7586).
-
Allow time 1.13.
-
Require latest skylighting (0.12.1).
-
Fix build on GHC 9.2 (Joseph C. Sible).
-
Fix trypandoc so it builds with aeson > 2.
pandoc 2.15
-
Add `--sandbox` option (#5045).
- Add sandbox feature. When this option is used, readers and writers only have access to input files (and other files specified directly on command line). This restriction is enforced in the type system.
- Filters, PDF production, custom writers are unaffected. This feature only insulates the actual readers and writers, not the pipeline around them in Text.Pandoc.App.
- Note that when `--sandbox` is specified, readers won’t have access to the resource path, nor will anything have access to the user data directory.
-
`--self-contained`: Fix bug that caused everything to be made a data URI (#7635, #7367). We only need to use data URIs in certain cases, but due to a bug they were being used always.
-
Pandoc will now fall back to latin1 encoding for inputs that can’t be read as UTF-8. This is what it did previously for content fetched from the web and not marked as to content type. It makes sense to do the same for local files. In this case a `NotUTF8Encoded` warning will be issued, indicating that pandoc is interpreting the input as latin1.
-
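The fallback described here is a common decoding pattern, sketched below in Python. It is an illustration of the behaviour only: the function name `decode_input` and the wording of the warning are made up for this sketch and do not correspond to pandoc's Haskell API.

```python
import warnings


def decode_input(raw, name="<input>"):
    """Decode bytes as UTF-8, falling back to latin1 with a warning."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # hypothetical message; pandoc issues its own NotUTF8Encoded warning
        warnings.warn("%s is not UTF-8 encoded: interpreting it as latin1" % name)
        # latin1 assigns a character to every byte value, so this cannot fail
        return raw.decode("latin-1")


print(decode_input("café".encode("utf-8")))
```

Because latin1 can decode any byte sequence, the fallback never raises, which is why a warning is the only signal that the text may have been misread.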
Markdown reader:
- Don’t parse links or bracketed spans as citations (#7632). Previously pandoc would parse `[link to (@a)](url)` as a citation; similarly `[(@a)]{#ident}`. This is undesirable. One should be able to use example references in citations, and even if `@a` is not defined as an example reference, `[@a](url)` should be a link containing an author-in-text citation rather than a normal citation followed by literal `(url)`.
- Fix interaction of `--strip-comments` and list parsing (#7521). Use of `--strip-comments` was causing tight lists to be rendered as loose (as if the comment were a blank line).
- Fix parsing bug for math in bracketed spans and links (#7623). This affects math with unbalanced brackets (e.g. `$(0,1]$`) inside links, images, bracketed spans.
- Fix code blocks using `--preserve-tabs` (#7573). Previously they did not behave as the equivalent input with spaces would.
-
DocBook reader:
- Honor linenumbering attribute (Samuel Tardieu). The DocBook attribute `linenumbering="numbered"` on code blocks maps to the `numberLines` class internally.
-
LaTeX reader:
- Implement siunitx v3 commands (#7614). We support `\unit`, `\qty`, `\qtyrange`, and `\qtylist` as synonyms of `\si`, `\SI`, `\SIrange`, and `\SIlist`.
- Properly handle `\^` followed by group closing (#7615).
- Recognize that `\vadjust` sometimes takes “pre” (#7531).
- Ignore (and gobble parameters of) CSLReferences environment (#7531). Otherwise we get the parameters as numbers in the output.
- Restrict `\endinput` to current file (Simun Schuster).
-
RST reader: handle escaped colons in reference definitions (#7568).
-
HTML reader:
- Handle empty tbody element in table (#7589).
-
Ipynb reader (Kolen Cheung):
- Get cell output mime from `raw_mimetype` in addition to `format`. (`format` is what the spec calls for, but `raw_mimetype` is often used in practice; see jupyter/nbformat#229).
- Add more formats that can be handled as “raw” cells.
- Fix mime type for `rst`.
- Support `text/markdown`, which is now a supported mime type for raw output (#7561).
-
RTF reader:
- Support
\binN
for binary image data. - If doc begins with { … } only parse its contents. Some documents seem to have non-RTF (e.g. XML) material after the
{\rtf1 ... }
group. - Ignore
\pgdsc
group. Otherwise we get style names treated as test. - Better handling of
\*
and bookmarks. We now ensure that groups starting with\*
never cause text to be added to the document. In addition, bookmarks now create a span between the start and end of the bookmark, rather than an empty span.
-
Docx reader:
- Avoid blockquote when parent style has more indent (Milan Bracke). When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
- Fix handling of empty fields (Milan Bracke). Some fields only have an
instrText
and no content. Pandoc didn’t understand these, so other fields were misread: it seemed as if a field was still open when it wasn’t. - Implement PAGEREF fields (Milan Bracke). These fields, often used in tables of contents, can be a hyperlink.
- Fix handling of nested fields (Milan Bracke). Fields delimited by
fldChar
elements can contain other fields. Before, the nested fields would be ignored, except for the end, which would be considered the end of the parent field. - Add placeholder for word diagram instead of just omitting it (Ezwal).
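The nested-field fix above amounts to matching each field start with its own end rather than with the first end encountered. A minimal sketch of that idea, assuming a hypothetical flat list of `fldChar` types ("begin", "separate", "end") in document order; it is not the reader's actual implementation:

```python
def match_fields(events):
    """Pair each 'begin' with its own 'end' using a stack.

    `events` is a hypothetical flat list of fldChar types in document
    order. Returns (begin_index, end_index) pairs, innermost fields first.
    Other event types (e.g. 'separate') are ignored here.
    """
    stack, pairs = [], []
    for i, ev in enumerate(events):
        if ev == "begin":
            stack.append(i)
        elif ev == "end":
            if not stack:
                raise ValueError(f"unmatched 'end' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unclosed field starting at position {stack[-1]}")
    return pairs
```

With a stack, the end of an inner field closes only the inner field, so the parent field stays open until its own end marker.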
-
Org reader:
-
Docx writer:
- Make id used in
native_numbering
predictable (#7551). If the image has the id IMAGEID, then we use the id ref_IMAGEID for the figure number. This allows one to create a filter that adds a figure number with figure name, e.g. <w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t> </w:r></w:fldSimple>
. If an image lacks an id, an id of the form ref_fig1
is used.
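Because the id is now predictable, the raw field shown above can be generated from the image id alone. A small sketch, assuming the id contains no characters that need XML escaping; the helper name `figure_ref_field` is hypothetical and not part of pandoc:

```python
from xml.sax.saxutils import escape


def figure_ref_field(image_id: str, placeholder: str = "Figure X") -> str:
    """Build a simple REF field pointing at the bookmark ref_<image_id>.

    Assumes image_id needs no escaping inside the attribute value.
    """
    bookmark = f"ref_{image_id}"
    return (
        f'<w:fldSimple w:instr=" REF {bookmark} ">'
        f"<w:r><w:t>{escape(placeholder)}</w:t></w:r>"
        "</w:fldSimple>"
    )
```

A filter could emit the returned string as a raw openxml inline next to a reference to the figure.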
-
Ensure we have unique ids for
wp:docPr
and pic:cNvPr
elements (#7527, #7503). -
Handle SVG images (#4058). This change has several parts:
- In Text.Pandoc.App, if the writer is docx, we fill the media bag and attempt to convert any SVG images to PNG, adding these to the media bag. The PNG backups have the same filenames as the SVG images, but with an added .png extension. If the conversion cannot be done (e.g. because rsvg-convert is not present), a warning is emitted.
- In Text.Pandoc.Writers.Docx, we now use Word 2016’s syntax for including SVG images. If a PNG fallback is present in the media bag, we include a link to that too.
-
Powerpoint writer (Emily Bourke):
- Add support for more layouts (#5097). Until now, four layouts were supported: “Title Slide” (used for the automatically generated metadata slide), “Section Header” (used for headings above slide level), “Two Column” (used when there’s a columns div), and “Title and Content” (used for all other slides). We now support three additional layouts: “Comparison”, “Content with Caption”, and “Blank”. The manual describes the logic that determines which layout is used for a slide. Layouts may be customized in the reference doc.
- Support specifying slide background images using a
background-image
attribute on the slide’s heading. Only the “stretch” mode is supported, and the background image is centred around the slide in the image’s larger axis, matching the observed default behaviour of PowerPoint. - Add support for incremental lists (through same methods as in other slide writers) (#5689).
- Copy embedded fonts from reference doc.
- Include all themes in output archive.
- Fix list level numbering (#4828, #4663). In PowerPoint, the content of a top-level list is at the same level as the content of a top-level paragraph: the only difference is that a list style has been applied. Previously, the writer incremented the paragraph level on each list, turning what should be top-level lists into second-level lists.
- Line up list continuation paragraphs. This commit changes the
marL
and indent
values used for plain paragraphs and numbered lists, and changes the spacing defined in the reference doc master for bulleted lists. For paragraphs, there is now a left-indent taken from the otherStyle
in the master. For numbered lists, the number is positioned where the text would be if this were a plain paragraph, and the text is indented to the next level. This means that continuation paragraphs line up nicely with numbered lists. Existing reference docs may need to be modified so that otherStyle
and bodyStyle
indent levels match, for this feature to work with them. - Consolidate text runs when possible (jgm). This slims down the output files by avoiding unnecessary text run elements.
- Support footers in the reference doc. There is one behaviour which may not be immediately obvious: if the reference doc specifies a fixed date (i.e. not automatically updating), and there’s a date specified in the metadata for the document, the footer date is replaced by the metadata date.
- Fix presentation rel numbering. Before now, the numbering of
rId
s was inconsistent when making the presentation XML and when making the presentation relationships XML. - Don’t add relationships unnecessarily. Before now, for any layouts added to the output from the default reference doc, the relationships were unconditionally added to the output. However, if there was already a layout in slideMaster1 at the same index then that results in duplicate relationships.
- If slide level is 0, don’t insert a slide break between a heading and a following table, “columns” div, or paragraph starting with an image.
- Fix capitalisation of
notesMasterId
. - Restructure tests.
-
Asciidoc writer:
- Translate numberLines attribute to
linesnum
switch (Samuel Tardieu). - Improve escaping for
--
in URLs (#7529).
-
LaTeX writer:
- Make babel use more idiomatic (#7604, hseg). Use babel’s bidi implementation. Import babel languages individually instead of as package options. Move
header-includes
to after babel
setup so it can be modified. - Use babel, not polyglossia, with xelatex. Previously polyglossia worked better with xelatex, but that is no longer the case, so we simplify the code...
pandoc 2.14.2
-
Allow
--slide-level=0
(#7476). When the slide level is set to 0, headings won’t be used at all in splitting the document into slides. Horizontal rules must be used to separate slides. -
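The effect of slide level 0 can be pictured with a small sketch. It models a document as a flat list of block names, which is a simplification assumed here for illustration; `split_slides` is a hypothetical helper, not pandoc's slide-splitting code.

```python
def split_slides(blocks):
    """Split a flat block list into slides at each 'HorizontalRule'.

    Headings are left in place as ordinary slide content, mirroring
    the description of slide level 0.
    """
    slides, current = [], []
    for block in blocks:
        if block == "HorizontalRule":
            slides.append(current)
            current = []
        else:
            current.append(block)
    slides.append(current)
    return slides
```

In this model a heading never starts a new slide; only the rule does.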
Add RTF reader (#3982).
rtf
is now supported as an input format as well as an output format. New module Text.Pandoc.Readers.RTF (exporting readRTF
). [API change] -
HTML reader: treat comments as blank when parsing (#7482).
-
Markdown reader:
- Fix raw LaTeX injection issue (#7497). Using a code block containing
\end{verbatim}
, one could inject raw TeX into a LaTeX document even when raw_tex
is disabled. Thanks to Augustin Laville for noticing the bug. - Multimarkdown sub- and superscripts (#5512, OCzarnecki). Added an extension
short_subsuperscripts
which modifies the behavior of subscript
and superscript
, allowing subscripts or superscripts containing only alphanumerics to end with a space character (e.g. x^2 = 4
or H~2 is combustible
). This improves support for multimarkdown.
-
RST reader: Fix
:literal:
includes (#7513). These should create code blocks, not insert raw RST. -
LaTeX reader:
- Proper implicit grouping around environment macros.
- Support
\global
before \def
, \let
, etc. (#7494). - Fix scope for LaTeX macros (#7494). They should by default scope over the group in which they are defined (except
\gdef
and \xdef
, which are global). In addition, environments must be treated as groups. - Improve handling of plain TeX macro primitives (#7474). Fixed semantics for
\let
. - Implement
\edef
, \gdef
, and \xdef
.
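The scoping rule described in the LaTeX reader items above (definitions are local to the enclosing group unless made with `\gdef`, `\xdef`, or `\global`) can be modelled with a stack of scopes. The class below is a hypothetical sketch loosely following TeX semantics, not the reader's actual data structure:

```python
class MacroTable:
    """Group-scoped macro table, loosely modelled on TeX semantics."""

    def __init__(self):
        self.scopes = [{}]  # scopes[0] is the global scope

    def begin_group(self):
        self.scopes.append({})

    def end_group(self):
        if len(self.scopes) == 1:
            raise ValueError("no open group to close")
        self.scopes.pop()

    def define(self, name, body, global_=False):
        if global_:
            # Global definition: drop local shadows so the new
            # definition is visible now and survives every group.
            for scope in self.scopes[1:]:
                scope.pop(name, None)
            self.scopes[0][name] = body
        else:
            self.scopes[-1][name] = body

    def lookup(self, name):
        # Innermost definition wins.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

Treating an environment as a group then means calling `begin_group` at `\begin{...}` and `end_group` at the matching `\end{...}`.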
-
Docx reader: Improve docx reader’s robustness in extracting images (#7511). The docx reader made some assumptions about how docx containers were laid out that were not always true, with the result that some images in documents did not get extracted.
-
LaTeX writer: Increase table column width precision (#7466, Peter Fabinski). In some cases, the rounding performed by the LaTeX table writer would introduce visible overrun outside the text area. This adds two more decimal places to the width values.
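To see why the extra precision matters, consider six equal columns. The sketch below assumes the earlier output used two decimal places and the new output four; the entry above only says that two places were added, so the exact figures are an assumption. The helper names are hypothetical.

```python
def format_widths(widths, places):
    """Render relative column widths with a fixed number of decimals."""
    return [f"{w:.{places}f}" for w in widths]


def overrun(widths, places):
    """How far the rounded widths exceed the full text width (1.0)."""
    total = sum(float(s) for s in format_widths(widths, places))
    return max(0.0, total - 1.0)


six_equal = [1 / 6] * 6
print(format_widths(six_equal, 2)[0], overrun(six_equal, 2))
print(format_widths(six_equal, 4)[0], overrun(six_equal, 4))
```

Each width rounds up to 0.17 at two places, so the table is about 2% too wide; at four places the excess shrinks to about 0.02%.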
-
Powerpoint writer:
- Include image title in description (#7352, Emily Bourke). The image title (i.e.

) was previously ignored when writing to pptx. This commit includes it in PowerPoint’s description of the image, along with the link. - Select layouts from reference doc by name (Emily Bourke). Until now, users had to make sure that their reference doc contains layouts in a specific order: the first four layouts in the file had to have a specific structure. Now the layout selection uses the layout names rather than order: users must make sure their reference doc contains four layouts with specific names, and if a layout with the right name isn’t found pandoc will emit a warning and use the corresponding layout from the default reference doc as a fallback.
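The name-based selection with fallback described above could look roughly like this. Layouts are modelled as plain dictionaries keyed by layout name, which is an assumption of the sketch; `select_layout` is a hypothetical helper, not the writer's real code.

```python
def select_layout(name, reference_layouts, default_layouts):
    """Pick a layout by name, falling back to the default reference doc.

    Returns (layout, warning); warning is None when the user's
    reference doc provides the layout.
    """
    if name in reference_layouts:
        return reference_layouts[name], None
    warning = f"layout {name!r} not found in reference doc; using default"
    return default_layouts[name], warning
```

The order of layouts in the reference doc no longer matters in this scheme; only the names do.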
-
Docx writer: be sensitive to the
native_numbering
extension (#7499). Figure and table numbers are now only included if native_numbering
is enabled. (By default it is disabled.) This is a behavior change with respect to 2.14.1, but the default behavior is now that of previous versions. The change was necessary to avoid incompatibilities between pandoc’s native numbering and third-party cross reference filters like pandoc-crossref. -
RTF writer:
- Omit
\bin
in \pict
. According to the spec, this is not needed or wanted when the data is in hexadecimal format, as here. - Emit ``` for section headings.
-
RTF template: specify font family for fixed-width font f1. According to the spec, this is mandatory.
-
LaTeX writer: Use ulem for underline (#7351). ulem is conditionally included already when the
strikeout
variable is set, so we set this when there is underlined text, and use \uline
instead of \underline
. This fixes wrapping for underlined text. -
Text.Pandoc.Citeproc:
- Revise citeproc code to fit new citeproc 0.5 API (thanks to Benjamin Bray). Linkification of URLs in the bibliography is now done in the citeproc library, depending on the setting of an option. We set that option depending on the value of the metadata field
link-bibliography
(defaulting to true, for consistency with earlier behavior). If a DOI, PMID, PMCID, or URL field is present but not explicitly rendered, the title (or if no title, the whole entry) is hyperlinked. These changes implement the recommendations from the draft CSL v1.0.2 spec (Appendix VI): https://github.com/citation-style-language/documentation/blob/master/specification.rst#appendix-vi-links - Avoid odd handling of quotes. Recent citeproc changes allow us to ignore Quoted elements; citeproc now uses its own method for representing quoted things, and only localizes and flipflops quotes it adds itself. Convert Quoted in bib entries to special Spans before passing them off to citeproc. This ensures that we get proper localization and flipflopping if, e.g., quotes are used in titles (jgm/citeproc#87).
- Removed quote localization from citeproc processing. This is now done in citeproc itself.
-
Text.Pandoc.Logging: Add PowerpointTemplateWarning log message type [API change] (Emily Bourke).
-
Text.Pandoc.Extension: Add
Ext_short_subsuperscripts
constructor to Extension
[API change] (OCzarnecki). -
Various sample.lua editorial fixes (#7493, #7487, William Lupton).
-
Bump base-compat version so we get compatibility with base 4.12.
-
Use Prelude from base-compat for ghc 8.4 too.
-
Add haskell-language-server to shell.nix (#7496, Emily Bourke).
-
Tests.Helpers: export testGolden and use it in RTF reader. This gives a diff output on failure.
-
Remove obsolete and incorrect sentence in
--slide-level
docs. -
Add internal module Text.Pandoc.Network.HTTP, exporting
urlEncode
. -
Text.Pandoc.Parsing:
parseFromString
: preserve at least the source directory (#7464). Previously we just set the source name to “chunk” when parsing from strings, to avoid misleading source positions. This had the side effect that rebase_relative_paths
would break inside sections that were parsed as strings. So, now we use “ORIGINAL_SOURCE_PATH_chunk” instead of just “chunk”. -
Text.Pandoc.MIME: use image/x-xcf instead of application/x-xcf (#7454).
-
Don’t compare
cdLine
in OOXML golden tests (Emily Bourke). The cdLine
field gives the line of the file some CData was found on, which reflects irrelevant formatting differences. -
Provide more detailed XML diff in tests (Emily Bourke).
-
OOXML tests: silence warnings. These can make the test output confusing, making people think tests are failing when they’re passing.
-
INSTALL.md: Add GitLab CI/CD example (#7448, Veratyr).
-
MANUAL.txt
- Clarifications (William Lupton).
- Add a note on security risks of include directives.
-
Document use of the ‘underline’ class (#7492, #7484, William Lupton).
-
Add a FAQ about the “Cannot allocate memory” error on M1 macs.
-
Use texmath 0.12.3.1.
-
Use released citeproc 0.5.
-
Remove dependency on HTTP package (#7456, mt_caret).
pandoc 2.14.1
-
Text.Pandoc.ImageSize: Add Tiff constructor for ImageType (#7405) [Minor API change]. This allows pandoc to get size information from tiff images.
-
Markdown reader: don’t try to read contents in self-closing HTML tag. Previously we had problems parsing raw HTML with self-closing tags like
<col/>
. The problem was that pandoc would look for a closing tag to close the markdown contents, but the closing tag had, in effect, already been parsed by htmlTag
. -
LaTeX reader:
- Avoid trailing hyphen in translating languages (#7447). Previously
\foreignlanguage{english}
turned into<span lang="en-">
. The same issue affected Arabic. - Support
\cline
in LaTeX tables (#7442). - Improved parsing of raw LaTeX from Text streams (
rawLaTeXParser
, used to read LaTeX in Markdown files, #7434). We now use source positions from the token stream to tell us how much of the text stream to consume. Getting this to work required a few other changes to make token source positions accurate.
-
DocBook reader:
-
RST reader: fix regression with code includes (#7436). With the recent changes to include infrastructure, included code blocks were getting an extra newline.
-
HTML reader:
- Recognize data-external when reading HTML img tags (#7429, Michael Hoffmann). Preserve all attributes in img tags. If attributes have a
data-
prefix, it will be stripped. In particular, this preserves a data-external
attribute as an external
attribute in the pandoc AST. - Add col, colgroup to ‘closes’ definitions
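The attribute handling for img tags described above can be sketched as a simple renaming pass. Attributes are modelled as a list of key/value pairs, and `strip_data_prefix` is a hypothetical name; this is an illustration, not the HTML reader's code.

```python
def strip_data_prefix(attrs):
    """Drop a leading 'data-' from each attribute name, keeping order."""
    prefix = "data-"
    return [
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in attrs
    ]
```

Under this sketch, `data-external="1"` on an img tag ends up as the attribute `external` with value `"1"`.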
-
HTML writer:
- Remove duplicated alt text in HTML output (Aner Lucero).
- Remove
aria-hidden
when explicit alt text is provided (Aner Lucero). - Set boolean values for reveal.js variables.
-
Docx writer:
- Add table numbering for captioned tables. The numbers are added using fields, so that Word can create a list of tables that will update automatically.
- Support figure numbers. These are set up in such a way that they will work with Word’s automatic table of figures (#7392).
-
Markdown writer: put space between Plain and following fenced Div (#4465).
-
EPUB writer: Don’t incorporate externally linked images in EPUB documents (#7430, Michael Hoffmann). Just as it is possible to avoid incorporating an image in EPUB by passing
data-external="1"
to a raw HTML snippet, this makes the same possible for native Images, by looking for an associated external
attribute. -
Text.Pandoc.PDF:
- Fix
svgIn
path error (#7431). We were duplicating the temp directory; this didn’t cause problems on macOS or linux because there we use absolute paths for the temp directory. But on Windows it caused errors converting SVG files. convertImage
: normalize paths (#7431). This will avoid paths on Windows with mixed path separators.
-
Text.Pandoc.Class: Always use / when adding directory to image destination with
extractMedia
, even on Windows. -
Text.Pandoc.Citeproc:
- Allow
$
characters in bibtex keys (#7409). - Set proper initial source name in parsing BibTeX (for better error messages.)
- Revamp note citation handling (#7394). Use latest citeproc, which uses a Span with a class rather than a Note for notes. This helps us distinguish between user notes and citation notes. Don’t put citations at the beginning of a note in parentheses. Fix small bug in handling of citations in notes, which led to commas at the end of sentences in some cases.
- Cleanup and efficiency improvement in
deNote
. - Improve punctuation moving with
--citeproc
. Previously, using --citeproc
could cause punctuation to move in quotes even when there are no citations. This has been changed; punctuation moving is now limited to citations. In addition, we only move footnotes around punctuation if the style is a note style, even if notes-after-punctuation
is true
.
-
Use citeproc 0.4.1. This helps improve note citations (see above) and eliminates double hyperlinks in author-in-text citations. Author-only citations are no longer hyperlinked. See jgm/citeproc#77. It also fixes moving of punctuation inside quotes to conform to the CSL spec: only comma and period are moved, not question mark or exclamation point.
-
Text.Pandoc.Error: fix line calculations in reporting parsec errors. Also remove a spurious initial newline in the error report.
-
Use doctemplates 0.4.1, which gives us better support for boolean variable values. Previously
$if(foo)$
would evaluate to true for variables with boolean false
values, because it cared only about the string rendering (#7402). -
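The difference between the two behaviours can be shown with a toy model of a template conditional. The functions below are hypothetical and only mirror the description above; they are not the doctemplates implementation.

```python
def render(value):
    """Render a template variable as text (booleans as true/false)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def if_by_string(value):
    """Earlier behaviour as described: only the rendered string matters."""
    return render(value) != ""


def if_by_value(value):
    """Boolean-aware test: a boolean false counts as false."""
    if isinstance(value, bool):
        return value
    return render(value) != ""
```

Since a boolean false renders as the nonempty string "false", the string-based test treats it as true, which is the bug reported in #7402.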
Require commonmark-pandoc >= 0.2.2.1. This fixes task lists with multiple paragraphs.
-
Use skylighting 0.11.
-
CSS in HTML template: reset overflow-wrap on code blocks (Mauro Bieg, #7423).
-
LaTeX template: Revert change in PR #7295: “move title, author, date up to top of preamble.” The change caused problems for people who used LaTeX commands defined later in the preamble in the title or author fields (#7422).
-
Add
doc/faqs.md
. This is imported from the website; in the future the website version will be drawn from here. Added a FAQ on the use of \AtEndPreamble
for cases when the contents of header-includes
need to refer to definitions that come later in the preamble. See #7422. -
Upgrade Debian 10 AMI for build-arm.sh.
-
CircleCI: change to using xcode 11.1.0 (macOS 10.14.4). We previously built on 10.13, but 10.13 no longer gets security updates and CircleCI is deprecating it.
pandoc 2.14.0.3
- Text.Pandoc.MediaBag
insertMediaBag
: ensure we get a sane mediaPath for URLs (#7391). In earlier 2.14.x versions, we’d get incorrect paths for resources downloaded from URLs when the media are extracted (including in PDF production). - Text.Pandoc.Parsing: improve
emailAddress
(#7398). Previously the parser would accept characters that are illegal in domain names, and this sometimes caused it to gobble bits of the following text. - txt2tags reader: modify the email address parser so it still includes form parameters, even after the change to
emailAddress
in Text.Pandoc.Parsing. - Text.Pandoc.Readers.Metadata: Fix regression with comment-only YAML metadata blocks (#7400).
- reveal.js writer and template: better handling of options. Previously it was impossible to specify false values for options that default to true (e.g.
center
); setting the option to false just caused the portion of the template setting the option to be omitted. Now we prepopulate all the variables with their default values, including them all unconditionally and allowing them to be overridden. - Markdown writer: Fix regression in code blocks with attributes (#7397). Code blocks with a single class but nonempty attributes were having their attributes dropped as a result of #7242.
- LaTeX writer:
- Add strut at end of minipage if it contains line breaks. Without it, the last line is not as tall as it should be in some cases.
- Always use a minipage for cells with line breaks, when width information is available (#7393). Otherwise the way we treat them can lead to content that overflows a cell.
- Use
\strut
instead of ~
before \\
in empty line.
- Use lts-18.0 stack resolver.
- Require skylighting 0.10.5.2 (adding support for Swift).
- Require commonmark 0.2.1.
- Rephrase section on unsafe HTML in manual.
- Create SECURITY.md