Releases: jgm/pandoc

pandoc 2.17.1

30 Jan 20:48
@jgm jgm
  • Support pagedjs-cli as pdf engine (#7838, Albert Krewinkel). PagedJS is a polyfill and supports the Paged Media standards by the W3C. https://www.pagedjs.org/

  • CommonMark reader: fix source position after YAML metadata (#7863).

  • LaTeX reader:

    • Remove retokenizing in rawLaTeXParser.

    • Ensure that \raggedright doesn’t gobble an argument (#7757).

    • Improve descItem. For some reason we were skipping arbitrary blocks before \item. This is now changed to “skip whitespace and comments.”

    • Improve handling of \newif. Adding a pair of braces around the second argument of \def prevents LaTeX from an emergency stop on input like the following (#6096).

      \newif\ifepub
      \epubtrue
      \ifepub
      hi
      \fi
      
  • Docx reader: Parse both Zotero citation and bibliography as FieldInfo (#7840).

  • LaTeX writer:

    • Allow arbitrary frameoptions to be passed to a beamer frame, using the frameoptions attribute (#7869).
    • Add s and squeeze to recognized beamer frameoptions (#7869).
  • Markdown writer: handle explicit column widths with pipe tables (#7847). If a table has explicit column width information and the content extends beyond the --columns width, we need to adjust the widths of the pipe separators to encode this width information.
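
As a rough illustration of how relative column widths can be encoded in a pipe table's separator row, here is a minimal Python sketch. It is not pandoc's implementation: the function name `separator_row`, the `total_dashes` parameter, and the rounding rule are all assumptions made for the example.

```python
def separator_row(widths, total_dashes=20):
    """Build a pipe-table separator row whose dash counts are
    proportional to the given relative column widths (hypothetical helper)."""
    cells = []
    for w in widths:
        # At least one dash per column, so no column disappears entirely.
        cells.append("-" * max(1, round(w * total_dashes)))
    return "|" + "|".join(cells) + "|"


if __name__ == "__main__":
    # A table whose second column is three times as wide as the first.
    print(separator_row([0.25, 0.75]))
```

A reader that interprets dash counts as relative widths would then recover roughly the same proportions from the separator row.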

  • Docx writer: Separate tables even with RawBlocks between (#7224, Michael Hoffmann). Adjacent docx tables need to be separated by an empty paragraph. If there’s a RawBlock between tables which renders to nothing, be sure to still insert the empty paragraph so that they will not collapse together.

  • Man writer: use custom font V for inline code (#7506). The V font is defined conditionally, so that it renders like CB in output formats that support that, and like B in those that don’t (e.g. the terminal). Aliases also defined for VI, VB, VBI.

  • Asciidoc writer: Support checklists in asciidoctor writer (#7832, Nikolai Korobeinikov, ricnorr). The checklist syntax (similar to task_list in markdown) seems to be an asciidoctor-only addition.

  • HTML writer:

    • Avoid duplicate “style” attributes on table cells (#7871).
    • Don’t break lines inside code elements. With the new (default) line wrapping of HTML, in conjunction with the default CSS which includes code { white-space: pre-wrap; }, spurious line breaks could be introduced into inline code (#7858).
  • Custom writer: preserve order of element attributes (#7489, Albert Krewinkel). Attribute key-value pairs are marshaled as AttributeList, i.e., as a userdata type that behaves both like a list and a map. This makes it possible to preserve the order of key-value pairs.

  • Switch to hslua-2.1 (Albert Krewinkel). This allows for some code simplification and improves stability.

  • Don’t read files outside of user data directory (Even Brenden). If a file path does not exist relative to the working directory, and it does exist relative to the user data directory, but outside of the user data directory, do not read it. This applies to readDataFile and readMetadataFile in PandocMonad and, by extension, any module that uses these by passing them relative paths.
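
The containment check described in this entry can be sketched in Python as follows. This is only an illustration of the idea, not pandoc's Haskell code; the function name `is_inside` and the example directory are hypothetical.

```python
import os


def is_inside(base_dir: str, relative_path: str) -> bool:
    """Return True if base_dir/relative_path still lies within base_dir
    after '.' and '..' segments have been resolved (hypothetical helper)."""
    base = os.path.normpath(os.path.abspath(base_dir))
    target = os.path.normpath(os.path.join(base, relative_path))
    # The longest common sub-path must be the base directory itself.
    return os.path.commonpath([base, target]) == base


if __name__ == "__main__":
    data_dir = "/home/user/.local/share/pandoc"  # assumed location
    print(is_inside(data_dir, "templates/default.latex"))
    print(is_inside(data_dir, "../../secret.txt"))
```

A caller following the rule above would refuse to read any path for which such a check fails.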

  • Text.Pandoc.Class.makeCanonical: Correctly handle consecutive “..”s at the beginning of a path (Even Brenden). Prior to this commit, ../../file would evaluate to file, when it should be unchanged.
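
To make the intended behavior concrete, here is a small Python sketch of this kind of path canonicalization for relative paths. It mirrors the rule stated in the entry but is not the Haskell source; the name `make_canonical` and the fixed `/` separator are assumptions.

```python
def make_canonical(path: str) -> str:
    """Collapse '.' and 'dir/..' segments of a relative path while
    keeping any leading '..' segments (hypothetical helper)."""
    out = []
    for seg in path.split("/"):
        if seg in ("", "."):
            continue  # empty and '.' segments carry no information
        if seg == ".." and out and out[-1] != "..":
            out.pop()  # 'dir/..' cancels out
        else:
            out.append(seg)  # ordinary segment, or a '..' that must be kept
    return "/".join(out)


if __name__ == "__main__":
    for p in ["../../file", "a/../file", "../a/../b"]:
        print(p, "->", make_canonical(p))
```

The key point is the `out[-1] != ".."` test: a `..` may only cancel a preceding ordinary directory name, never another `..`.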

  • Search for metadata files in $DATADIR/metadata (#7851, Even Brenden). If files specified with --metadata-file are not found in the working directory, look in $DATADIR/metadata (#5876).
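
The lookup order described here (working directory first, then the metadata subdirectory of the data directory) might be sketched like this in Python. The function name, its parameters, and the injectable `exists` predicate are assumptions for illustration only.

```python
import os
from typing import Callable, Optional


def find_metadata_file(
    name: str,
    data_dir: str,
    cwd: str = ".",
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first existing candidate path, or None (hypothetical helper)."""
    candidates = [
        os.path.join(cwd, name),                   # 1. working directory
        os.path.join(data_dir, "metadata", name),  # 2. $DATADIR/metadata
    ]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_metadata_file("defaults-meta.yaml", os.path.expanduser("~/.pandoc")))
```

Passing `exists` as a parameter keeps the sketch easy to exercise without touching the file system.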

  • Text.Pandoc.Class: export readMetadataFile [API change] (#5876).

  • Text.Pandoc.Error: export new PandocCouldNotFindMetadataFileError constructor for PandocError [API change] (#5876).

  • Avoid putting a frame around speaker notes in beamer (#7857). If speaker notes (a Div with class ‘notes’) occur right after a section heading, but above slide level, the resulting \note{..} command should not be wrapped in a frame, as that will cause a spurious blank slide.

  • CSS in HTML template: adjust #TOC and h1 on mobile (#7835, Mauro Bieg).

  • Text.Pandoc.Readers.LaTeX.Parsing: don’t export totoks. Make the first param of tokenize a SourcePos instead of SourceName, and use it instead of totoks.

  • Text.Pandoc.Shared: Modify stringify so it ignores [Citation] inside Cite (#7855). Otherwise we’ll sometimes get two copies of things, one from the citationPrefix or citationSuffix and another from the embedded fallback text. When there is no fallback text, we’ll get no content. However, it really isn’t an alternative to just rely on the result of running query on the embedded Citations; this will result in a jumble of text rather than anything structured.

  • Omit --enable-doc in the cabal haddock invocation in tools/build-and-upload-api-docs.sh.

  • Text.Pandoc.App.Opt: fix logic bug in fullDefaultsPath. Previously we would (also) search the default user data directory for a defaults file, even if a different user data directory was specified using --data-dir. This was a mistake; if --data-dir is used, the default user data directory should not be searched.

  • Text.Pandoc.Shared: defaultUserDataDir behavior change (#7842). If the XDG data directory is not defined (e.g. because it’s not supported in the OS or HOME isn’t defined), we return the empty string instead of raising an exception.

  • Update command tests to distinguish stderr and test exit status.

  • MANUAL: add that speaker notes can be used with beamer (#7856).

  • Update build-and-upload-api-docs.sh.

  • Document --trace option. Document no-check-certificate in defaults files. Document ‘sandbox’ option for defaults files. (#7873).

  • Fix pattern syntax in sample readability custom reader.

  • doc/custom-readers.lua: add example for “readable HTML.”

  • Fix message in man page about where code can be found.

  • manfilter.lua: remove extra indent in table cells with code blocks.

  • Fix lua-filters documentation for table column widths (#7864).

  • epub.doc: Update links to KindleGen (#7846, Benson Muite, Mauro Bieg). KindleGen has been deprecated and we need to link to archived versions.

  • Use tables in defaults files documentation, so each default option is paired with the corresponding command-line option (Carsten Allefeld).

  • Use skylighting 0.12.2.

  • Add pandoc-lua-marshal to Nix shell (#7849, Even Brenden).

pandoc 2.17.0.1

14 Jan 20:05
@jgm jgm
  • Require pandoc-lua-marshal 0.1.3.1 (#7831, Albert Krewinkel). Fixes a problem with List.includes and List.find that caused a Lua stack overflow and subsequent program crash.

  • HTML template: load header-includes before math (#7833, Kolen Cheung). MathJax expects its configuration to come before the MathJax script is loaded. This change of order allows one to configure MathJax via header-includes, which now loads before the MathJax script. Cf. #2750.

  • When reading a defaults file, stop at a line containing only “...”. This line signals the end of a YAML document. This restores the behavior we got with HsYaml. The yaml library complains about content past this line. See #4627 (comment)
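
The effect of this rule can be illustrated with a short Python sketch that discards everything from the YAML document-end marker onward. The function name is hypothetical, and the real implementation may treat edge cases (such as trailing whitespace) differently.

```python
def truncate_at_document_end(text: str) -> str:
    """Keep only the lines that precede the first line consisting
    solely of '...' (hypothetical helper)."""
    kept = []
    for line in text.splitlines():
        if line.rstrip() == "...":
            break  # end-of-document marker: ignore the rest
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    defaults = "from: markdown\nto: html\n...\nthis text is ignored\n"
    print(truncate_at_document_end(defaults))
```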

  • Text.Pandoc.Citeproc: allow notes-after-punctuation to work with numerical styles that use superscripts (e.g. american-medical-association.csl), as well as with note styles. The default setting of notes-after-punctuation is true for note styles and false otherwise. This restores a behavior of pandoc-citeproc that wasn’t properly carried over to Citeproc (#7826, cf. jgm/pandoc-citeproc#384).

  • Use commonmark-pandoc 0.2.1.2 (#7769).

  • Add FAQ on images in ipynb containers (#7749, Kolen Cheung).

pandoc 2.17

13 Jan 04:27
@jgm jgm
  • Support markua as an output format (#1871, Tim Wisotzki and Saumel Lemmenmeier). Markua is a markdown variant used by Leanpub.

  • Add text wrapping for HTML output (#7764). Previously the HTML writer was exceptional in not being sensitive to the --wrap option. With this change --wrap now works for HTML. The default (as with other formats) is automatic wrapping. Note that the contents of script, textarea, and pre tags are always laid out with the flush combinator, so that unwanted spaces won’t be introduced if these occur in an indented context in a template.

  • Don’t read sources until in/out format are verified (#7797).

  • Issue error with --list-extensions for invalid formats (#7797).

  • Make --citeproc recognize .yml as well as .yaml extensions as YAML bibliography files (#7707, Jörn Krenzer).

  • Use latest version of KaTeX with --katex.

  • Fix parsing of footnotes in --metadata-file (#7813). Previously non-inline footnotes were not being parsed.

  • ODT reader:

    • Parse list-header as a list item (Tuong Nguyen Manh).
  • Commonmark reader:

    • Put sourcepos attribute on header, not enclosing div with -f commonmark+sourcepos (#7769).
  • Markdown reader:

    • Don’t allow ^ at beginning of link or image label (#7723). This is reserved for footnotes. Fixes regression from 0a93acf.
    • Fix parsing of “bare locators” after author-in-text citations. Previously @item [p. 12; @item2] was incorrectly parsed as three citations rather than two. This is now fixed by ensuring that prefix doesn’t gobble any semicolons.
    • Revert changes to inlinesInBalancedBrackets (commit fa83246), which caused regressions.
    • Improve detection of pipe table line widths (#7713). Fixed calculation of maximum column widths in pipe tables. It is now based on the length of the markdown line, rather than a “stringified” version of the parsed line. This should be more predictable for users. In addition, we take into account double-wide characters such as emojis.
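
One way to measure the display width of a raw markdown line while counting double-wide characters as two columns is sketched below in Python. It relies on the standard `unicodedata` module; the function name `display_width` is an assumption, and pandoc's own width calculation may differ in detail.

```python
import unicodedata


def display_width(line: str) -> int:
    """Approximate the number of terminal columns a line occupies
    (hypothetical helper)."""
    width = 0
    for ch in line:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        # 'W' (wide) and 'F' (fullwidth) characters occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


if __name__ == "__main__":
    for row in ["| a | b |", "| 日本 |"]:
        print(repr(row), display_width(row))
```

The maximum of this value over the lines of a table could then be compared against the configured column limit.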
  • Custom (Lua) readers:

    • First argument is now a list of sources instead of the concatenated text (Albert Krewinkel). The list structure can easily be converted to a string by applying tostring, but it is also possible to access the elements (each with a text and name). A small example is added to the custom reader documentation, showcasing its use in a reader that creates a syntax-highlighted code block for each source code file passed as input. Existing readers will still work through a fallback mechanism, issuing a deprecation notice.
  • Org reader:

    • Parse official org-cite citations (#7329). We also support the older org-ref style as a fallback. We no longer support the “markdown style” or “Berkeley style” citations.
    • Support alphabetical (fancy) lists (Lucas Viana). When the fancy_lists extension is enabled, alphabetical list markers are allowed, mimicking the behaviour of Org Mode when org-list-allow-alphabetical is enabled.
    • Support counter cookies in lists (Lucas Viana). Such cookies are used to override the item counter in ordered lists. In org it is possible to set the counter at any list item, but since Pandoc AST does not support this, we restrict the usage to setting an offset for the entire ordered list, by using the cookie in the first list item.
    • Allow trailing spaces after key/value pairs in directives (Albert Krewinkel). Ensures that spaces at the end of attribute directives like #+ATTR_HTML: :width 100% (note the trailing spaces) are accepted.
  • LaTeX reader:

    • Omit visible content for \label{...}. Previously we included the text of the label in square brackets, but this is undesirable in many cases. See discussion in #813 (comment).
    • Improve references (#813). Resolve references to theorem environments. Remove the Span caused by “label” in figure, table, and theorem environments; this had an id that duplicated the environments’ id.
    • Fix semantics of \ref. We were including the ams environment type in addition to the number. This is proper behavior for \cref but not for \ref. To support \cref we need to store the environment label separately.
    • Add babel mappings for Gujarati (gu) and Oriya (or) (#7815).
    • Fix typo panjabi -> punjabi in babel mappings (#7814).
  • HTML reader:

    • Parse attributes on links and images (#6970).
  • Docx reader:

    • Handle multiple pic elements inside a drawing (#7786).
    • Change elemToParPart to return [ParPart] instead of ParPart. Also remove NullParPart constructor, as it is no longer needed. This will allow us to handle elements that contain multiple ParParts, e.g. w:drawing elements with multiple pic:pic.
  • DocBook reader:

    • Collapse internal spaces in literal and other similar tags (#7821), as the standard docbook toolchain does.
    • Be sensitive to spacing="compact" in lists (#7799). When spacing="compact" is set, Para elements are turned into Plain, so we get a “tight” list.
  • Markdown writer:

    • Add new exported function writeMarkua from Text.Pandoc.Writers.Markdown [API change] (#1871, Tim Wisotzki and Saumel Lemmenmeier).
    • Fix indentation issue in footnotes (#7801).
    • Avoid extra space before citation suffix if it already starts with a space.
    • Ensure semicolon between the locator and the next citation when an author-in-text citation has a locator and following citations.
    • Improve escaping for # (#7726).
  • Custom (Lua) writers:

    • Allow variables to be set via second return value of Doc (#6731, Albert Krewinkel). New templates variables can be added by giving variable-value pairs as a second return value of the global function Doc. Example:

      function Doc (body, meta, vars)
        vars.date = vars.date or os.date '%B %e, %Y'
        return body, vars
      end
      
    • Provide global PANDOC_WRITER_OPTIONS (#6731, Albert Krewinkel).

    • Assign default Pandoc object to global PANDOC_DOCUMENT (Albert Krewinkel). The default Pandoc object is now non-strict, i.e., only the parts of the document that are accessed will be marshaled to Lua. A special type is no longer necessary. This change also makes it possible to use the global variable with library functions such as pandoc.utils.references, or to inspect the document contents with walk().

  • LaTeX writer:

    • Fix typo panjabi -> punjabi in babel mappings (#7814).
  • MediaWiki writer:

    • Remove redundant display text for wiki links (Jesse Hathaway).
  • Docx writer:

    • Handle bullets correctly in lists by not reusing numIds (#7689, Michael Hoffmann). This fixes a bug in which a Div in a list item would receive bullets on its contained paragraphs.
  • Org writer:

    • Fix list items starting with a code block or other non-paragraph content (#7810).
    • Avoid blank lines after tight sublists (#7810).
    • Fix extra blank line inserted after empty list item (#7810).
    • Don’t add blank line before lists (#7810).
    • Support starting number cookies (Lucas Viana). This is necessary for lists that start at a number other than 1.
    • Support the new org-cite syntax (#7329).
  • Haddock writer:

    • Avoid blank lines after tight sublists (#7810).
  • Ipynb writer:

    • Ensure deterministic order of keys.
    • Handle cell output with raw block of markdown (#7563, Kolen Cheung). Write RawBlock of markdown in code-cell output. This is designed to fit the behavior of #7561, which makes the ipynb reader parse code-cell output with mime “text/markdown” to a RawBlock of markdown. This commit makes the ipynb writer write this RawBlock of markdown back inside a code-cell output with the same mime, preserving this information in round-trip.
    • In choosing between multiple output options, always favor those marked with the output format over images (Kolen Cheung). Previously, both the fmt == f case and Image had a rank of 1.
  • Ipynb reader & writer: properly handle cell “id” (#7728). This is passed through if it exists (in Nb4); otherwise the writer will add a random one so that all cells have an “id”.
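
The pass-through-or-generate rule for cell ids could look roughly like this in Python. The cell dictionaries, the function name, and the choice of an 8-character hexadecimal id are assumptions for the sketch, not a description of pandoc's writer.

```python
import uuid


def ensure_cell_ids(cells):
    """Return copies of the cells in which every cell has an 'id':
    existing ids are kept, missing ones get a random value
    (hypothetical helper)."""
    result = []
    for cell in cells:
        cell = dict(cell)  # do not mutate the caller's data
        if not cell.get("id"):
            cell["id"] = uuid.uuid4().hex[:8]
        result.append(cell)
    return result


if __name__ == "__main__":
    cells = [{"id": "intro", "cell_type": "markdown"}, {"cell_type": "code"}]
    print(ensure_cell_ids(cells))
```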

  • Ms writer:

    • Properly encode strings for PDF contents (#7731).
  • JATS writer:

    • Keep quotes in element-citations (Albert Krewinkel). Fixed a bug that led to quote characters being lost in element-citations.
  • RTF writer:

    • Properly handle images in data URIs (#7771).
  • Commonmark writer:

    • Allow ‘)’ delimiters on ordered lists.
  • RST writer:

    • Avoid extra blank line after empty list item (#7810).
  • HTML writer:

    • Make line breaks more consistent. With --wrap=none, we now output line breaks between block-level elements. Previously they were omitted entirely, so the whole document was on one line, unless there were literal line breaks in pre sections. This makes the HTML writer’s behavior more consistent with that of other writers. Also, regardless of wrap settings, put newline after <dd> and after block-level elements in the footnotes section. And add a line break between an img tag and the associated figcaption.
    • reveal.js: Make sure images with r-stretch are not in p tags. They must be direct children of the section. There was previously code to make this work with the older class name stretch, but the name has changed in reveal.js.
    • reveal.js: don’t add r-fit-text class to section. It must go on the header only.
  • AsciiDoc writer:

    • Improve detection of intraword emphasis (#7803).
  • OpenDocument writer:

    • Fix vertical alignment bug with...

pandoc 2.16.2

22 Nov 01:22
@jgm jgm
  • Add interface for custom readers written in Lua (#7669). Users can now do -f myreader.lua and pandoc will treat the script myreader.lua as a custom reader, which parses an input string to a pandoc AST, using the pandoc module defined for Lua filters. A sample custom reader can be found in data/creole.lua. Also see documentation in doc/custom-readers.md.

  • New module Text.Pandoc.Readers.Custom, exporting readCustom [API change].

  • Allow plain to be used in raw attribute syntax.

  • Accept empty --metadata-file (#7675). This was a regression from 2.15 behavior.

  • Markdown reader: Improve inlinesInBalancedBrackets. This is just a small improvement in terms of performance, but it’s simpler and more direct code. Also, we avoid parsing interparagraph spaces in balanced brackets, as the original did.

  • BibTeX reader: Properly handle commented lines in BibTeX/BibLaTeX (#7668).

  • RST reader: handle class attribute for custom roles (#7699, willj-dev). Previously the class attribute was ignored, and the name of the role used as the class.

  • DocBook reader:

    • Add <titleabbr> support (Rowan Rodrik van der Molen).
    • Support for <indexterm> (#7607, Rowan Rodrik van der Molen).
  • LaTeX reader:

    • Add rudimentary support for \autoref (#7693).
    • Add ‘uri’ class when parsing \url, for consistency with treatment of autolinks in other formats (#7672).
  • JATS reader: Capture alt-text in figures (#7703, Aner Lucero).

  • MediaWiki writer: use HTML spans for anchors when header has id (#7697). We need to generate a span when the header’s ID doesn’t match the one MediaWiki would generate automatically. Note that MediaWiki’s generation scheme is different from pandoc’s (it uses uppercase letters, and _ instead of -, for example). This means that in going from markdown to mediawiki, we’ll now get spans before almost every heading, unless explicit identifiers are used that correspond to the ones MediaWiki auto-generates. This is uglier output but it’s necessary for internal links to work properly.

  • Markdown writer: don’t create autolinks when this loses information (#7692). Previously we sometimes lost attributes when rendering links as autolinks.

  • Text.Pandoc.Readers.Metadata: allow multiple YAML documents when parsing YAML for yamlBsToRefs. Some people use --- as the end delimiter in YAML bibliography files, which causes the yaml library to emit an error unless we explicitly allow multiple YAML documents (and just consider the first).

  • JATS writer:

    • Ensure figures are wrapped with <p> in list items (Albert Krewinkel). This prevents the generation of invalid output.
    • Add URL to element citation entries (Albert Krewinkel). The URL of a reference, if present, is added in tag <uri> to element-citation entries.
  • HTML writer: Don’t create invalid data- attribute for empty attribute key (#7546).

  • LaTeX writer:

    • Babel mappings: use ancientgreek for grc.
    • With -t latex-smart, don’t generate \ldots from ellipsis (#7674). Instead just use unicode ellipsis.
  • JATS template: fix equal-contrib attribute (Albert Krewinkel). The standard requires the value to be either yes or no, but it was set to true for authors who contributed equally.

  • reveal.js template: Add disableLayout variable (Christophe Dervieux).

  • Text.Pandoc.Error: sort errors in handleError by exit code (Albert Krewinkel).

  • Text.Pandoc.Writers.Shared: Improve toLegacyTable (#7683, Christian Despres).

  • Lua subsystem:

    • Include lpeg module (#7649, Albert Krewinkel). Compiles the lpeg library (Parsing Expression Grammars For Lua) into the program. Package maintainers may choose to rely on package dependencies to make lpeg available, in which case they can compile with the constraint lpeg +rely-on-shared-lpeg-library. lpeg and re are always made available in global variables, without the need for a require.

    • Set lpeg and re as globals; allow shared lib access via require. The lpeg and re modules are loaded into globals of the respective name, but they are not necessarily registered as loaded packages. This ensures that

      • the built-in library versions are preferred when setting the globals,
      • a shared library is used if pandoc has been compiled without lpeg, and
      • the require mechanism can be used to load the shared library if available, falling back to the internal version if possible and necessary.
    • Fix argument order in constructor pandoc.Cite (Albert Krewinkel). This restores the old behavior; argument order had been switched accidentally in pandoc 2.15.

    • Add Pushable instance for ReaderOptions (Albert Krewinkel).

    • Allow custom reader options to be passed to pandoc.read as an optional third argument (#7656, Albert Krewinkel). The object can either be a table or a ReaderOptions value like PANDOC_READER_OPTIONS. Creating new ReaderOptions objects is possible through the new constructor pandoc.ReaderOptions.

    • Display Pandoc values using their native Haskell representation (Albert Krewinkel).

    • Require latest hslua (2.0.1) (#7661, #7657, Albert Krewinkel). This fixes issues with

      • misleading error messages when a required function parameter is omitted;
      • absent properties still being listed in the output of pairs; and
      • alias accessing leading to errors instead of returning nil, e.g. with (pandoc.Str '').identifier.
    • Add missing space in “package not found” message (#7658, Albert Krewinkel).

  • Update build files (#7696, Fabián Heredia Montiel). Drop old windows 32-bit constraints. Update cabal tested-with field to correspond to ci.yml matrix

  • Remove unneeded package dependencies from benchmark target.

  • Require ghc >= 8.6, base >= 4.12. This allows us to get rid of the old custom prelude and some crufty cpp. But the primary reason for this is that conduit has bumped its base lower bound to 4.12, making it impossible for us to support lower base versions.

  • Require Cabal 2.4. Use wildcards to ensure that all pptx tests are included (#7677).

  • Update bash_completion.tpl (S.P.H.).

  • Add data/creole.lua as sample custom reader.

  • Add doc/custom-readers.md and doc/custom-writers.md.

  • doc/lua-filters.md: add section on global modules, including lpeg (Albert Krewinkel).

  • MANUAL.txt: update table of exit codes and corresponding errors (Albert Krewinkel).

  • Use latest texmath.

pandoc 2.16.1

03 Nov 07:22
@jgm jgm
  • Docx reader: don’t let first line indents trigger block quotes (#7655). This fixes a regression introduced in pandoc 2.15.

  • Docx writer: use getTimestamp for modification times in reference.docx (#7654). This ensures that when SOURCE_DATE_EPOCH is set, the modification times of files taken from the reference.docx will be set deterministically, allowing for reproducible builds.
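
The general idea behind honoring SOURCE_DATE_EPOCH can be shown with a few lines of Python. This sketch only illustrates the convention; the function name and the fallback to the current time are assumptions and do not reproduce pandoc's getTimestamp.

```python
import os
import time


def get_timestamp(environ=os.environ) -> int:
    """Return SOURCE_DATE_EPOCH as an integer if it is set and valid,
    otherwise the current time in seconds (hypothetical helper)."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        try:
            return int(value)
        except ValueError:
            pass  # ignore a malformed value and fall back to the clock
    return int(time.time())


if __name__ == "__main__":
    print(get_timestamp({"SOURCE_DATE_EPOCH": "1635724800"}))
    print(get_timestamp({}))
```

Because every embedded file then receives the same fixed modification time, two builds from identical inputs can produce byte-identical archives.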

  • Lua subsystem (Albert Krewinkel):

    • Load module pandoc.path on startup (#7524). Previously the module always had to be loaded via require 'pandoc.path'.
    • Fix typo in SoftBreak constructor.
    • Re-add content property to Strikeout elements. Fixes a regression introduced in 2.15.
    • Be more forgiving when retrieving the Image caption property. Fixes a regression introduced in 2.15.
    • Display Attr values using their native Haskell representation.
    • Allow omitting the 2nd parameter in pandoc.Code constructor. Fixes a regression introduced in 2.15 which required users to always specify an Attr value when constructing a Code element.
    • Allow Citation values to be compared and shown. Comparisons of Citation values are performed in Haskell; values are equal if they represent the same Haskell value. Converting a Citation value to a string now yields its native Haskell string representation.
    • Restore List behavior of MetaList (#7650). Fixes a regression introduced in 2.16 which had MetaList elements lose the pandoc.List properties.
    • Restore content property on Header elements.
    • Ensure Block elements have all expected properties.
    • Ensure Inline elements have all expected properties.
  • Allow tasty-bench 0.3.x.

pandoc 2.16

31 Oct 20:50
@jgm jgm
  • Switch back from HsYAML to yaml for parsing YAML metadata (#6084). HsYAML is around 20 times slower in parsing large YAML bibliographies. In addition, HsYAML is not being actively maintained. This sets us back in our attempts to free ourselves from C dependencies (#4535). But I don’t see a good alternative until a faster pure Haskell parser is available. Notes:

    • We’ve removed the FromYAML instances for all types that had them, since this is a HsYAML-specific typeclass [API change]. (The yaml package just uses From/ToJSON instead of having a dedicated From/ToYAML class.)
    • Unlike HsYAML (in the configuration we were using), yaml parses ‘Y’, ‘N’, ‘Yes’, ‘No’, ‘On’, ‘Off’ as boolean values. Users may need to quote these when they are meant to be interpreted as strings. Similarly, ‘null’ is parsed as a YAML null value (and will be treated as an empty string by pandoc rather than the string ‘null’). Quoting it will force it to be interpreted as a string.
    • Some tests had to be adjusted accordingly.
    • Pandoc now behaves in a more useful way when the YAML metadata contains escaping errors: instead of just failing silently and falling back to some other interpretation of the section, it raises a YAML parsing error.
  • Markdown writer: Ensure that special values are quoted in YAML metadata. These include “Y”, “yes”, “on”, and “off”, which are now (with yaml library) considered boolean values, as well as “null”.
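
A writer that wants such values to survive as strings has to quote them. The following Python sketch shows the idea; the set of special words combines those named above with `true` and `false`, and the case-insensitive comparison is a simplifying assumption rather than the exact rule used by the yaml library or by pandoc.

```python
# Words that a YAML 1.1 style parser may read as booleans or null.
SPECIAL_WORDS = {"y", "n", "yes", "no", "on", "off", "true", "false", "null"}


def yaml_scalar(value: str) -> str:
    """Quote a plain scalar if it would otherwise be read as a
    boolean or null (hypothetical helper)."""
    if value.lower() in SPECIAL_WORDS:
        return '"' + value + '"'
    return value


if __name__ == "__main__":
    for v in ["Yes", "null", "Norway"]:
        print("answer:", yaml_scalar(v))
```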

  • Change JSON encodings of some types.

    • For LineEnding use lowercase constructors, e.g. crlf, native.
    • For HTMLSlideVariant use lowercase constructors.
    • For ReaderOptions use e.g. default-image-extension instead of readerDefaultImageExtension for field names.
    • For Extension, use e.g. tex_math_dollars instead of Ext_tex_math_dollars as constructor.
    • For Extensions, use an array of Extensions, instead of an object wrapping the tag Extensions and an integer. (The integer representation is not supposed to be part of the public API.)
    • For Opt, use field names like tab-stop instead of optTabStop.
  • Docx writer:

    • Add IDs to native_numbering test (Tristan Stenner).
    • Move “:” out of the caption bookmark (Tristan Stenner). This is needed so that native references to the figure are included as “As seen in Figure X, it is…” instead of “As seen in [Figure: X, it is…”
  • Lua (Albert Krewinkel, except as noted):

    • Use hslua module abstraction where possible.

    • Fix placement of tests for Block elements in pandoc module tests

    • Increase strictness when getting attribute keys

    • Re-add t and tag property to Attr values. Removal of these properties from Attr values was a regression.

    • Fix pandoc.utils.stringify regression. The pandoc.utils.stringify function returned empty strings when called with a string argument.

    • Fix a copy/paste bug in Lua marshalling code (John MacFarlane, #7639). This caused links to be changed to figures when Lua filters changed link properties.

    • Re-add content property to Link elements (#7647). This was a regression introduced in version 2.15.

    • Generate constants in module pandoc programmatically.

    • Marshal SimpleTable, ListAttributes, Citation, and Block values as userdata objects. Properties of Block values are marshalled lazily, which generally improves performance considerably. Script users may also notice the following differences:

      • Block element properties can no longer be accessed by numerical indexing of the .c field. The .c property now serves as an alias for .content, so some filter that used this undocumented method for property access may continue to work, while others will need to be updated and use proper property names.
      • The marshalled Block elements now have a show method, and a __tostring metamethod. Both return the Haskell string representation of the element.
      • Block values now have the Lua type userdata instead of table.
  • Add a short guide to pandoc’s sources (Albert Krewinkel).

  • Fix epub files in epub reader tests, so that they are valid according to epubcheck (#7586).

  • Allow time 1.13.

  • Require latest skylighting (0.12.1).

  • Fix build on GHC 9.2 (Joseph C. Sible).

  • Fix trypandoc so it builds with aeson > 2.

pandoc 2.15

24 Oct 02:18
@jgm jgm
  • Add --sandbox option (#5045).

    • Add sandbox feature. When this option is used, readers and writers only have access to input files (and other files specified directly on command line). This restriction is enforced in the type system.
    • Filters, PDF production, custom writers are unaffected. This feature only insulates the actual readers and writers, not the pipeline around them in Text.Pandoc.App.
    • Note that when --sandbox is specified, readers won’t have access to the resource path, nor will anything have access to the user data directory.
  • --self-contained: Fix bug that caused everything to be made a data URI (#7635, #7367). We only need to use data URIs in certain cases, but due to a bug they were being used always.

  • Pandoc will now fall back to latin1 encoding for inputs that can’t be read as UTF-8. This is what it did previously for content fetched from the web and not marked as to content type. It makes sense to do the same for local files. In this case a NotUTF8Encoded warning will be issued, indicating that pandoc is interpreting the input as latin1.
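
The fallback strategy can be sketched in Python as follows. The function name and the boolean flag standing in for the NotUTF8Encoded warning are assumptions made for the example.

```python
def decode_input(data: bytes):
    """Decode bytes as UTF-8, falling back to latin1 when that fails.
    Returns the text and a flag telling whether the fallback was used
    (hypothetical helper)."""
    try:
        return data.decode("utf-8"), False
    except UnicodeDecodeError:
        # latin1 maps every byte to a code point, so this cannot fail.
        return data.decode("latin-1"), True


if __name__ == "__main__":
    print(decode_input("café".encode("utf-8")))
    print(decode_input(b"caf\xe9"))
```

Since latin1 decoding never fails, the flag is the only way for a caller to know that the text may have been misinterpreted.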

  • Markdown reader:

    • Don’t parse links or bracketed spans as citations (#7632). Previously pandoc would parse [link to (@a)](url) as a citation; similarly [(@a)]{#ident}. This is undesirable. One should be able to use example references in citations, and even if @a is not defined as an example reference, [@a](url) should be a link containing an author-in-text citation rather than a normal citation followed by literal (url).
    • Fix interaction of --strip-comments and list parsing (#7521). Use of --strip-comments was causing tight lists to be rendered as loose (as if the comment were a blank line).
    • Fix parsing bug for math in bracketed spans and links (#7623). This affects math with unbalanced brackets (e.g. $(0,1]$) inside links, images, bracketed spans.
    • Fix code blocks using --preserve-tabs (#7573). Previously they did not behave as the equivalent input with spaces would.
  • DocBook reader:

    • Honor linenumbering attribute (Samuel Tardieu). The DocBook attribute linenumbering="numbered" on code blocks maps to the numberLines class internally.
  • LaTeX reader:

    • Implement siunitx v3 commands (#7614). We support \unit, \qty, \qtyrange, and \qtylist as synonyms of \si, \SI, \SIrange, and \SIlist.
    • Properly handle \^ followed by group closing (#7615).
    • Recognize that \vadjust sometimes takes “pre” (#7531).
    • Ignore (and gobble parameters of) CSLReferences environment (#7531). Otherwise we get the parameters as numbers in the output.
    • Restrict \endinput to current file (Simon Schuster).
  • RST reader: handle escaped colons in reference definitions (#7568).

  • HTML reader:

    • Handle empty tbody element in table (#7589).
  • Ipynb reader (Kolen Cheung):

    • Get cell output mime from raw_mimetype in addition to format. (format is what the spec calls for, but raw_mimetype is often used in practice; see jupyter/nbformat#229).
    • Add more formats that can be handled as “raw” cells.
    • Fix mime type for rst.
    • Support text/markdown, which is now a supported mime type for raw output (#7561).
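
The lookup described in the first Ipynb item can be sketched like this. The function name `raw_cell_mime` is hypothetical, and giving `format` precedence over `raw_mimetype` is an assumption; the changelog only says that both keys are consulted.

```python
from typing import Optional


def raw_cell_mime(metadata: dict) -> Optional[str]:
    """Return the mime type of a raw cell from its metadata.

    Assumption: `format` (the key the spec calls for) wins when both
    keys are present; `raw_mimetype` is used only as a fallback.
    """
    return metadata.get("format") or metadata.get("raw_mimetype")
```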
  • RTF reader:

    • Support \binN for binary image data.
    • If doc begins with { … } only parse its contents. Some documents seem to have non-RTF (e.g. XML) material after the {\rtf1 ... } group.
    • Ignore \pgdsc group. Otherwise we get style names treated as text.
    • Better handling of \* and bookmarks. We now ensure that groups starting with \* never cause text to be added to the document. In addition, bookmarks now create a span between the start and end of the bookmark, rather than an empty span.
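
As a rough illustration of the "only parse the leading group" item above, the following Python sketch extracts the first balanced brace group from a string and discards whatever follows it. The name `first_group` is hypothetical, and the sketch deliberately ignores binary data introduced by \binN, which a real reader would have to skip over before counting braces.

```python
def first_group(doc: str) -> str:
    """Return the leading balanced {...} group, or the whole input if there is none."""
    if not doc.startswith("{"):
        return doc
    depth = 0
    i = 0
    while i < len(doc):
        ch = doc[i]
        if ch == "\\":
            # A backslash escapes the next character, so \{ and \} do not nest.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return doc[: i + 1]
        i += 1
    # Unbalanced input: hand everything back unchanged.
    return doc
```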
  • Docx reader:

    • Avoid blockquote when parent style has more indent (Milan Bracke). When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
    • Fix handling of empty fields (Milan Bracke). Some fields only have an instrText and no content. Pandoc didn’t understand these, causing other fields to be misunderstood because it seemed like a field was still open when it wasn’t.
    • Implement PAGEREF fields (Milan Bracke). These fields, often used in tables of contents, can be a hyperlink.
    • Fix handling of nested fields (Milan Bracke). Fields delimited by fldChar elements can contain other fields. Before, the nested fields would be ignored, except for the end, which would be considered the end of the parent field.
    • Add placeholder for word diagram instead of just omitting it (Ezwal).
  • Org reader:

    • Don’t parse a list as first item in a list item (#7557).
    • Allow an initial :PROPERTIES: drawer to add to metadata (#7520).
  • Docx writer:

    • Make id used in native_numbering predictable (#7551). If the image has the id IMAGEID, then we use the id ref_IMAGEID for the figure number. This allows one to create a filter that adds a figure number with figure name, e.g. <w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t> </w:r></w:fldSimple>. If an image lacks an id, an id of the form ref_fig1 is used.
  • Ensure we have unique ids for wp:docPr and pic:cNvPr elements (#7527, #7503).

  • Handle SVG images (#4058). This change has several parts:

    • In Text.Pandoc.App, if the writer is docx, we fill the media bag and attempt to convert any SVG images to PNG, adding these to the media bag. The PNG backups have the same filenames as the SVG images, but with an added .png extension. If the conversion cannot be done (e.g. because rsvg-convert is not present), a warning is emitted.
    • In Text.Pandoc.Writers.Docx, we now use Word 2016’s syntax for including SVG images. If a PNG fallback is present in the media bag, we include a link to that too.
  • Powerpoint writer (Emily Bourke):

    • Add support for more layouts (#5097). Until now, four layouts were supported: “Title Slide” (used for the automatically generated metadata slide), “Section Header” (used for headings above slide level), “Two Column” (used when there’s a columns div), “Title and Content” (used for all other slides). We now support three additional layouts: “Comparison”, “Content with Caption”, and “Blank”. The manual describes the logic that determines which layout is used for a slide. Layouts may be customized in the reference doc.
    • Support specifying slide background images using a background-image attribute on the slide’s heading. Only the “stretch” mode is supported, and the background image is centred around the slide in the image’s larger axis, matching the observed default behaviour of PowerPoint.
    • Add support for incremental lists (through same methods as in other slide writers) (#5689).
    • Copy embedded fonts from reference doc.
    • Include all themes in output archive.
    • Fix list level numbering (#4828, #4663). In PowerPoint, the content of a top-level list is at the same level as the content of a top-level paragraph: the only difference is that a list style has been applied. Previously, the writer incremented the paragraph level on each list, turning what should be top-level lists into second-level lists.
    • Line up list continuation paragraphs. This commit changes the marL and indent values used for plain paragraphs and numbered lists, and changes the spacing defined in the reference doc master for bulleted lists. For paragraphs, there is now a left-indent taken from the otherStyle in the master. For numbered lists, the number is positioned where the text would be if this were a plain paragraph, and the text is indented to the next level. This means that continuation paragraphs line up nicely with numbered lists. Existing reference docs may need to be modified so that otherStyle and bodyStyle indent levels match, for this feature to work with them.
    • Consolidate text runs when possible (jgm). This slims down the output files by avoiding unnecessary text run elements.
    • Support footers in the reference doc. There is one behaviour which may not be immediately obvious: if the reference doc specifies a fixed date (i.e. not automatically updating), and there’s a date specified in the metadata for the document, the footer date is replaced by the metadata date.
    • Fix presentation rel numbering. Before now, the numbering of rIds was inconsistent when making the presentation XML and when making the presentation relationships XML.
    • Don’t add relationships unnecessarily. Before now, for any layouts added to the output from the default reference doc, the relationships were unconditionally added to the output. However, if there was already a layout in slideMaster1 at the same index then that results in duplicate relationships.
    • If slide level is 0, don’t insert a slide break between a heading and a following table, “columns” div, or paragraph starting with an image.
    • Fix capitalisation of notesMasterId.
    • Restructure tests.
  • Asciidoc writer:

    • Translate numberLines attribute to linesnum switch (Samuel Tardieu).
    • Improve escaping for -- in URLs (#7529).
  • LaTeX writer:

    • Make babel use more idiomatic (#7604, hseg). Use babel’s bidi implementation. Import babel languages individually instead of as package options. Move header-includes to after babel setup so it can be modified.
    • Use babel, not polyglossia, with xelatex. Previously polyglossia worked better with xelatex, but that is no longer the case, so we simplify the code...

pandoc 2.14.2

21 Aug 16:55
@jgm jgm
  • Allow --slide-level=0 (#7476). When the slide level is set to 0, headings won’t be used at all in splitting the document into slides. Horizontal rules must be used to separate slides.

  • Add RTF reader (#3982). rtf is now supported as an input format as well as an output format. New module Text.Pandoc.Readers.RTF (exporting readRTF). [API change]

  • HTML reader: treat comments as blank when parsing (#7482).

  • Markdown reader:

    • Fix raw LaTeX injection issue (#7497). Using a code block containing \end{verbatim}, one could inject raw TeX into a LaTeX document even when raw_tex is disabled. Thanks to Augustin Laville for noticing the bug.
    • Multimarkdown sub- and superscripts (#5512, OCzarnecki). Added an extension short_subsuperscripts which modifies the behavior of subscript and superscript, allowing subscripts or superscripts containing only alphanumerics to end with a space character (e.g. x^2 = 4 or H~2 is combustible). This improves support for multimarkdown.
  • RST reader: Fix :literal: includes (#7513). These should create code blocks, not insert raw RST.

  • LaTeX reader:

    • Proper implicit grouping around environment macros.
    • Support \global before \def, \let, etc. (#7494).
    • Fix scope for LaTeX macros (#7494). They should by default scope over the group in which they are defined (except \gdef and \xdef, which are global). In addition, environments must be treated as groups.
    • Improve handling of plain TeX macro primitives (#7474). Fixed semantics for \let.
    • Implement \edef, \gdef, and \xdef.
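
The scoping rules described in the LaTeX reader items above can be pictured with a small scope stack. This is a sketch in Python rather than the reader's actual Haskell code; the class `MacroTable` and its method names are hypothetical, and it models only where a definition is stored, not the difference in expansion between \def and \edef.

```python
class MacroTable:
    """Macro definitions scoped by TeX-style groups."""

    def __init__(self):
        # scopes[0] is the global scope; each open group pushes one more dict.
        self.scopes = [{}]

    def begin_group(self):
        """Enter a group (an opening brace or \\begin{...})."""
        self.scopes.append({})

    def end_group(self):
        """Leave a group, discarding its local definitions."""
        if len(self.scopes) > 1:
            self.scopes.pop()

    def define(self, name, body, is_global=False):
        """Store a definition locally, or globally for \\gdef, \\xdef, \\global\\def."""
        if is_global:
            # A global definition replaces any local one that would shadow it.
            for scope in self.scopes[1:]:
                scope.pop(name, None)
            self.scopes[0][name] = body
        else:
            self.scopes[-1][name] = body

    def lookup(self, name):
        """Return the innermost visible definition, or None."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

Treating environments as groups then amounts to calling `begin_group` at \begin and `end_group` at \end, so a macro defined inside an environment disappears when the environment closes.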
  • Docx reader: Improve docx reader’s robustness in extracting images (#7511). The docx reader made some assumptions about how docx containers were laid out that were not always true, with the result that some images in documents did not get extracted.

  • LaTeX writer: Increase table column width precision (#7466, Peter Fabinski). In some cases, the rounding performed by the LaTeX table writer would introduce visible overrun outside the text area. This adds two more decimal places to the width values.
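
A small worked example shows why the extra precision matters. The choice of six equal columns and of two versus four decimal places is an assumption made for illustration; the entry itself only states that two more decimal places were added.

```python
def total_width(fractions, digits):
    """Sum the column widths after rounding each to `digits` decimal places."""
    return sum(round(f, digits) for f in fractions)


# Six equal columns, each one sixth of the line width.
columns = [1 / 6] * 6

coarse = total_width(columns, 2)  # each column becomes 0.17
fine = total_width(columns, 4)    # each column becomes 0.1667
```

With two decimals the six columns add up to about 1.02 of the line width, a visible overrun; with four decimals the total is about 1.0002, which is no longer noticeable.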

  • Powerpoint writer:

    • Include image title in description (#7352, Emily Bourke). The image title (i.e. ![alt text](link "title")) was previously ignored when writing to pptx. This commit includes it in PowerPoint’s description of the image, along with the link.
    • Select layouts from reference doc by name (Emily Bourke). Until now, users had to make sure that their reference doc contains layouts in a specific order: the first four layouts in the file had to have a specific structure. Now the layout selection uses the layout names rather than order: users must make sure their reference doc contains four layouts with specific names, and if a layout with the right name isn’t found pandoc will emit a warning and use the corresponding layout from the default reference doc as a fallback.
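
The name-based selection with fallback can be sketched as below. The function `select_layout` and the dictionary representation of layouts are hypothetical stand-ins; the real writer works on the XML parts of the reference doc.

```python
import warnings


def select_layout(name, reference_layouts, default_layouts):
    """Pick a layout by name, falling back to the default reference doc.

    Both arguments are assumed to map layout names to layout data.
    """
    if name in reference_layouts:
        return reference_layouts[name]
    warnings.warn(f"Layout {name!r} not found in reference doc; using default")
    return default_layouts[name]
```

Looking layouts up by name means the order of layouts in a custom reference doc no longer matters, at the cost of requiring the names to match exactly.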
  • Docx writer: be sensitive to the native_numbering extension (#7499). Figure and table numbers are now only included if native_numbering is enabled. (By default it is disabled.) This is a behavior change with respect to 2.14.1, but the default behavior is now that of previous versions. The change was necessary to avoid incompatibilities between pandoc’s native numbering and third-party cross reference filters like pandoc-crossref.

  • RTF writer:

    • Omit \bin in \pict. According to the spec, this is not needed or wanted when the data is in hexadecimal format, as here.
    • Emit \outlinelevel for section headings.
  • RTF template: specify font family for fixed-width font f1. According to the spec, this is mandatory.

  • LaTeX writer: Use ulem for underline (#7351). ulem is conditionally included already when the strikeout variable is set, so we set this when there is underlined text, and use \uline instead of \underline. This fixes wrapping for underlined text.

  • Text.Pandoc.Citeproc:

    • Revise citeproc code to fit new citeproc 0.5 API (thanks to Benjamin Bray). Linkification of URLs in the bibliography is now done in the citeproc library, depending on the setting of an option. We set that option depending on the value of the metadata field link-bibliography (defaulting to true, for consistency with earlier behavior). If a DOI, PMID, PMCID, or URL field is present but not explicitly rendered, the title (or if no title, the whole entry) is hyperlinked. These changes implement the recommendations from the draft CSL v1.0.2 spec (Appendix VI): https://github.com/citation-style-language/documentation/blob/master/specification.rst#appendix-vi-links
    • Avoid odd handling of quotes. Recent citeproc changes allow us to ignore Quoted elements; citeproc now uses its own method for representing quoted things, and only localizes and flipflops quotes it adds itself. Convert Quoted in bib entries to special Spans before passing them off to citeproc. This ensures that we get proper localization and flipflopping if, e.g., quotes are used in titles (jgm/citeproc#87).
    • Removed quote localization from citeproc processing. This is now done in citeproc itself.
  • Text.Pandoc.Logging: Add PowerpointTemplateWarning log message type [API change] (Emily Bourke).

  • Text.Pandoc.Extension: Add Ext_short_subsuperscripts constructor to Extension [API change] (OCzarnecki).

  • Various sample.lua editorial fixes (#7493, #7487, William Lupton).

  • Bump base-compat version so we get compatibility with base 4.12.

  • Use Prelude from base-compat for ghc 8.4 too.

  • Add haskell-language-server to shell.nix (#7496, Emily Bourke).

  • Tests.Helpers: export testGolden and use it in RTF reader. This gives a diff output on failure.

  • Remove obsolete and incorrect sentence in --slide-level docs.

  • Add internal module Text.Pandoc.Network.HTTP, exporting urlEncode.

  • Text.Pandoc.Parsing: parseFromString: preserve at least the source directory (#7464). Previously we just set the source name to “chunk” when parsing from strings, to avoid misleading source positions. This had the side effect that rebase_relative_paths would break inside sections that were parsed as strings. So, now we use “ORIGINAL_SOURCE_PATH_chunk” instead of just “chunk”.

  • Text.Pandoc.MIME: use image/x-xcf instead of application/x-xcf (#7454).

  • Don’t compare cdLine in OOXML golden tests (Emily Bourke). The cdLine field gives the line of the file some CData was found on, which reflects irrelevant formatting differences.

  • Provide more detailed XML diff in tests (Emily Bourke).

  • OOXML tests: silence warnings. These can make the test output confusing, making people think tests are failing when they’re passing.

  • INSTALL.md: Add GitLab CI/CD example (#7448, Veratyr).

  • MANUAL.txt

    • Clarifications (William Lupton).
    • Add a note on security risks of include directives.
  • Document use of the ‘underline’ class (#7492, #7484, William Lupton).

  • Add a FAQ about the “Cannot allocate memory” error on M1 macs.

  • Use texmath 0.12.3.1.

  • Use released citeproc 0.5.

  • Remove dependency on HTTP package (#7456, mt_caret).

pandoc 2.14.1

19 Jul 05:42
@jgm jgm
  • Text.Pandoc.ImageSize: Add Tiff constructor for ImageType (#7405) [Minor API change]. This allows pandoc to get size information from tiff images.

  • Markdown reader: don’t try to read contents in self-closing HTML tag. Previously we had problems parsing raw HTML with self-closing tags like <col/>. The problem was that pandoc would look for a closing tag to close the markdown contents, but the closing tag had, in effect, already been parsed by htmlTag.

  • LaTeX reader:

    • Avoid trailing hyphen in translating languages (#7447). Previously \foreignlanguage{english} turned into <span lang="en-">. The same issue affected Arabic.
    • Support \cline in LaTeX tables (#7442).
    • Improved parsing of raw LaTeX from Text streams (rawLaTeXParser, used to read LaTeX in Markdown files, #7434). We now use source positions from the token stream to tell us how much of the text stream to consume. Getting this to work required a few other changes to make token source positions accurate.
  • DocBook reader:

    • Handle images with imageobjectco elements (#7440).
    • Add support for citerefentry (#7437, Jan Tojnar).
  • RST reader: fix regression with code includes (#7436). With the recent changes to include infrastructure, included code blocks were getting an extra newline.

  • HTML reader:

    • Recognize data-external when reading HTML img tags (#7429, Michael Hoffmann). Preserve all attributes in img tags. If attributes have a data- prefix, it will be stripped. In particular, this preserves a data-external attribute as an external attribute in the pandoc AST.
    • Add col, colgroup to ‘closes’ definitions.
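
The attribute handling described for img tags (keep every attribute, strip a leading data- prefix) can be sketched as follows. The function name `normalize_img_attrs` is hypothetical, and attributes are modelled as simple key/value pairs.

```python
def normalize_img_attrs(attrs):
    """Keep all attributes, dropping a leading 'data-' from their names."""
    prefix = "data-"
    return [
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in attrs
    ]
```

Under this scheme `data-external="1"` in the HTML source ends up as an `external` attribute with value `1`, which is the form a writer can then look for.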
  • HTML writer:

    • Remove duplicated alt text in HTML output (Aner Lucero).
    • Remove aria-hidden when explicit alt text is provided (Aner Lucero).
    • Set boolean values for reveal.js variables.
  • Docx writer:

    • Add table numbering for captioned tables. The numbers are added using fields, so that Word can create a list of tables that will update automatically.
    • Support figure numbers. These are set up in such a way that they will work with Word’s automatic table of figures (#7392).
  • Markdown writer: put space between Plain and following fenced Div (#4465).

  • EPUB writer: Don’t incorporate externally linked images in EPUB documents (#7430, Michael Hoffmann). Just as it is possible to avoid incorporating an image in EPUB by passing data-external="1" to a raw HTML snippet, this makes the same possible for native Images, by looking for an associated external attribute.

  • Text.Pandoc.PDF:

    • Fix svgIn path error (#7431). We were duplicating the temp directory; this didn’t cause problems on macOS or linux because there we use absolute paths for the temp directory. But on Windows it caused errors converting SVG files.
    • convertImage: normalize paths (#7431). This will avoid paths on Windows with mixed path separators.
  • Text.Pandoc.Class: Always use / when adding directory to image destination with extractMedia, even on Windows.

  • Text.Pandoc.Citeproc:

    • Allow $ characters in bibtex keys (#7409).
    • Set proper initial source name in parsing BibTeX (for better error messages).
    • Revamp note citation handling (#7394). Use latest citeproc, which uses a Span with a class rather than a Note for notes. This helps us distinguish between user notes and citation notes. Don’t put citations at the beginning of a note in parentheses. Fix small bug in handling of citations in notes, which led to commas at the end of sentences in some cases.
    • Cleanup and efficiency improvement in deNote.
    • Improve punctuation moving with --citeproc. Previously, using --citeproc could cause punctuation to move in quotes even when there are no citations. This has been changed; punctuation moving is now limited to citations. In addition, we only move footnotes around punctuation if the style is a note style, even if notes-after-punctuation is true.
  • Use citeproc 0.4.1. This helps improve note citations (see above) and eliminates double hyperlinks in author-in-text citations. Author-only citations are no longer hyperlinked. See jgm/citeproc#77. It also fixes moving of punctuation inside quotes to conform to the CSL spec: only comma and period are moved, not question mark or exclamation point.

  • Text.Pandoc.Error: fix line calculations in reporting parsec errors. Also remove a spurious initial newline in the error report.

  • Use doctemplates 0.10, which gives us better support for boolean variable values. Previously $if(foo)$ would evaluate to true for variables with boolean false values, because it cared only about the string rendering (#7402).
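
The difference between the two ways of evaluating $if(foo)$ can be illustrated with a short Python sketch. The functions `render`, `if_by_rendering`, and `if_by_value` are hypothetical stand-ins, not the doctemplates API, and the string forms chosen for booleans are an assumption.

```python
def render(value):
    """Hypothetical string rendering of a template variable."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return "" if value is None else str(value)


def if_by_rendering(value):
    """Earlier behaviour as described: true whenever the rendering is non-empty."""
    return render(value) != ""


def if_by_value(value):
    """Boolean-aware test: a boolean false is false, whatever its rendering."""
    if isinstance(value, bool):
        return value
    return render(value) != ""
```

A false boolean renders as a non-empty string, so the rendering-based test treats it as true; the value-based test does not.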

  • Require commonmark-pandoc >= 0.2.2.1. This fixes task lists with multiple paragraphs.

  • Use skylighting 0.11.

  • CSS in HTML template: reset overflow-wrap on code blocks (Mauro Bieg, #7423).

  • LaTeX template: Revert change in PR #7295: “move title, author, date up to top of preamble.” The change caused problems for people who used LaTeX commands defined later in the preamble in the title or author fields (#7422).

  • Add doc/faqs.md. This is imported from the website; in the future the website version will be drawn from here. Added a FAQ on the use of \AtEndPreamble for cases when the contents of header-includes need to refer to definitions that come later in the preamble. See #7422.

  • Upgrade Debian 10 AMI for build-arm.sh.

  • CircleCI: change to using xcode 11.1.0 (macOS 10.14.4). We previously built on 10.13, but 10.13 no longer gets security updates and CircleCI is deprecating it.

pandoc 2.14.0.3

22 Jun 22:01
@jgm jgm
  • Text.Pandoc.MediaBag insertMediaBag: ensure we get a sane mediaPath for URLs (#7391). In earlier 2.14.x versions, we’d get incorrect paths for resources downloaded from URLs when the media are extracted (including in PDF production).
  • Text.Pandoc.Parsing: improve emailAddress (#7398). Previously the parser would accept characters that are illegal in domains, and this sometimes caused it to gobble bits of the following text.
  • txt2tags reader: modify the email address parser so it still includes form parameters, even after the change to emailAddress in Text.Pandoc.Parsing.
  • Text.Pandoc.Readers.Metadata: Fix regression with comment-only YAML metadata blocks (#7400).
  • reveal.js writer and template: better handling of options. Previously it was impossible to specify false values for options that default to true (e.g. center); setting the option to false just caused the portion of the template setting the option to be omitted. Now we prepopulate all the variables with their default values, including them all unconditionally and allowing them to be overridden.
  • Markdown writer: Fix regression in code blocks with attributes (#7397). Code blocks with a single class but nonempty attributes were having attributes dropped as a result of #7242.
  • LaTeX writer:
    • Add strut at end of minipage if it contains line breaks. Without them, the last line is not as tall as it should be in some cases.
    • Always use a minipage for cells with line breaks, when width information is available (#7393). Otherwise the way we treat them can lead to content that overflows a cell.
    • Use \strut instead of ~ before \\ in empty line.
  • Use lts-18.0 stack resolver.
  • Require skylighting 0.10.5.2 (adding support for Swift).
  • Require commonmark 0.2.1.
  • Rephrase section on unsafe HTML in manual.
  • Create SECURITY.md
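
The reveal.js option handling described in this release (prepopulate every variable with its default, then let the user override it) can be sketched as a simple merge. The two options listed are an illustrative subset, and `resolve_options` is a hypothetical name.

```python
# Illustrative subset of options that default to true.
REVEAL_DEFAULTS = {"center": True, "controls": True}


def resolve_options(user_options):
    """Start from the defaults and apply user overrides, including false values."""
    merged = dict(REVEAL_DEFAULTS)
    merged.update(user_options)
    return merged
```

Because every option is always present in the merged result, an explicit false is written out like any other value instead of causing the setting to be omitted.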