Defining mdast for citations #21

Open
rowanc1 opened this issue Mar 31, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@rowanc1
Member

rowanc1 commented Mar 31, 2022

I have been investigating citations and am posting here because it would be great to get on the same page about the mdast data structures for citations. (There has probably been more thought on the myst-syntax side, e.g. whether we adopt [@key] pandoc-style citations.) I would love for the mdast data structures to aim for the same place as the other syntax conversations evolve.

For a piece of technical content, the best practices for in-text citations are probably latex/natbib and pandoc citations which are defined here:

I am thinking the following mdast data structures might capture everything:

type CiteGroup = {
  type: 'citeGroup'
  kind: 'narrative' | 'parenthetical'; // 'citet' vs 'citep'
  children: Cite[]
}

type Cite = {
  type: 'cite'
  identifier: string
  label: string
  expand: boolean // this is the * in natbib, expands authors, false by default
  partial: 'author' | 'year'
  prefix: string // e.g. "see" or "e.g."
  suffix: string // e.g. "99 years later" or something
  locator: string // e.g. "chap. 2", joined with a comma -- defined by CSL locale (pp. fig. etc.)
  // alias: string // use "Paper 1", maybe do this later?
}

I think this works pretty well and can fit with the {cite:t}`jon22` syntax we already have defined, but maybe in the future there is some way to give roles more data:
For example: {cite:p}[prefix="see", locator="chap. 2"]`jon22`
would yield: (see Jones et al., 2022, chap. 2)
Or maybe there is a specialized way to do this with [see @jon22, chap. 2] (see pandoc)
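As a rough sketch of what that example could produce, the node below uses the proposed shapes and a toy renderer to reach the "(see Jones et al., 2022, chap. 2)" output. The optional (`?`) fields, the `bibliography` lookup, and the `renderCite`/`renderGroup` helpers are assumptions made for illustration, not part of the proposal.

```typescript
// Node shapes from the proposal; marking fields optional is an assumption.
type Cite = {
  type: 'cite';
  identifier: string;
  label: string;
  prefix?: string;
  suffix?: string;
  locator?: string;
};

type CiteGroup = {
  type: 'citeGroup';
  kind: 'narrative' | 'parenthetical';
  children: Cite[];
};

// Hypothetical bibliography data; a real one would come from a bib/CSL source.
const bibliography: Record<string, { authors: string; year: number }> = {
  jon22: { authors: 'Jones et al.', year: 2022 },
};

// What {cite:p}[prefix="see", locator="chap. 2"]`jon22` might parse to.
const node: CiteGroup = {
  type: 'citeGroup',
  kind: 'parenthetical',
  children: [
    { type: 'cite', identifier: 'jon22', label: 'jon22', prefix: 'see', locator: 'chap. 2' },
  ],
};

function renderCite(cite: Cite): string {
  const entry = bibliography[cite.identifier];
  if (!entry) throw new Error(`Unknown citation key: ${cite.identifier}`);
  const head = cite.prefix ? `${cite.prefix} ` : '';
  const tail = cite.locator ? `, ${cite.locator}` : '';
  return `${head}${entry.authors}, ${entry.year}${tail}`;
}

// Only the parenthetical form is handled properly; a narrative form would
// need its own author/year layout, which this sketch leaves out.
function renderGroup(group: CiteGroup): string {
  const body = group.children.map(renderCite).join('; ');
  return group.kind === 'parenthetical' ? `(${body})` : body;
}

console.log(renderGroup(node));
```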

For multiple citations, the citeGroup would never be a directive or appear in the markup itself (i.e. you would write [@key1; @key2] or {cite:p}`key1; key2`), but I think the AST is better represented by multiple nodes, with one holding the group (parenthetical) information. This also means UIs can open groups of citations in a list (see distill/elife for good examples of this UI).
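A minimal sketch of that idea: role content such as `key1; key2` becomes one citeGroup with one cite child per key. The `parseCiteRole` helper is hypothetical, and lowercasing the key to form `identifier` is an assumption borrowed from how mdast normalizes labels elsewhere.

```typescript
type Cite = { type: 'cite'; identifier: string; label: string };
type CiteGroup = {
  type: 'citeGroup';
  kind: 'narrative' | 'parenthetical';
  children: Cite[];
};

// Hypothetical helper: splits the role content on ";" and wraps the
// resulting cite nodes in a single group node.
function parseCiteRole(kind: CiteGroup['kind'], content: string): CiteGroup {
  const children = content
    .split(';')
    .map((key) => key.trim())
    .filter((key) => key.length > 0)
    .map((key): Cite => ({ type: 'cite', identifier: key.toLowerCase(), label: key }));
  return { type: 'citeGroup', kind, children };
}

const group = parseCiteRole('parenthetical', 'key1; key2');
console.log(JSON.stringify(group, null, 2));
```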

Both cite and citeGroup would be flow content, so the equivalent of a "citet" in latex is just a cite node in a paragraph (@key1 in pandoc style).
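For illustration, a narrative citation sitting directly in a paragraph might look like the tree below. The sentence text and the reduced `Text`/`Paragraph` shapes are assumptions for the sketch; only the `cite` node comes from the proposal.

```typescript
type Text = { type: 'text'; value: string };
type Cite = { type: 'cite'; identifier: string; label: string };
type Paragraph = { type: 'paragraph'; children: (Text | Cite)[] };

// "As shown by @key1, the effect is small." in pandoc style.
const paragraph: Paragraph = {
  type: 'paragraph',
  children: [
    { type: 'text', value: 'As shown by ' },
    { type: 'cite', identifier: 'key1', label: 'key1' },
    { type: 'text', value: ', the effect is small.' },
  ],
};

// Collect the keys cited in the paragraph by checking each child's `type`.
const citedKeys = paragraph.children
  .filter((child): child is Cite => child.type === 'cite')
  .map((cite) => cite.identifier);

console.log(citedKeys);
```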

Some questions:

  • what is the best name for citeGroup?
  • should we follow kind or have some different flags like parenthetical? I suggested kind because that seemed easier to expand in the future if we add num or alt etc. (previously suggested a single cite node, splitting into group solves this).
  • narrative and parenthetical nomenclature comes from here

Existing implementations:

Would be curious on your thoughts @chrisjsewell and @fwkoch (maybe @mmcky as well?)!

@chrisjsewell
Contributor

Would be curious on your thoughts @chrisjsewell

See executablebooks/MyST-Parser#511 😉

@fwkoch
Contributor

fwkoch commented Mar 31, 2022

We still need info about kind, num, etc (i.e. the things you crossed out) on the cite group, right?

I had something like:

type CitationGroup = {
  type: 'citationGroup';
  kind: 'narrative' | 'parenthetical'; // 'citet' vs 'citep'
  parentheses: boolean; // if false, 'citealt' and 'citealp' instead
  mode: 'year' | 'numerical';
  children: Citation[];
};

And even a single citation is a child of a citation group in the AST?
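One way to read the flags above is as a lookup from the four natbib commands to `kind` and `parentheses`. This is a sketch only: the `Citation` shape, the `natbibFlags` table, and the `toCitationGroup` helper are assumed names. It also wraps a single key in a group, as the question above suggests.

```typescript
// Assumed minimal child shape; the thread does not define Citation itself.
type Citation = { type: 'citation'; identifier: string; label: string };

type CitationGroup = {
  type: 'citationGroup';
  kind: 'narrative' | 'parenthetical'; // 'citet' vs 'citep'
  parentheses: boolean; // if false, 'citealt' and 'citealp' instead
  mode: 'year' | 'numerical';
  children: Citation[];
};

type NatbibCommand = 'citet' | 'citep' | 'citealt' | 'citealp';

// Each natbib command maps to one combination of the two flags.
const natbibFlags: Record<NatbibCommand, Pick<CitationGroup, 'kind' | 'parentheses'>> = {
  citet: { kind: 'narrative', parentheses: true },
  citep: { kind: 'parenthetical', parentheses: true },
  citealt: { kind: 'narrative', parentheses: false },
  citealp: { kind: 'parenthetical', parentheses: false },
};

function toCitationGroup(
  command: NatbibCommand,
  keys: string[],
  mode: CitationGroup['mode'] = 'year',
): CitationGroup {
  return {
    type: 'citationGroup',
    ...natbibFlags[command],
    mode,
    children: keys.map((key): Citation => ({ type: 'citation', identifier: key, label: key })),
  };
}

// A single citation still ends up as the only child of a group.
const single = toCitationGroup('citealp', ['jon22']);
console.log(JSON.stringify(single));
```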

(Also, I like citation and citationGroup since these are "citations" not "cites" - but... that's more verbose and doesn't match natbib)

@chrisjsewell
Contributor

chrisjsewell commented Mar 31, 2022

For sure, I think citations should be a "first-class citizen" of MyST 👍

One thing that I do think is worth thinking about: do you actually need to restrict "citations" to just the conventional bibliography-type references?
Essentially, the abstraction is just a key(s) that references an external resource (bibtex, json, yaml, ...) which contains a dictionary of key -> fields , e.g.

key:
  field1: content
  field2: content

https://www.overleaf.com/learn/latex/Glossaries are also essentially the same abstraction as, to some extent, are https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#substitutions-with-jinja2 (see also something I was playing around with https://github.com/chrisjsewell/sphinx-glossary/blob/main/docs/index.md)

Do you need different node types for all of these, or can it be "generalised"? Or at least share a parent interface
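A possible shape for the "shared parent interface" option, sketched under assumptions: `KeyedReference`, `GlossaryTerm`, and `resolve` are hypothetical names, and the sample bibliography and glossary entries are made up for the example.

```typescript
// Hypothetical parent interface: any node that points at a key in an
// external resource (bibtex, json, yaml, ...).
interface KeyedReference {
  type: string;
  identifier: string;
  label: string;
}

// Citation-specific fields live on the child type only.
interface Cite extends KeyedReference {
  type: 'cite';
  prefix?: string;
  locator?: string;
}

// Hypothetical glossary node sharing the same parent.
interface GlossaryTerm extends KeyedReference {
  type: 'glossaryTerm';
}

// key -> fields, as in the abstraction described above.
type Fields = Record<string, string>;
type Resource = Record<string, Fields>;

// The lookup only needs the parent interface, so it works for both nodes.
function resolve(node: KeyedReference, resource: Resource): Fields | undefined {
  return resource[node.identifier];
}

const bibliography: Resource = { jon22: { author: 'Jones', year: '2022' } };
const glossary: Resource = { ast: { name: 'AST', description: 'abstract syntax tree' } };

const cite: Cite = { type: 'cite', identifier: 'jon22', label: 'jon22', locator: 'chap. 2' };
const term: GlossaryTerm = { type: 'glossaryTerm', identifier: 'ast', label: 'AST' };

console.log(resolve(cite, bibliography), resolve(term, glossary));
```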

@rowanc1
Member Author

rowanc1 commented Mar 31, 2022

Nice, I like those additions to the group @fwkoch -- the reason I also had cite is that it is an HTML element (see mdn), so sticking close to html/latex here seemed good. (I am not sure about the group name though: in Curvenote we also use this group to wrap crossReferences, for example, which can collapse (Figure 1 & 2) while still having unique links to the content.)

@chrisjsewell, I think that citations are special/important enough to be their own mdast type, but maybe the syntax for creating them can be the same/extensible (which would be nice from a writing perspective). We are currently backing out our generalizations for citations in Curvenote after a few years: citations are special/weird enough to have their own dedicated type/apis/endpoints/etc. 🤷

Again, that is the mdast cite type only (e.g. locator isn't applicable to glossaries, or mode=year to abbreviations), I think the myst-syntax can be extensible though. 👍

@mmcky

mmcky commented Mar 31, 2022

Thanks for starting this discussion. I agree with everyone here that citations are first-class citizens of any scientific document.

I think a lot of users will come from a LaTeX and bibtex background, so some basic LaTeX similarities would help, such as:

  1. a simple {cite} role (as we have)
  2. a way to change the style of references that are printed in the reference list to suit (i.e. Harvard etc.)
  3. a way to change the style of references in the text such as [1] and Jones (2009)

This combination covers most of my use of references from a LaTeX universe.

I also really like the flexibility being discussed here in adding more sophisticated references such as pages, see and chapter references. I agree that natbib is a good reference, and the concept of metadata for roles is an interesting idea. I wonder though if we added an option extension syntax such as:

{cite}`jones1999 <<locator='chapter 2'>>`
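To make the suggestion concrete, here is a rough sketch of how such role content could be split into a key and options. The `<<name='value'>>` grammar is only the idea floated above, not an existing MyST feature, and `parseCiteContent` is a hypothetical helper that assumes single-quoted values.

```typescript
type ParsedCite = { key: string; options: Record<string, string> };

// Hypothetical parser for content like: jones1999 <<locator='chapter 2'>>
function parseCiteContent(content: string): ParsedCite {
  const match = /^(.*?)\s*<<(.*)>>\s*$/.exec(content);
  // No option block: the whole content is the citation key.
  if (!match) return { key: content.trim(), options: {} };

  const options: Record<string, string> = {};
  const pair = /(\w+)\s*=\s*'([^']*)'/g;
  let found: RegExpExecArray | null;
  // Collect every name='value' pair inside the << >> block.
  while ((found = pair.exec(match[2])) !== null) {
    options[found[1]] = found[2];
  }
  return { key: match[1].trim(), options };
}

const parsed = parseCiteContent("jones1999 <<locator='chapter 2'>>");
console.log(parsed.key, parsed.options);
```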

My other wish list item would be support for .bib (bibtex) files as a source of data for the citations, as I know a lot of authors who have invested in bib collections; in addition, there are a lot of webpages that now provide copy-and-paste bibtex entries.


Also, this must be a javascript thing, but I don't fully see why we need both an object name and a type defined:

type CiteGroup = {
  type: 'citeGroup'

I guess you can't do the equivalent of isinstance() as done in python?
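A short sketch of why both appear. TypeScript type names such as `CiteGroup` are erased at compile time, and AST nodes are usually plain objects (often loaded from JSON) rather than class instances, so there is nothing for an `isinstance()`-style check to inspect. The `type` string is the runtime tag that code branches on instead. The `CitationNode` union and `countCitations` function are assumed names for illustration.

```typescript
type Cite = { type: 'cite'; identifier: string };
type CiteGroup = {
  type: 'citeGroup';
  kind: 'narrative' | 'parenthetical';
  children: Cite[];
};
type CitationNode = Cite | CiteGroup;

// The compiler narrows the union from the `type` tag in each branch,
// which plays the role isinstance() would play for Python classes.
function countCitations(node: CitationNode): number {
  switch (node.type) {
    case 'cite':
      return 1;
    case 'citeGroup':
      return node.children.length;
  }
}

// A node that arrives as JSON has no class, only its fields.
const fromJson = JSON.parse(
  '{"type":"citeGroup","kind":"parenthetical","children":[{"type":"cite","identifier":"jon22"}]}',
) as CitationNode;

console.log(countCitations(fromJson));
```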

@chrisjsewell chrisjsewell added the enhancement New feature or request label Apr 11, 2022