
Render Org-Mode Files with Real Pandoc in Hugo

Note: See my update below for a better solution, if you're okay with recompiling Hugo.


I use Hugo to publish this website and I like to write in org-mode. Hugo's org-mode support is incomplete at best.

I was fighting with Hugo's org renderer yet again this morning. It couldn't even handle nested lists or paragraphs inside a third-level heading! After getting frustrated trying to find a fix, I decided to solve the problem once and for all, at least for myself.

I like the Flask philosophy of using the best tool available for each task, instead of having to use the bundled plugins. In contrast, Django used to force you to use their own neutered templating system. Django doesn't have this restriction any more, but the philosophical difference stands. Hugo's org renderer will never be as good as a real org renderer like Pandoc.

Though Hugo has external helpers to call out to Pandoc, I don't see a way to pass arguments to Pandoc, especially to ask it to treat the input as an org file, not markdown.

So I wrote this simple script that watches for file changes and renders org files as HTML. A pandoc-content directory sits at the same level as the content directory. If I put something in the pandoc-content directory, the script renders it as HTML, preserving the front-matter, and writes the result to the corresponding location in the content directory.

For example, pandoc-content/blog/next.org is automatically rendered as content/blog/next.output.html. I gitignored *.output.html so that each post is tracked in just one place and the machine-generated file is ignored. I will still use content to author other types of files, especially markdown, as this is faster than running everything through Pandoc.
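The path mapping described above can be sketched in isolation. This is a minimal illustration, not the code the script itself uses: the function name output_path and its default arguments are assumptions made for the example, and the directory names are the ones from this post.

```python
from pathlib import PurePosixPath

def output_path(inpfile, inpdir="pandoc-content", outdir="content"):
    """Map an org source path to the generated HTML path in the content tree."""
    # Path of the source file relative to the input directory, e.g. blog/next.org
    relative = PurePosixPath(inpfile).relative_to(inpdir)
    # with_suffix swaps only the final extension, so ".org" elsewhere in the path is untouched
    return str(PurePosixPath(outdir) / relative.with_suffix(".output.html"))

print(output_path("pandoc-content/blog/next.org"))
```

Working on path components rather than on raw strings means only the trailing .org is swapped, even if a directory name happens to contain ".org".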

I didn't have to copy-paste the following code from the actual script that I run into this blog post. I was able to just use the org-mode include directive. How cool is that!

import argparse
from datetime import datetime
import pyinotify
import pypandoc
import textwrap

class EventHandler(pyinotify.ProcessEvent):
    def __init__(self, inpdir, outdir):
        self.inpdir = inpdir
        self.outdir = outdir

    def process_IN_CREATE(self, event):
        process_file(event.pathname, self.inpdir, self.outdir)

    def process_IN_MODIFY(self, event):
        process_file(event.pathname, self.inpdir, self.outdir)

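def process_file(inpfile, inpdir, outdir):
    if '/.#' in inpfile: return # ignore emacs temp files
    if not inpfile.endswith('.org'): return # ignore directories and non-org files
    # mirror the input path under outdir and swap only the trailing .org extension
    outfile = inpfile.replace(inpdir, outdir, 1)[:-len('.org')] + '.output.html'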
    print(textwrap.dedent(f'''\
    {datetime.now().strftime('%I:%M%p')}
    Input  : {inpfile}
    Output : {outfile}
    '''))
    with open(inpfile) as f:
        r = f.read()

    # split front-matter and content
    lines = r.split('\n')
    for i, line in enumerate(lines):
        if (line.strip() == ''): break
    frontmatter = '\n'.join(lines[:i])
    content = "\n".join(lines[i+1:])
    html = pypandoc.convert_text(content, format="org", to="html")
    output = f'{frontmatter}\n{html}'

    with open(outfile, 'w') as f:
        f.write(output)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input-dir", type=str, required=True,
                        help="Input Directory")
    parser.add_argument("-o", "--output-dir", type=str, required=True,
                        help="Output Directory")
    args = parser.parse_args()

    wm = pyinotify.WatchManager()
    mask = pyinotify.IN_CREATE | pyinotify.IN_MODIFY
    handler = EventHandler(args.input_dir, args.output_dir)
    notifier = pyinotify.Notifier(wm, handler)
    wdd = wm.add_watch(args.input_dir, mask, rec=True)
    print(f'Watching for file changes in {args.input_dir}')
    notifier.loop()
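
The front-matter handling in process_file relies on one convention: everything before the first blank line is front-matter, and everything after it is org content. A small sketch of that split as a standalone function follows. The name split_front_matter and the sample text are made up for illustration, and the fallback for a file with no blank line is an assumption of this sketch, not the behavior of the script above.

```python
def split_front_matter(text):
    """Split text at the first blank line into (front_matter, body)."""
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line.strip() == "":
            # Lines before the blank line are front-matter, lines after it are the body
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    # Assumption: a file with no blank line has no front-matter at all
    return "", text

# Hypothetical sample file: front-matter, a blank line, then org content
sample = "---\ntitle: Next\n---\n\n* Heading\nSome text\n"
front, body = split_front_matter(sample)
```

Only the body would then be handed to Pandoc, with the front-matter written back unchanged ahead of the rendered HTML.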

Update

[Nov 11] I thought of a much better solution this morning. With it, you no longer have to keep content in two different places depending on file type, maintain a parallel directory structure between them, or run a Python script that writes rendered files into the content/ directory, which you then have to gitignore. It's a one-liner too! It does require you to recompile Hugo, though.

Patch

GitHub

Alternatively, use my branch: (relevant commit)

Installation

Historical Context

Here are some notes from what I gathered while researching this issue:

  1. When betaveros originally coded pandoc support, he put in the ability to pass arbitrary flags: commit
  2. bep wondered whether this introduces a shell injection vulnerability: conversation / direct link to the comment
  3. betaveros removed the arbitrary flag support from the pull request that was ultimately merged into master

Next Steps

I'd like to infer the format from the filename of the content/ file. That is, if the filename is "abc.org.pandoc", I'd like for Hugo to pass --from=org to Pandoc automatically. This would allow for maximum flexibility with minimal change.

I plan to create and submit this patch, but that's for another day.
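
The filename convention described above can be sketched in Python to make the intended behavior concrete. This is only an illustration of the inference rule, not Hugo code and not the patch itself: the function name pandoc_args_for is hypothetical, and returning no arguments when the name carries no format segment is an assumption.

```python
def pandoc_args_for(filename):
    """Return pandoc arguments inferred from a name like 'abc.org.pandoc'."""
    parts = filename.split(".")
    # Expect at least three segments: base name, format, and the "pandoc" extension
    if len(parts) >= 3 and parts[-1] == "pandoc" and parts[-2]:
        return [f"--from={parts[-2]}"]
    # Assumption: with no format segment, pass nothing and let pandoc use its default
    return []

print(pandoc_args_for("abc.org.pandoc"))
```

Because the format is read straight from the filename, the same rule would cover any other input format Pandoc understands without further changes.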