Static websites with pandoc

Pandoc describes itself as the Swiss Army knife of file and markup conversion. It’ll happily convert your Markdown files to HTML, PDF, TeX or whatever else is supported, and its list of supported file formats is truly impressive. It’s written in Haskell and packaged for most popular Linux distributions. The Arch Linux installation with all its dependencies weighs in at over 400MB. Personally I think it’s worth it.

What put the tool on my radar was this video by Luke Smith showing how to turn Markdown into PDF slides. Check it out to get a taste of how easy the tool is to use and to catch a glimpse of its potential.

Pandoc as a website generator

Since Pandoc can create HTML files from a number of markup sources, not much more is needed to build a very simple website generator. You’re reading a Pandoc-generated web page at this very moment.

The most basic conversion looks like this:

pandoc index.md -o index.html

The output isn’t just bare HTML: code blocks also come with syntax highlighting tags, which isn’t always trivial to achieve.

A more advanced command would be:

pandoc index.md \
    -o index.html \
    --standalone \
    --template="$MY_PATH/template.html" \
    --css "/css/milligram.min.css" \
    --css "/css/custom.css" \
    --variable=lastUpdated:"$( stat -c %y index.md | cut -f 1 -d ' ' )" \
    --variable=creationDate:"$( stat -c %w index.md | cut -f 1 -d ' ' )"

As you can see, Pandoc supports templates through the --template argument. The pandoc-templates repository on GitHub contains the default templates for reference. Changing a template is easy: variables can be added or removed at will. Their values are then pulled from the --variable argument(s), from the metadata stored in the source files, or from a separate metadata file (see --metadata-file).

The two variables in this example simply contain the dates of the Markdown file’s last modification and its creation.
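To make the variable mechanism concrete, here is a minimal sketch of what such a template could contain. The variable names lastUpdated and creationDate match the command above, and $body$, $pagetitle$ and the css list are standard Pandoc template variables. The markup itself and the helper name write_template are assumptions for illustration, not the template this site actually uses.

```shell
#!/bin/bash
# Sketch: write a minimal Pandoc HTML template to the given path.
# write_template is a hypothetical helper; the markup is illustrative only.
write_template() {
    # The quoted 'EOF' stops the shell from expanding Pandoc's $...$ placeholders.
    cat > "$1" <<'EOF'
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>$pagetitle$</title>
$for(css)$
  <link rel="stylesheet" href="$css$">
$endfor$
</head>
<body>
$body$
<footer>
$if(creationDate)$
  <p>Created: $creationDate$</p>
$endif$
$if(lastUpdated)$
  <p>Last updated: $lastUpdated$</p>
$endif$
</footer>
</body>
</html>
EOF
}
```

Calling write_template "$MY_PATH/template.html" once would create the file that --template points at. Each --css argument becomes one entry in the css loop, and the two $if$ blocks are skipped when the corresponding variable is not set.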

Automating the process

The above example works fine for a single file. Managing an entire website, however, requires some automation. For this task I created two scripts: build.sh finds all Markdown source files, and process_md_file.sh processes each of them.

build.sh:

#!/bin/bash

MY_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
SITE_PATH=$MY_PATH/site

# Quote the pattern so the shell hands *.md to find unexpanded.
find "$SITE_PATH" -name '*.md' -exec "$MY_PATH/process_md_file.sh" {} \;

exit 0

The find command just finds all Markdown files in my site path and runs the next script for each result.
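Because the script above re-runs Pandoc for every file on each build, a larger site could skip pages whose HTML is already up to date. The following is a sketch of that idea, not part of the scripts shown here. The function name needs_rebuild is hypothetical, and it assumes that each .md file’s output sits next to it with an .html extension.

```shell
#!/bin/bash
# Sketch: decide whether a Markdown file has to be converted again.
# needs_rebuild is a hypothetical helper, not part of build.sh.
# Assumes the output lives next to the source: foo.md -> foo.html.
needs_rebuild() {
    local src="$1"
    local target="${src%.md}.html"
    # Succeeds (exit status 0) when the target is missing
    # or the source was modified more recently than the target.
    [ ! -e "$target" ] || [ "$src" -nt "$target" ]
}
```

One way to use it would be a guard at the top of the per-file script, for example: needs_rebuild "$1" || exit 0. Note that a change to the template would not trigger a rebuild with this check alone.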

process_md_file.sh:

#!/bin/bash

MY_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
SITE_PATH=$MY_PATH/site

fullpath=$1
dirpath=$( dirname "$fullpath" )
sourcefile=$( basename "$fullpath" )
# Strip only the .md suffix so names containing other dots stay intact.
targetfile="${sourcefile%.md}.html"

# File dates as YYYY-MM-DD (requires GNU stat).
creationDate=$( stat -c %w "$fullpath" | cut -f 1 -d ' ' )
lastUpdated=$( stat -c %y "$fullpath" | cut -f 1 -d ' ' )

echo "Processing: $fullpath"

# -f: don't fail if the target doesn't exist yet.
rm -f "$dirpath/$targetfile"
pandoc "$fullpath" \
    -o "$dirpath/$targetfile" \
    --standalone \
    --template="$MY_PATH/template.html" \
    --css "/css/milligram.min.css" \
    --css "/css/custom.css" \
    --variable=lastUpdated:"$lastUpdated" \
    --variable=creationDate:"$creationDate"

exit 0

All the work is done by pandoc. Everything else just prepares the data for its arguments.
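One caveat with the dates prepared above: GNU stat prints - for %w when the file system does not record a birth time, and that value would then end up in creationDate. A minimal sketch of a fallback follows. The helper name creation_date is hypothetical, and substituting the modification date is an assumption about what is acceptable on the page.

```shell
#!/bin/bash
# Sketch: creation date of a file as YYYY-MM-DD (GNU stat assumed).
# creation_date is a hypothetical helper, not part of the scripts above.
creation_date() {
    local file="$1"
    local born
    born=$( stat -c %w "$file" | cut -f 1 -d ' ' )
    # GNU stat prints "-" when the birth time is unknown;
    # fall back to the last-modification date in that case.
    if [ -z "$born" ] || [ "$born" = "-" ]; then
        born=$( stat -c %y "$file" | cut -f 1 -d ' ' )
    fi
    echo "$born"
}
```

In process_md_file.sh this could replace the first stat call: creationDate=$( creation_date "$fullpath" ).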

The site structure itself looks like this:

.
├── article1
│   ├── index.html
│   └── index.md
├── article2
│   ├── index.html
│   └── index.md
├── article3
│   ├── index.html
│   └── index.md
├── css
│   ├── custom.css
│   ├── milligram.min.css
│   └── milligram.min.css.map
├── index.html
└── index.md

This directory tree is ready for deployment on any old HTTP server without modifications.

Improvements

The examples above are in practical use to create this very website. They work in the real world. But of course there’s plenty of room for improvement. One thing I don’t take into account at all is navigation. This is a very simple site: the home page is the central hub, end of navigation. For a more complex site this doesn’t work. Nor is there any kind of template selection for different needs, or even a simple blog listing.

Yet none of these issues is hard to solve.
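As one illustration, a simple article listing could be generated with a few more lines of shell. This is a sketch under assumptions, not something the site currently does. The function name list_articles is hypothetical, it expects the one-directory-per-article layout shown earlier, and it uses the directory name as the link text, since where a title would come from is left open here.

```shell
#!/bin/bash
# Sketch: print a Markdown link list of all article directories.
# list_articles is a hypothetical helper; it assumes every article
# lives in <site>/<name>/index.md, as in the tree shown earlier.
list_articles() {
    local site="$1"
    local md name
    for md in "$site"/*/index.md; do
        [ -e "$md" ] || continue    # the glob matched nothing
        name=$( basename "$( dirname "$md" )" )
        echo "- [$name](/$name/)"
    done
}
```

Directories without an index.md, such as css, are skipped automatically. The output is plain Markdown, so it could be combined with the hand-written part of the home page before that page is passed to pandoc.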

What I like about Pandoc compared to full-featured site generators is its bashiness. No gems, no Python packages, no npm orgies, etc. Install the package and you’re good to go. On top of that, the logic is up to you. I wanted something dead simple and Pandoc delivered in almost no time. And all the lessons learned can easily be applied to more everyday Pandoc scenarios.

Why make static websites?

Static websites are so 1999. $insertYourFavoriteCMSHere with all its dynamicness is sooo much better. Why would anyone bother with static HTML? You still use Dreamweaver? GoLive?

Yes, CMSs have their place and are useful. Larger websites managed by multiple people clearly benefit from those solutions. I used to be a die-hard TYPO3 user, using it for absolutely everything.

But let me give you a few reasons I prefer to avoid them:

  • They require a stack to run: no PHP, no database, no WordPress (just one example).
  • Without your stack you can’t see your website. This makes archiving it hard.
  • Code and content are separate. That’s an advantage for large websites, but small sites (like mine) benefit from keeping the content directly in the git repository.
  • CMSs and their stacks are subject to security issues and require constant patching. Usually when you’re on holiday or in the middle of some important project.
  • Performance may not be stellar.

For small sites the static approach makes sense. Interactive elements can be added through APIs and JavaScript. Think of comments or contact forms. There’s no reason to keep everything dynamic just for a few such features.

As always, it’s about the right tool for the job. Yet instead of blindly dismissing static HTML as so 1999, I encourage you to take a look, with Pandoc or otherwise.