Building a Static Site Generator in 44 Lines of Make

2017-11-22

Charles Daniels

As regular readers will have noticed by the time this article is posted, cdaniels.net is no longer running Jekyll. Instead, it is generated by a new, fully custom static site generator which I wrote as a one-day project during Thanksgiving break.

This new static site generator offers a number of advantages:

It produces more visually appealing output (in my opinion)
It is easier for me to maintain
It requires fewer dependencies to run (no more Ruby, just make and sh)
cdaniels.net no longer uses any JavaScript, at all
cdaniels.net now renders correctly in non-mainstream browsers (in particular, it’s now entirely usable in Links)
It has reduced page weight by about half

Perhaps most entertainingly, for my use case, I have fully replaced Jekyll with a 44-line Makefile, a couple of small shell scripts, and a fairly brief CSS file.

In this post, I will discuss the methodology used to construct these tools and switch cdaniels.net away from off-the-shelf static site generation. I decided to write this article in lieu of properly open-sourcing the new site generation tools for two reasons:

Both the Makefile and the associated tooling are fairly specific to cdaniels.net, and generalizing would require considerable extra effort.
Both the Makefile and the associated tooling are too trivial to even warrant their own repository or version number.

# Overview

The new site generator operates by compiling a pile of Markdown (.md) files into HTML. The excellent Pandoc is used to convert the Markdown files to HTML. No special features of Pandoc are used in particular, and another Markdown-to-HTML transpiler could easily be used if desired. However, Pandoc is readily available, performant, and produces good quality output.

Post metadata is tracked separately from the Markdown file containing the post, in the .meta “file format”. This was a matter of convenience: a .meta file is really just a tab-separated-value (TSV) file, which makes it trivially easy to extract post metadata using shell scripts.

The .meta file format looks like this:

title	this is the article title
date	1700-01-01

Both fields are arbitrary strings, although the index generator (more on this later) works best if the date field collates in the order you want it to show up in the index when sorted using the sort command.
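To illustrate the collation point: dates written as YYYY-MM-DD sort chronologically when compared as plain strings, so sort -r lists the newest first. The following is a small sketch with made-up dates; the helper name newest_first is hypothetical.

```shell
#!/bin/sh
# Sketch: YYYY-MM-DD strings sort chronologically as plain text.
# newest_first is a hypothetical helper; LC_ALL=C forces byte-wise collation.
newest_first() {
    LC_ALL=C sort -r
}

# Made-up dates, deliberately out of order.
printf '%s\n' 2016-01-05 2017-11-22 2017-02-10 | newest_first
```

A month-first format such as 11/22/2017 would not order correctly across years when sorted this way, which is why the choice of date format matters for the index.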

Headers, footers, and other “glue” content which cannot be generated programmatically, but that also is not part of any post body, are stored in .template files, which are really just HTML snippets with a different extension to prevent *.html from picking them up. These .template files are usually concatenated with other files to build the final HTML pages you see on cdaniels.net.

Some pages (namely About and Projects) do not have .meta files associated with them, which prevents them from being indexed as posts and appearing in the “all posts” list. These are handled by special-case Makefile rules and are processed differently from all of the other pages.

The heading at the top of each page, which contains the website title (“The Blog of Charles Daniels”) and the navigation bar, is stored in a template file that is concatenated with the post body after Pandoc has run. The post title and post date are also injected after Pandoc has run, and are generated by a script. The footer (containing copyright information) is likewise generated by a script and concatenated with the post body.
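The assembly step is plain concatenation. Here is a minimal sketch of the idea, using placeholder files in a temporary directory rather than the site's real templates; the file contents and the stand-in footer line are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: assemble a page from fragments by concatenation.
# All file names and contents below are placeholders, not the site's real files.
workdir="$(mktemp -d)"

printf '<nav>site header</nav>\n' > "$workdir/header.html.template"
printf '<h1>post title</h1>\n'    > "$workdir/post.title.fragment"
printf '<p>post body</p>\n'       > "$workdir/post.body.fragment"

# Header first, then the title fragment, then the converted post body.
cat "$workdir/header.html.template" \
    "$workdir/post.title.fragment" \
    "$workdir/post.body.fragment" > "$workdir/post.html"

# The footer is appended last (a stand-in for the footer script's output).
printf '<footer>footer</footer>\n' >> "$workdir/post.html"

cat "$workdir/post.html"
```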

The index page (i.e. the page with the list of all the posts on the site) is generated by parsing *.meta and generating some HTML <li> elements, which are concatenated with the global header template, as well as index-specific header and footer template files.

# Design Rationale

I’m not much of a designer, but I like my stuff to look nice. My main objectives when designing this site’s stylesheet were:

Minimizing page weight
Maximizing readability

I went with a serif font for the content because I find content presented in this way is easiest to absorb and consume. I tried to keep the contrast between the background and the text very high (I am not a fan of the modern trend of dark-grey text on a light-grey background, with minimal contrast in between); however, I also wanted to avoid the glaring black-on-white of unstyled HTML.

I threw in some sans-serif fonts here and there for content which I deliberately intend to de-emphasize, such as post dates, navigation elements, and so on. This text is also rendered in light grey with low contrast relative to the background, to further draw focus away from it and onto the actual content.

In general, I try to avoid being “clever”, or breaking the norm for HTML documents. As a result, my pages render quickly, print correctly out of the box, and the text reflows correctly when the user’s browser window is re-sized. In other words, this site behaves the way a website is expected to behave (unlike many heavier sites that rely on complex toolkits that try to resize and reflow content at runtime).

# Technical Approach

## The Makefile

The Makefile is carefully written to be idiomatic Make. In particular, aside from the site and clean rules, every rule depends on actual files on disk and produces a file on disk. This means the site can be built safely in parallel with make -j (care is taken to avoid race conditions while generating the index: each <li> element is generated in its own .idx file, and the files are concatenated together later). This is a great boon while working on the site: using a single make job, it takes 2.1 seconds to compile on my 4C/8T i7 processor; using make -j8, it takes 1.2 seconds. I imagine this will scale nicely as more posts are added, and as I come to own systems with more CPU cores, since the bottleneck appears to be the Pandoc calls that convert the Markdown files to HTML, and Pandoc appears to be compute-bound rather than I/O-bound in this context.
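The race-avoidance idea can be sketched outside of Make: each parallel job writes only its own file, and a single serial step combines the results afterwards in a fixed order. In this sketch, background shell jobs stand in for parallel make jobs, and the post names are made up.

```shell
#!/bin/sh
# Sketch: parallel jobs write separate files; one serial step concatenates them.
# Post names are hypothetical; background jobs stand in for parallel make jobs.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

for post in 2017-01-01_a 2017-06-15_b 2017-11-22_c; do
    # Each background job owns exactly one output file, so no two jobs
    # ever write to the same file.
    printf '<li>%s</li>\n' "$post" > "$post.idx" &
done
wait    # block until every background job has finished

# Serial step: reverse-sorted order, regardless of which job finished first.
for f in $(ls *.idx | LC_ALL=C sort -r); do
    cat "$f"
done > index_center.fragment

cat index_center.fragment
```

Because the combined file is only written by the final serial step, the result does not depend on the order in which the parallel jobs happen to complete.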

The gist of the Makefile’s tasks, translated to English prose, follows:

A list of every HTML file which could be generated from every markdown file (*.md) is compiled into the HTMLFILES variable
Similarly, a list of every .idx that could be generated from the set of every metadata file (*.meta) is generated and stored in IDXFILES; each such .idx file contains one <li> element to display one post on the front page of the site.
The site requires that all HTML files be generated, plus the index
A .body.fragment file is generated for each post, containing its body
A .title.fragment file is generated for every post from its title and post date
An HTML file for a single post is generated by concatenating the globally shared header, the title fragment, the body fragment, and the output of mkfooter.sh
The Projects and About pages are generated by simply compiling their Markdown file to HTML and concatenating the result with the shared header and the output of mkfooter.sh. These are not posts, so no title is generated.
For every metadata file, an index entry is generated in a .idx file for later inclusion in the main index.html page.
Every index file is appended to the index_center fragment file, which contains the body of the <ul> element which is the posts list.
The index itself (index.html) is generated by concatenating the globally shared header, the index header template, the list of posts, and the index footer template, as well as the shared footer.
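The first step above, deriving target names from source names, can be tried in plain shell. This is a sketch with hypothetical file names; the helper html_targets is not part of the site's tooling. The sed pattern is anchored to the extension, since a bare .md in a regular expression also matches sequences such as the cmd in cmdline-tips.md.

```shell
#!/bin/sh
# Sketch: derive the list of .html targets from the .md sources in a directory.
# html_targets is a hypothetical helper name.
html_targets() {
    # $1: directory holding the Markdown sources.
    # \.md$ only rewrites a literal ".md" at the end of each name.
    ( cd "$1" && ls *.md | LC_ALL=C sort -r | sed 's/\.md$/.html/' )
}

# Demo with made-up file names in a temporary directory.
demo="$(mktemp -d)"
touch "$demo/about.md" "$demo/2017-11-22_make-static-site.md" "$demo/cmdline-tips.md"
html_targets "$demo"
```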

Makefile

MDC=pandoc
MDFLAGS= -t html5 --mathml --no-highlight
HTMLFILES=$(shell ls *.md  | sort -r | sed 's/\.md$$/.html/')
IDXFILES=$(shell ls *.meta | sort -r | sed 's/\.meta$$/.idx/')

site: $(HTMLFILES) index.html

%.body.fragment : %.md
	$(MDC) $(MDFLAGS) $< -o $@

%.title.fragment : %.meta meta2title.sh
	./meta2title.sh "$<" > "$@"

%.html : header.html.template %.title.fragment  %.body.fragment
	cat $^ > "$@"
	./mkfooter.sh >> "$@"

projects.html: projects.md header.html.template
	$(MDC) $(MDFLAGS) $< -o $@.tmp
	cat header.html.template $@.tmp > $@
	./mkfooter.sh >> "$@"

about.html: about.md header.html.template
	$(MDC) $(MDFLAGS) $< -o $@.tmp
	cat header.html.template $@.tmp > $@
	./mkfooter.sh >> "$@"

%.idx : %.meta meta2idx.sh
	./meta2idx.sh "$<" > "$@"

index_center.fragment: $(IDXFILES)
	echo "" > index_center.fragment
	for f in $$(ls *.idx | sort -r) ; do cat "$$f" >> "index_center.fragment" ; done

index.html: header.html.template index_header.template index_center.fragment index_footer.template
	printf '<title>The Blog of Charles Daniels</title>\n' > "$@"
	cat $^ >> "$@"
	./mkfooter.sh >> "$@"

clean:
	-rm *.html
	-rm *.idx
	-rm *.fragment
	-rm *.tmp

## `meta2idx.sh`

This script (included below) converts .meta TSV metadata files into <li> elements suitable for inclusion in the site index.

The only parts of this script of particular interest are the bits that extract the post title and date. This is accomplished by the command grep '^title.*' < "$META_FILE" | cut -f 2 (in this case, extracting the title field), which searches for a line that begins with the string literal title, then extracts the second field from it (keep in mind that cut uses \t as a delimiter by default).
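The same pipeline can be wrapped in a small function and tried against a throwaway file. This is a sketch: the name meta_field is hypothetical, and matching the tab after the field name is a small addition so that a key such as title does not also match a longer key starting with the same letters.

```shell
#!/bin/sh
# Sketch: read one field from a .meta file (tab-separated key/value lines).
# meta_field is a hypothetical helper name, not part of the site's scripts.
meta_field() {
    # $1: field name, $2: path to the .meta file
    tab="$(printf '\t')"
    # Match the key followed by a tab, then keep the second tab-separated field.
    grep "^$1$tab" < "$2" | cut -f 2
}

# Demo with a throwaway metadata file.
meta="$(mktemp)"
printf 'title\tthis is the article title\ndate\t1700-01-01\n' > "$meta"

meta_field title "$meta"
meta_field date "$meta"
```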

For illustrative purposes, here is the output of meta2idx.sh for this post’s metadata file:

        <li class="postlisting">
                <!--POST_TITLE='Building a Static Site in 44 Lines of Make', POST_DATE='2017-11-22', POST_FILE='.//2017-11-22_make-static-site.html'-->
                <span class="postdate">2017-11-22</span>
                <h2><div class="postlink"><a href=".//2017-11-22_make-static-site.html">Building a Static Site in 44 Lines of Make</a></div></h2>
        </li>

meta2idx.sh

#!/bin/sh

META_FILE="$1"

POST_TITLE="$(grep '^title.*' < "$META_FILE" | cut -f 2)"
POST_DATE="$(grep '^date.*' < "$META_FILE" | cut -f 2)"
POSTS_ROOT="./"
POST_FILE="$POSTS_ROOT/$(basename "$1" .meta).html"

printf '\t<li class="postlisting">\n'
printf "\t\t<!--POST_TITLE='%s', POST_DATE='%s', POST_FILE='%s'-->\n" "$POST_TITLE" "$POST_DATE" "$POST_FILE"
printf '\t\t<span class="postdate">%s</span>\n' "$POST_DATE"
printf '\t\t<h2><div class="postlink"><a href="%s">%s</a></div></h2>\n' "$POST_FILE" "$POST_TITLE"
printf "\t</li>\n\n"
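One thing to keep in mind with this approach is that the title is emitted verbatim, so a title containing characters such as & or < would end up in the HTML unescaped. A possible guard, sketched here with a hypothetical html_escape helper that is not part of the original scripts, is to pass such values through sed first.

```shell
#!/bin/sh
# Sketch: escape HTML-special characters in a string before embedding it.
# html_escape is a hypothetical helper, not part of the original scripts.
html_escape() {
    # '&' is replaced first, so the ampersands introduced by the later
    # replacements are not themselves escaped again.
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' \
                             -e 's/</\&lt;/g' \
                             -e 's/>/\&gt;/g' \
                             -e 's/"/\&quot;/g'
}

html_escape 'Make & <sh>: a "tiny" generator'
```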

## `meta2title.sh`

This script produces the title and post date, which are injected into each post’s HTML file. It also produces the HTML <title> element for each post. It is essentially the same thing as meta2idx.sh, but with a different HTML template.

For illustrative purposes, the output of meta2title.sh for this post follows:

<div class="posttitle">
        <h1 class=posttitletext>Building a Static Site in 44 Lines of Make</h1>
        <p class=postdatetext><small>Posted 2017-11-22</small></p>
</div>
<title>Building a Static Site in 44 Lines of Make</title><!--Generated on Wed Nov 22 22:31:21 EST 2017 -->

meta2title.sh

#!/bin/sh

META_FILE="$1"

POST_TITLE="$(grep '^title.*' < "$META_FILE" | cut -f 2)"
POST_DATE="$(grep '^date.*' < "$META_FILE" | cut -f 2)"
POSTS_ROOT="./"
POST_FILE="$POSTS_ROOT/$(basename "$1" .meta).html"

printf '<div class="posttitle">\n'
printf '\t<h1 class=posttitletext>%s</h1>\n' "$POST_TITLE"
printf '\t<p class=postdatetext><small>Posted %s</small></p>\n' "$POST_DATE"
printf '</div>\n'
printf '<title>%s</title>' "$POST_TITLE"
printf '<!--Generated on %s -->' "$(date)"

## `mkfooter.sh`

This is the simplest of the three scripts used to generate this site: it generates a copyright message with the current year, and includes the Creative Commons license notice I use for all of my posts. This script generates the same output every time it is run within a given year; that output is included below for illustrative purposes.

<footer>
        <div class="footertext">
                <hr>
                <small>Copyright 2017 Charles Daniels</small>
                <p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</small></p>
        </div>
</footer>

mkfooter.sh

#!/bin/sh

CURRENT_YEAR=$(date +%Y)

printf "<footer>\n"
printf '\t<div class="footertext">\n'
printf '\t\t<hr>\n'
printf '\t\t<small>Copyright %s Charles Daniels</small>\n' "$CURRENT_YEAR"
printf '\t\t<p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</small></p>\n'
printf '\t</div>\n'
printf '</footer>'

# Conclusion

This methodology demonstrates that it is entirely tenable, and in fact relatively straightforward, to build a simple static site without complex dependencies.

I would thoroughly recommend this approach to anyone looking to run a small blog or website where complex CMS tools, themes, or dynamic content are not required.

There are, however, still a few sticking points that I will be working to clear up in the future, namely:

Monospaced “literal” code blocks (<code> and <pre> for the HTML folks) do not wrap. Setting them to wrap could produce a result that cannot be safely copy-pasted without manually un-wrapping the lines, and looks weird besides. In the future, I would like to find a way to enable line wrapping and generate a visual indicator showing that the line has been wrapped, perhaps by inserting a U+21AA (“RIGHTWARDS ARROW WITH HOOK”) symbol (↪) at the beginning of any wrapped lines. This may involve pre-processing the Markdown files before feeding them to Pandoc.
An RSS feed is not generated yet, which I believe Jekyll was handling for me previously. This can probably be fixed with an extra script to generate RSS entries from metadata files… a project for some future weekend.
The index on the front page does not paginate. This is not an issue now, but as the quantity of posts on this site increases, this may become a problem. When this becomes a problem, I will fix it.
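As a rough idea of what the RSS script mentioned above might look like, here is a sketch that prints a single RSS <item> from a .meta file, in the same style as the other scripts. Everything in it is an assumption rather than part of the existing tooling: the function name meta2rss_item, the SITE_URL variable, and its placeholder default.

```shell
#!/bin/sh
# Sketch: print one RSS <item> for a .meta file.
# meta2rss_item and SITE_URL are hypothetical; the default URL is a placeholder.
SITE_URL="${SITE_URL:-https://example.com}"

meta2rss_item() {
    # $1: path to a .meta file (tab-separated "title" and "date" lines)
    title="$(grep '^title' < "$1" | cut -f 2)"
    page="$(basename "$1" .meta).html"
    printf '<item>\n'
    printf '\t<title>%s</title>\n' "$title"
    printf '\t<link>%s/%s</link>\n' "$SITE_URL" "$page"
    printf '</item>\n'
}

# Demo with a throwaway metadata file.
demo="$(mktemp -d)"
printf 'title\tExample post\ndate\t1700-01-01\n' > "$demo/1700-01-01_example.meta"
meta2rss_item "$demo/1700-01-01_example.meta"
```

A complete feed would also need the enclosing <rss> and <channel> elements and, ideally, a pubDate for each item; those are left out of this sketch, as is escaping of special characters in the title.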