The Blog of Charles Daniels

Building a Static Site Generator in 44 Lines of Make

Posted 2017-11-22

Introduction

As regular readers will have noticed by the time this article is posted, cdaniels.net is no longer running Jekyll. Instead, it is generated by a new, fully custom static site generator, which I wrote as a one-day project during Thanksgiving break.

This new static site generator offers a number of advantages:

Perhaps most entertainingly, for my use case, I have fully replaced Jekyll with a 44-line Makefile, a couple of small shell scripts, and a fairly brief CSS file.

In this post, I will discuss the methodology used to construct these tools and switch cdaniels.net away from off-the-shelf static site generation. I decided to write this article in lieu of properly open-sourcing the new site generation tools for two reasons:

  1. Both the Makefile and the associated tooling are fairly specific to cdaniels.net, and generalizing would require considerable extra effort.
  2. Both the Makefile and the associated tooling are too trivial to even warrant their own repository or version number.

Overview

The new site generator operates by compiling a pile of Markdown (.md) files into HTML. The excellent Pandoc is used to convert the Markdown files to HTML. No Pandoc-specific features are relied upon, so another Markdown-to-HTML converter could easily be substituted if desired. However, Pandoc is readily available, performant, and produces good quality output.

Post metadata is tracked separately from the Markdown file containing the post, in the .meta "file format", which is really just a tab-separated-value (TSV) file. This was a matter of convenience: keeping the metadata in TSV makes it trivially easy to extract using shell scripts.

The .meta file format looks like this:

title	this is the article title
date	1700-01-01

Both fields are arbitrary strings, although the index generator (more on this later) works best if the date field collates in the order you want it to show up in the index when sorted using the sort command.
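
Because the example date is written as YYYY-MM-DD, plain lexical sorting already matches chronological order. Here is a minimal sketch of that property; the function name newest_first and the sample dates are made up for illustration and are not part of the site's tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the dates read from stdin, newest first.
# Plain `sort -r` is enough only because YYYY-MM-DD strings sort
# lexically in the same order as they do chronologically.
newest_first() {
    sort -r
}

# Illustrative dates, not real posts.
printf '2017-01-05\n2016-12-31\n2017-11-22\n' | newest_first
```

A format such as 11/22/2017 would not have this property, since the month would be compared before the year.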

Headers, footers, and other "glue" content which cannot be generated programmatically, but which is also not part of any post body, are stored in .template files. These are really just HTML snippets with a different extension, so that the *.html glob does not pick them up. The .template files are usually concatenated with other files to build the final HTML pages you see on cdaniels.net.

Some pages (namely About and Projects) do not have .meta files associated with them, which prevents them from being indexed as posts and appearing in the "all posts" list. These are handled by special-case Makefile rules and are processed differently from all of the other pages.
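
To make the "no .meta, no index entry" rule concrete, here is a rough sketch of how one could list only the Markdown files that count as posts. The helper name indexed_posts and the file names in the demo are assumptions for illustration; the real site does this implicitly through its Makefile rules.

```shell
#!/bin/sh
# Hypothetical helper: list the Markdown files in a directory that have a
# matching .meta file, i.e. the ones that would be indexed as posts.
indexed_posts() {
    for md in "$1"/*.md; do
        [ -e "$md" ] || continue            # directory contains no .md files
        if [ -f "${md%.md}.meta" ]; then    # a post needs a sibling .meta file
            basename "$md"
        fi
    done
}

# Throwaway demo directory with made-up file names.
demo_dir="$(mktemp -d)"
touch "$demo_dir/about.md" \
      "$demo_dir/2017-11-22_example.md" "$demo_dir/2017-11-22_example.meta"
indexed_posts "$demo_dir"    # lists only 2017-11-22_example.md
rm -r "$demo_dir"
```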

The heading at the top of each page, which contains the website title ("The Blog of Charles Daniels") and the navigation bar, lives in a template file that is concatenated with the post body after Pandoc has run. The post title and post date are generated by a script and are also injected after Pandoc has run. The footer (containing copyright information) is likewise generated by a script and concatenated with the post body.
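
The assembly step is nothing more than ordered concatenation. A minimal sketch follows, assuming each part is already available as a file; the helper name assemble_page and the stand-in file contents are hypothetical, and on the real site the footer comes from a script rather than a file.

```shell
#!/bin/sh
# Hypothetical helper: build one page by concatenating its parts in order.
# Usage: assemble_page OUTPUT PART...
assemble_page() {
    out="$1"
    shift
    cat "$@" > "$out"
}

# Throwaway demo with stand-in fragments (contents are illustrative).
work="$(mktemp -d)"
printf '<nav>site header</nav>\n' > "$work/header"
printf '<h1>post title</h1>\n'    > "$work/title"
printf '<p>post body</p>\n'       > "$work/body"
assemble_page "$work/page.html" "$work/header" "$work/title" "$work/body"
cat "$work/page.html"
rm -r "$work"
```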

The index page (i.e. the page with the list of all the posts on the site) is generated by parsing *.meta and generating some HTML <li> elements, which are concatenated with the global header template, as well as index-specific header and footer template files.

Design Rationale

I'm not much of a designer, but I like my stuff to look nice. My main objectives when designing this site's stylesheet were:

I went with a serif font for the content because I find content presented in this way is easiest to absorb and consume. I tried to keep the contrast between the background and the text very high (I am not a fan of the modern trend of dark-grey text on a light-grey background with minimal contrast in between); however, I also wanted to avoid the glaring black-on-white of unstyled HTML.

I threw in some sans-serif fonts here and there for content which I deliberately intend to de-emphasize, such as post dates, navigation elements, and so on. This text is also rendered in light grey with low contrast relative to the background, to further draw focus away from it and onto the actual content.

In general, I try to avoid being "clever", or breaking the norm for HTML documents. As a result, my pages render quickly, print correctly out of the box, and the text reflows correctly when the user's browser window is re-sized. In other words, this site behaves the way a website is expected to behave (unlike many heavier sites that rely on complex toolkits that try to resize and reflow content at runtime).

Technical Approach

The Makefile

The Makefile is carefully written to be idiomatic Make. In particular, aside from the site and clean rules, every rule depends on actual files on disk and produces a file on disk. This means the site can be compiled safely in parallel (care is taken to avoid race conditions while generating the index by generating each li element in its own .idx file and concatenating them later). This is a great boon while working on the site: using single-job make, it takes 2.1 seconds to compile on my 4C/8T i7 processor, whereas make -j8 takes 1.2 seconds. I imagine this will scale nicely as more posts are added, and as I come to own systems with more CPU cores, since the bottleneck appears to be the Pandoc calls that convert the Markdown files to HTML, and Pandoc appears to be compute-bound rather than I/O-bound in this context.
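
The race-avoidance idea can be sketched in plain shell: every job writes to its own output file, and the results are joined in a fixed order only after all jobs have finished. The names make_item and build_index below are hypothetical stand-ins, not the site's actual scripts, and the sketch does not attempt to reproduce the timings above.

```shell
#!/bin/sh
# Hypothetical stand-in for a per-post generator: writes one <li> to a file.
make_item() {
    printf '<li>%s</li>\n' "$1" > "$2"
}

# Build every item concurrently, each into its own file, then join them.
# Usage: build_index WORKDIR NAME...
build_index() {
    dir="$1"
    shift
    for name in "$@"; do
        make_item "$name" "$dir/$name.idx" &   # separate output files: no shared writes
    done
    wait                                       # block until every background job is done
    # Concatenate in reverse-sorted order, regardless of which job finished first.
    printf '%s\n' "$dir"/*.idx | sort -r | while IFS= read -r f; do
        cat "$f"
    done
}

# Throwaway demo with made-up post names.
work="$(mktemp -d)"
build_index "$work" 2017-01-05 2017-11-22 2016-12-31
rm -r "$work"
```

Had the jobs appended directly to one shared file, the order of the entries would depend on scheduling; writing separate files first keeps the final result deterministic.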

The gist of the Makefile's tasks, translated to English prose, follows:

Makefile

MDC=pandoc
MDFLAGS= -t html5 --mathml --no-highlight
HTMLFILES=$(shell ls *.md  | sort -r | sed 's/\.md$$/.html/')
IDXFILES=$(shell ls *.meta | sort -r | sed 's/\.meta$$/.idx/')

site: $(HTMLFILES) index.html

%.body.fragment : %.md
	$(MDC) $(MDFLAGS) -i $< -o $@

%.title.fragment : %.meta meta2title.sh
	./meta2title.sh "$<" > "$@"

%.html : header.html.template %.title.fragment %.body.fragment
	cat $^ > "$@"
	./mkfooter.sh >> "$@"

projects.html: projects.md header.html.template
	$(MDC) $(MDFLAGS) -i $< -o $@.tmp
	cat header.html.template $@.tmp > $@
	./mkfooter.sh >> "$@"

about.html: about.md header.html.template
	$(MDC) $(MDFLAGS) -i $< -o $@.tmp
	cat header.html.template $@.tmp > $@
	./mkfooter.sh >> "$@"

%.idx : %.meta meta2idx.sh
	./meta2idx.sh "$<" > "$@"

index_center.fragment: $(IDXFILES)
	echo "" > index_center.fragment
	for f in $$(ls *.idx | sort -r) ; do cat "$$f" >> "index_center.fragment" ; done

index.html: header.html.template index_header.template index_center.fragment index_footer.template
	printf '<title>The Blog of Charles Daniels</title>\n' > "$@"
	cat $^ >> "$@"
	./mkfooter.sh >> "$@"

clean:
	-rm *.html
	-rm *.idx
	-rm *.fragment
	-rm *.tmp
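
The HTMLFILES and IDXFILES variables each map a list of source names to target names. One detail worth care in such a mapping is that the suffix should be replaced only at the end of the name. A rough shell illustration follows, using a hypothetical helper md_to_html that is not part of the Makefile.

```shell
#!/bin/sh
# Hypothetical helper: map a Markdown file name to its HTML target name.
# ${1%.md} strips only a trailing ".md", so a name that merely contains
# the letters "md" somewhere else is left alone.
md_to_html() {
    printf '%s.html\n' "${1%.md}"
}

md_to_html 2017-11-22_make-static-site.md   # prints 2017-11-22_make-static-site.html
```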

meta2idx.sh

This script (included below) converts .meta TSV metadata files into <li> elements suitable for inclusion in the site index.

The only part of this script of particular interest is the extraction of the post title and date, which is accomplished by commands such as grep '^title.*' < "$META_FILE" | cut -f 2 (in this case, extracting the title field). This selects the line that begins with the string literal title, then extracts its second field (keep in mind that cut uses \t as its delimiter by default).
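
As a self-contained illustration of the grep and cut technique, here is a small sketch. The helper name meta_field is hypothetical, and it tightens the pattern slightly by requiring a tab right after the key, which is an assumption of this sketch rather than what the script does.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from a .meta file.
# Usage: meta_field KEY FILE
meta_field() {
    tab="$(printf '\t')"
    # Select the line starting with KEY followed by a tab, then take the
    # second field; cut splits on tabs by default.
    grep "^${1}${tab}" < "$2" | cut -f 2
}

# Throwaway sample file (the field values are illustrative).
sample="$(mktemp)"
printf 'title\tthis is the article title\ndate\t1700-01-01\n' > "$sample"
meta_field title "$sample"   # prints: this is the article title
meta_field date "$sample"    # prints: 1700-01-01
rm "$sample"
```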

For illustrative purposes, here is the output of meta2idx.sh for this post's metadata file:

        <li class="postlisting">
                <!--POST_TITLE='Building a Static Site in 44 Lines of Make', POST_DATE='2017-11-22', POST_FILE='.//2017-11-22_make-static-site.html'-->
                <span class="postdate">2017-11-22</span>
                <h2><div class="postlink"><a href=".//2017-11-22_make-static-site.html">Building a Static Site in 44 Lines of Make</a></div></h2>
        </li>

meta2idx.sh

#!/bin/sh
# Convert a .meta TSV file into an <li> element for the site index.

META_FILE="$1"

# Pull the second tab-separated field from the title and date lines.
POST_TITLE="$(grep '^title.*' < "$META_FILE" | cut -f 2)"
POST_DATE="$(grep '^date.*' < "$META_FILE" | cut -f 2)"
POSTS_ROOT="./"
POST_FILE="$POSTS_ROOT/$(basename "$1" .meta).html"

printf '\t<li class="postlisting">\n'
# Pass values as %s arguments so a '%' or '\' in a title is printed literally.
printf "\t\t<!--POST_TITLE='%s', POST_DATE='%s', POST_FILE='%s'-->\n" "$POST_TITLE" "$POST_DATE" "$POST_FILE"
printf '\t\t<span class="postdate">%s</span>\n' "$POST_DATE"
printf '\t\t<h2><div class="postlink"><a href="%s">%s</a></div></h2>\n' "$POST_FILE" "$POST_TITLE"
printf "\t</li>\n\n"

meta2title.sh

This script produces the title and post date, which are injected into each post's HTML file. It also produces the HTML <title> element for each post. It is essentially the same thing as meta2idx.sh, but with a different HTML template.

For illustrative purposes, the output of meta2title.sh for this post follows:

<div class="posttitle">
        <h1 class=posttitletext>Building a Static Site in 44 Lines of Make</h1>
        <p class=postdatetext><small>Posted 2017-11-22</small></p>
</div>
<title>Building a Static Site in 44 Lines of Make</title><!--Generated on Wed Nov 22 22:31:21 EST 2017 --!>

meta2title.sh

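#!/bin/sh
# Emit the title block and <title> element for one post from its .meta file.

META_FILE="$1"

# Pull the second tab-separated field from the title and date lines.
POST_TITLE="$(grep '^title.*' < "$META_FILE" | cut -f 2)"
POST_DATE="$(grep '^date.*' < "$META_FILE" | cut -f 2)"
# POSTS_ROOT and POST_FILE are not used by this script.
POSTS_ROOT="./"
POST_FILE="$POSTS_ROOT/$(basename "$1" .meta).html"

printf '<div class="posttitle">\n'
printf '\t<h1 class=posttitletext>%s</h1>\n' "$POST_TITLE"
printf '\t<p class=postdatetext><small>Posted %s</small></p>\n' "$POST_DATE"
printf '</div>\n'
printf '<title>%s</title>' "$POST_TITLE"
# Note: '--!>' is a nonconforming comment terminator that browsers tolerate; '-->' is the standard form.
printf '<!--Generated on %s --!>' "$(date)"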

mkfooter.sh

This is the simplest of the three scripts used to generate this site: it generates a copyright message with the current year, and includes the Creative Commons license notice I use for all of my posts. This script generates the same output every time it is run within a given year; that output is included below for illustrative purposes.

<footer>
        <div class="footertext">
                <hr>
                <small>Copyright 2017 Charles Daniels</small>
                <p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</small></p>
        </div>
</footer>

mkfooter.sh

#!/bin/sh

# Four-digit current year for the copyright line.
CURRENT_YEAR=$(date +%Y)

printf "<footer>\n"
printf '\t<div class="footertext">\n'
printf '\t\t<hr>\n'
printf '\t\t<small>Copyright %s Charles Daniels</small>\n' "$CURRENT_YEAR"
printf '\t\t<p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</small></p>\n'
printf '\t</div>\n'
printf '</footer>'

Conclusion

This methodology demonstrates that it is entirely tenable, and in fact relatively straightforward, to build a simple static site without complex dependencies.

I would thoroughly recommend this approach to anyone looking to run a small blog or website where complex CMS tools, themes, or dynamic content are not required.

There are, however, still a few sticking points that I will be working to clear up in the future, namely