As regular readers will have noticed by the time this article is posted, cdaniels.net is no longer running Jekyll. Instead, it is generated by a new, fully custom static site generator which I wrote as a one-day project during Thanksgiving break.
This new static site generator offers a number of advantages:
Perhaps most entertainingly, for my use case, I have fully replaced Jekyll with a 44-line Makefile, a couple of small shell scripts, and a fairly brief CSS file.
In this post, I will discuss the methodology used to construct these tools and switch cdaniels.net away from off-the-shelf static site generation. I decided to write this article in lieu of properly open-sourcing the new site generation tools for two reasons:
The new site generator operates by compiling a pile of Markdown (.md) files into HTML. The excellent Pandoc is used to convert the Markdown files to HTML. No special features of Pandoc are used, and another Markdown-to-HTML transpiler could easily be substituted if desired. However, Pandoc is readily available, performant, and produces good-quality output.
Post metadata is tracked separately from the actual Markdown file containing the post, in the .meta “file format”. This was a matter of convenience as it was easier to use .meta (which is really just a tab-separated values, or TSV, file). This makes it trivially easy to extract post metadata using shell scripts.
The .meta file format looks like this:
title	this is the article title
date	1700-01-01
Both fields are arbitrary strings, although the index generator (more on this later) works best if the date field collates in the order you want it to show up in the index when sorted using the
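As a rough illustration of why this format is convenient, the sketch below writes a sample .meta file and inspects it with standard tools. The file name example.meta, the temporary directory, and the sample dates are invented for the example; the only structure assumed is one key, one tab, and one value per line. It also shows that dates written as YYYY-MM-DD collate chronologically under a plain text sort, which is one way to meet the ordering requirement described above.

```shell
#!/bin/sh
# Sketch only: example.meta and the sample values are hypothetical.
workdir="$(mktemp -d)"
meta="$workdir/example.meta"

# One record per line: key, a single tab, then the value.
printf 'title\tthis is the article title\n' > "$meta"
printf 'date\t1700-01-01\n' >> "$meta"

# Because the file is plain TSV, awk can list the keys it contains.
keys="$(awk -F '\t' '{ print $1 }' "$meta" | tr '\n' ' ')"
printf 'keys: %s\n' "$keys"

# YYYY-MM-DD strings collate chronologically as plain text,
# so a reverse sort puts the newest date first.
newest="$(printf '2017-11-22\n2016-03-01\n2017-02-14\n' | sort -r | head -n 1)"
printf 'newest: %s\n' "$newest"

rm -rf "$workdir"
```

A date format such as DD/MM/YYYY would not have this property, since a text sort compares the day digits first.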
Headers, footers, and other “glue” content which cannot be generated programmatically, but which is also not part of any post body, are stored in .template files, which are really just HTML snippets with a different extension to prevent *.html from picking them up. These .template files are usually concatenated with other files to build the final HTML pages you see on cdaniels.net.
Some pages (namely About and Projects) do not have .meta files associated with them, which prevents them from being indexed as posts and appearing in the “all posts” list. These are handled by special-case Makefile rules and are processed differently from all of the other pages.
The heading at the top of each page, containing the website title (“The Blog of Charles Daniels”) as well as the navigation bar, is stored in a template file which is concatenated with the post body after Pandoc has run. The post title and post date are also injected after Pandoc has run, and are generated by a script. The footers (containing copyright information) are likewise generated by a script and concatenated with the post body.
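The assembly order described above can be sketched with plain cat. This is an illustration rather than the site's actual build: every file name and snippet is a placeholder, the body fragment is written by hand where Pandoc's output would normally go, and the footer line stands in for the script-generated one.

```shell
#!/bin/sh
# Sketch only: every file name and snippet below is a placeholder.
workdir="$(mktemp -d)"

# Shared header template (site title and navigation bar).
printf '<header>The Blog of Charles Daniels</header>\n' > "$workdir/header.html.template"
# Title fragment, normally generated by a script from the post metadata.
printf '<h1>Example post</h1>\n' > "$workdir/example.title.fragment"
# Body fragment, standing in for the HTML that Pandoc would emit.
printf '<p>Post body.</p>\n' > "$workdir/example.body.fragment"

# Concatenate the pieces in display order, then append the footer.
cat "$workdir/header.html.template" \
    "$workdir/example.title.fragment" \
    "$workdir/example.body.fragment" > "$workdir/example.html"
printf '<footer>placeholder footer</footer>\n' >> "$workdir/example.html"

cat "$workdir/example.html"
page_lines="$(wc -l < "$workdir/example.html" | tr -d ' ')"
last_line="$(tail -n 1 "$workdir/example.html")"
rm -rf "$workdir"
```

Because each piece is an ordinary file, changing the header only means editing one template and rebuilding.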
The index page (i.e. the page with the list of all the posts on the site) is generated by parsing *.meta and generating some HTML <li> elements, which are concatenated with the global header template, as well as index-specific header and footer template files.
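A minimal sketch of that idea follows, assuming two made-up posts and a bare <ul> wrapper in place of the real header and footer templates. The file names, titles, and markup are hypothetical, and the loop assumes file names without whitespace.

```shell
#!/bin/sh
# Sketch only: file names, sample posts, and the bare <ul> wrapper are assumptions.
workdir="$(mktemp -d)"
printf 'title\tFirst post\ndate\t2017-01-01\n' > "$workdir/2017-01-01_first.meta"
printf 'title\tSecond post\ndate\t2017-06-15\n' > "$workdir/2017-06-15_second.meta"

index="$workdir/index.html"
printf '<ul>\n' > "$index"
# File names start with the date, so a reverse sort lists the newest post first.
for meta in $(ls "$workdir"/*.meta | sort -r); do
    title="$(grep '^title' "$meta" | cut -f 2)"
    date_field="$(grep '^date' "$meta" | cut -f 2)"
    page="$(basename "$meta" .meta).html"
    printf '<li>%s <a href="%s">%s</a></li>\n' "$date_field" "$page" "$title" >> "$index"
done
printf '</ul>\n' >> "$index"

cat "$index"
first_item="$(sed -n '2p' "$index")"
rm -rf "$workdir"
```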
I’m not much of a designer, but I like my stuff to look nice. My main objectives when designing this site’s stylesheet were:
I went with a serif font for the content because I find content presented in this way is easiest to absorb and consume. I tried to keep the contrast between the background and the text very high (I am not a fan of the modern trend of dark-grey text on a light-grey background with minimal contrast in between); however, I also wanted to avoid the glaring black-on-white of unstyled HTML.
I threw in some sans-serif fonts here and there for content which I deliberately intend to de-emphasize, such as post dates, navigation elements, and so on. This text is also rendered in light grey with low contrast relative to the background, to further draw focus away from it and onto the actual content.
In general, I try to avoid being “clever”, or breaking the norm for HTML documents. As a result, my pages render quickly, print correctly out of the box, and the text reflows correctly when the user’s browser window is re-sized. In other words, this site behaves the way a website is expected to behave (unlike many heavier sites that rely on complex toolkits that try to resize and reflow content at runtime).
The Makefile is carefully written to be idiomatic Make. In particular, aside from the site rule, every rule depends on actual files on disk, and produces a file on disk. This means the site can be compiled safely in parallel (care is taken to avoid race conditions while generating the index by generating each li element in its own .idx file and concatenating them together later). This is a great boon while working on the site: using single-threaded make, it takes 2.1 seconds to compile on my 4C/8T i7 processor. Using make -j8, however, it takes 1.2 seconds. I imagine this will scale nicely as more posts are added, and as I come to own systems with more CPU cores, as the bottleneck appears to be the Pandoc calls to convert the Markdown files to HTML, and Pandoc appears to be compute-bound rather than I/O-bound in this context.
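The race-avoidance idea can be illustrated outside of Make. In the sketch below, shell background jobs stand in for parallel make jobs, and the post names are hypothetical. Each job writes only its own .idx file, and a single serial step joins them afterwards, so the result does not depend on which job finishes first.

```shell
#!/bin/sh
# Sketch only: the post names are hypothetical and background jobs stand in for make -j.
workdir="$(mktemp -d)"

# Each job writes only its own .idx file, so jobs never share an output file.
for name in 2017-01-01_a 2017-03-05_b 2017-11-22_c; do
    printf '<li>%s</li>\n' "$name" > "$workdir/$name.idx" &
done
wait  # all fragments exist before anything reads them

# A single serial step joins the fragments in a fixed (reverse-sorted) order.
: > "$workdir/index_center.fragment"
for f in $(ls "$workdir"/*.idx | sort -r); do
    cat "$f" >> "$workdir/index_center.fragment"
done

cat "$workdir/index_center.fragment"
first_item="$(head -n 1 "$workdir/index_center.fragment")"
item_count="$(wc -l < "$workdir/index_center.fragment" | tr -d ' ')"
rm -rf "$workdir"
```

Had every job appended directly to one shared file, the order of the list items would depend on scheduling.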
The gist of the Makefile’s tasks translated to English prose follows:
*.md) is compiled into the
.idx that could be generated from the set of every metadata file (*.meta) is generated and stored in IDXFILES; each such .idx file contains one <li> element to display one post on the front page of the site.
A .body.fragment file is generated for each post containing its body
A .title.fragment file is generated for every post from its title and post date
mkfooter.sh. These are not posts, so no title is generated.
.idx file for later inclusion in the main index_center fragment file, which contains the body of the <ul> element which is the posts list.
index.html) is generated by concatenating the globally shared header, index header template, the list of posts, and the index footer template, as well as the shared footer.
MDC=pandoc
MDFLAGS= -t html5 --mathml --no-highlight
HTMLFILES=$(shell ls *.md | sort -r | sed s/.md/.html/g )
IDXFILES=$(shell ls *.meta | sort -r | sed s/.meta/.idx/g )

site: $(HTMLFILES) index.html

%.body.fragment : %.md
	$(MDC) $(MDFLAGS) -i $< -o $@

%.title.fragment : %.meta meta2title.sh
	./meta2title.sh "$<" > "$@"

%.html : header.html.template %.title.fragment %.body.fragment
	cat $^ > "$@"
	./mkfooter.sh >> "$@"

projects.html: projects.md header.html.template
	$(MDC) $(MDFLAGS) -i $< -o $@.tmp
	cat header.html.template $@.tmp > $@
	./mkfooter.sh >> "$@"

about.html: about.md header.html.template
	$(MDC) $(MDFLAGS) -i $< -o $@.tmp
	cat header.html.template $@.tmp > $@
	./mkfooter.sh >> "$@"

%.idx : %.meta meta2idx.sh
	./meta2idx.sh "$<" > "$@"

index_center.fragment: $(IDXFILES)
	echo "" > index_center.fragment
	for f in $$(ls *.idx | sort -r) ; do cat "$$f" >> "index_center.fragment" ; done

index.html: header.html.template index_header.template index_center.fragment index_footer.template
	printf '<title>The Blog of Charles Daniels</title>\n' > "$@"
	cat $^ >> "$@"
	./mkfooter.sh >> "$@"

clean:
	-rm *.html
	-rm *.idx
	-rm *.fragment
	-rm *.tmp
This script (included below) converts .meta TSV metadata files into <li> elements suitable for inclusion in the site index.
The only part of this script of particular interest is the bit that extracts the post title and date, which is accomplished by the command grep '^title.*' < "$META_FILE" | cut -f 2 (in this case, we are extracting the title field). This searches for a line that begins with the string literal title, then extracts the second field from it (keep in mind that cut uses \t as a delimiter by default).
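The grep and cut pipeline described above could be wrapped in a small helper, sketched below. The function name meta_field is hypothetical and not part of the site's scripts. Requiring a tab immediately after the field name is a variation on the original pattern, added here so that a key such as title cannot match a longer key that merely starts with the same letters.

```shell
#!/bin/sh
# Sketch only: meta_field is a hypothetical helper, not part of the site's scripts.

# Print the value of one field from a .meta file.
#   $1 = path to the .meta file, $2 = field name (for example: title)
meta_field() {
    tab="$(printf '\t')"
    # Requiring the tab right after the name keeps "title" from matching "titles".
    grep "^$2$tab" < "$1" | cut -f 2
}

sample="$(mktemp)"
printf 'title\tBuilding a Static Site in 44 Lines of Make\n' > "$sample"
printf 'date\t2017-11-22\n' >> "$sample"

post_title="$(meta_field "$sample" title)"
post_date="$(meta_field "$sample" date)"
printf '%s (%s)\n' "$post_title" "$post_date"
rm -f "$sample"
```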
For illustrative purposes, here is the output of meta2idx.sh for this post’s metadata file:
	<li class="postlisting">
		<!--POST_TITLE='Building a Static Site in 44 Lines of Make', POST_DATE='2017-11-22', POST_FILE='.//2017-11-22_make-static-site.html'-->
		<span class="postdate">2017-11-22</span>
		<h2><div class="postlink"><a href=".//2017-11-22_make-static-site.html">Building a Static Site in 44 Lines of Make</a></div></h2>
	</li>
#!/bin/sh

META_FILE="$1"
POST_TITLE="$(grep '^title.*' < "$META_FILE" | cut -f 2)"
POST_DATE="$(grep '^date.*' < "$META_FILE" | cut -f 2)"
POSTS_ROOT="./"
POST_FILE="$POSTS_ROOT/$(basename "$1" .meta).html"

printf '\t<li class="postlisting">\n'
printf "\t\t<!--POST_TITLE='$POST_TITLE', POST_DATE='$POST_DATE', POST_FILE='$POST_FILE'-->\n"
printf '\t\t<span class="postdate">%s</span>\n' "$POST_DATE"
printf '\t\t<h2><div class="postlink"><a href="%s">%s</a></div></h2>\n' "$POST_FILE" "$POST_TITLE"
printf "\t</li>\n\n"
This script produces the title and post date, which are injected into each post’s HTML file. It also produces the HTML <title> element for each post. It is essentially the same thing as meta2idx.sh, but with a different HTML template.
For illustrative purposes, the output of meta2title.sh for this post follows:
<div class="posttitle">
	<h1 class=posttitletext>Building a Static Site in 44 Lines of Make</h1>
	<p class=postdatetext><small>Posted 2017-11-22</small></p>
</div>
<title>Building a Static Site in 44 Lines of Make</title><!--Generated on Wed Nov 22 22:31:21 EST 2017 --!>
#!/bin/sh

META_FILE="$1"
POST_TITLE="$(grep '^title.*' < "$META_FILE" | cut -f 2)"
POST_DATE="$(grep '^date.*' < "$META_FILE" | cut -f 2)"
POSTS_ROOT="./"
POST_FILE="$POSTS_ROOT/$(basename "$1" .meta).html"

printf '<div class="posttitle">\n'
printf '\t<h1 class=posttitletext>%s</h1>\n' "$POST_TITLE"
printf '\t<p class=postdatetext><small>Posted %s</small></p>\n' "$POST_DATE"
printf '</div>\n'
printf '<title>%s</title>' "$POST_TITLE"
printf '<!--Generated on %s --!>' "$(date)"
This is the simplest of the three scripts used to generate this site: it generates a copyright message with the current year, and includes the Creative Commons license I use for all of my posts. Within a given year, this script generates the same output every time it is run; that output is included below for illustrative purposes.
<footer>
	<div class="footertext">
		<hr>
		<small>Copyright 2017 Charles Daniels</small>
		<p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</small></p>
	</div>
</footer>
#!/bin/sh

CURRENT_YEAR=$(date +%Y)

printf "<footer>\n"
printf '\t<div class="footertext">\n'
printf '\t\t<hr>\n'
printf '\t\t<small>Copyright %s Charles Daniels</small>\n' "$CURRENT_YEAR"
printf '\t\t<p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</small></p>\n'
printf '\t</div>\n'
printf '</footer>'
This methodology demonstrates that it is entirely tenable, and in fact relatively straightforward, to build a simple static site without complex dependencies.
I would thoroughly recommend this approach to anyone looking to run a small blog or website where complex CMS tools, themes, or dynamic content are not required.
There are, however, still a few sticking points that I will be working to clear up in the future, namely
<pre> for the HTML folks) do not wrap. Setting them to wrap could produce a result that cannot be safely copy-pasted without manually un-wrapping the lines, and looks weird besides. In the future, I would like to find a way to enable line wrapping and generate a visual indicator showing that the line has been wrapped, perhaps by inserting an 0x21AA (“RIGHTWARDS ARROW WITH HOOK”) symbol (↪) at the beginning of any wrapped lines. This may involve pre-processing the Markdown files before feeding them to Pandoc.
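One possible shape for such a pre-processing step is sketched below. It is only an assumption about how the idea might be approached: the function name wrap_marked and the width of 20 are made up, the wrap is a hard break at a fixed column rather than browser-style reflow, and line lengths are counted the way the local awk counts them (bytes in some implementations), so it is only reliable for ASCII input. It also does not address the copy-paste concern, since the marker becomes part of the text.

```shell
#!/bin/sh
# Sketch only: wrap_marked and the width of 20 are hypothetical choices.

# Hard-wrap stdin at $1 columns, prefixing each continuation line with a marker.
wrap_marked() {
    awk -v width="$1" -v marker="↪ " '
    {
        line = $0
        print substr(line, 1, width)
        line = substr(line, width + 1)
        # The two-column marker takes up part of the width on continuation lines.
        while (length(line) > 0) {
            print marker substr(line, 1, width - 2)
            line = substr(line, width - 1)
        }
    }'
}

wrapped="$(printf 'short line\nthis line is definitely too long to fit\n' | wrap_marked 20)"
printf '%s\n' "$wrapped"
```

Lines that already fit are passed through unchanged, so the filter is safe to run over text that mixes short and long lines.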