Building a Static Site Generator in 44 Lines of Make
As regular readers will have noticed by the time this article is posted, cdaniels.net is no longer running Jekyll. Instead, it is generated by a new, fully custom static site generator which I have written as a day project during Thanksgiving Break.
This new static site generator offers a number of advantages:
- It produces more visually appealing output (in my opinion)
- It is easier for me to maintain
- It requires fewer dependencies to run (no more Ruby, just make and sh)
- cdaniels.net no longer uses any JavaScript, at all
- cdaniels.net now renders correctly in non-mainstream browsers (in particular, it’s now entirely usable in Links)
- It has reduced page weight by about half
Perhaps most entertainingly, for my use case, I have fully replaced Jekyll with a 44 line Makefile, a couple of small shell scripts, and a fairly brief CSS file.
In this post, I will discuss the methodology used to construct these tools and switch cdaniels.net away from off-the-shelf static site generation. I decided to write this article in lieu of properly open-sourcing the new site generation tools for two reasons:
- Both the Makefile and the associated tooling are fairly specific to cdaniels.net, and generalizing would require considerable extra effort.
- Both the Makefile and the associated tooling are too trivial to even warrant their own repository or version number.
Overview
The new site generator operates by compiling a pile of Markdown (.md) files
into HTML. The excellent Pandoc is used to convert the Markdown files to HTML.
No special features of Pandoc are used, and another Markdown-to-HTML converter
could easily be substituted if desired. However, Pandoc is readily available,
performant, and produces good quality output.
Post metadata is tracked separately from the actual Markdown file containing
the post, in the .meta “file format”. This was a matter of convenience: a
.meta file is really just a tab-separated-value (TSV) file, which makes it
trivially easy to extract post metadata using shell scripts.
The .meta file format looks like this:
title this is the article title
date 1700-01-01
Both fields are arbitrary strings, although the index generator (more on this
later) works best if the date field collates in the order you want it to show
up in the index when sorted using the sort command.
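As a rough illustration of why the TSV layout is convenient, the sketch below writes three throwaway .meta files (the file names, titles, and dates are invented for the example), pulls out the date and title fields with grep and cut, and lists the posts newest-first. It relies on ISO 8601 dates (YYYY-MM-DD) collating chronologically under sort.

```shell
#!/bin/sh
# Sketch only: the file names, titles, and dates are invented for illustration.
workdir="$(mktemp -d)"

# Each .meta file holds tab-separated key/value lines.
printf 'title\tFirst post\ndate\t2016-12-31\n'  > "$workdir/a.meta"
printf 'title\tSecond post\ndate\t2017-01-05\n' > "$workdir/b.meta"
printf 'title\tThird post\ndate\t2017-11-22\n'  > "$workdir/c.meta"

# Print "date<TAB>title" for every post, then sort newest-first.
listing="$(
    for f in "$workdir"/*.meta ; do
        post_date="$(grep '^date' < "$f" | cut -f 2)"
        post_title="$(grep '^title' < "$f" | cut -f 2)"
        printf '%s\t%s\n' "$post_date" "$post_title"
    done | LC_ALL=C sort -r
)"

printf '%s\n' "$listing"
rm -r "$workdir"
```

A date format such as MM/DD/YYYY would not sort this way, which is why the choice of date string matters even though the field is free-form.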
Headers, footers, and other “glue” content which cannot be generated
programmatically, but that also is not part of any post body, are stored in
.template files, which are really just HTML snippets with a different
extension to prevent *.html from picking them up. These .template files are
usually concatenated with other files to build the final HTML pages you see on
cdaniels.net.
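The idea can be sketched in a few lines of shell. The template and body contents here are placeholders, and post.html is a made-up page name; the point is only that a page is the concatenation of its pieces, and that a *.html glob never matches the templates.

```shell
#!/bin/sh
# Sketch only: the template and body contents below are placeholders.
workdir="$(mktemp -d)"

printf '<header>site header</header>\n' > "$workdir/header.html.template"
printf '<p>post body</p>\n'             > "$workdir/post.body.fragment"

# A finished page is just the pieces concatenated in order.
cat "$workdir/header.html.template" "$workdir/post.body.fragment" \
    > "$workdir/post.html"

# Because templates end in .template, a *.html glob only sees real pages.
pages="$(cd "$workdir" && ls *.html)"
page_contents="$(cat "$workdir/post.html")"

printf 'pages: %s\n' "$pages"
printf '%s\n' "$page_contents"
rm -r "$workdir"
```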
Some pages (namely About and Projects) do not have .meta files associated
with them, which prevents them from being indexed as posts and appearing in the
“all posts” list. These are handled by special-case Makefile rules and are
processed differently from all of the other pages.
The heading at the top of each page, containing the website title (“The Blog of Charles Daniels”) and the navigation bar, lives in a template file which is concatenated with the post body after Pandoc has run. The post title and post date are also injected after Pandoc has run, and are generated by a script. The footers (containing copyright information) are likewise generated by a script and concatenated with the post body.
The index page (i.e. the page with the list of all the posts on the site) is
generated by parsing *.meta and generating some HTML <li> elements, which
are concatenated with the global header template, as well as index-specific
header and footer template files.
Design Rationale
I’m not much of a designer, but I like my stuff to look nice. My main objectives when designing this site’s stylesheet were:
- Minimizing page weight
- Maximizing readability
I went with a serif font for the content because I find content presented in this way is easiest to absorb and consume. I tried to keep the contrast between the background and the text very high (I am not a fan of the modern trend of dark-grey text on a light-grey background with minimal contrast in between). However, I also wanted to avoid the glaring black-on-white of unstyled HTML.
I threw in some sans-serif fonts here and there for content which I deliberately intend to de-emphasize, such as post dates, navigation elements, and so on. This text is also rendered in light grey with low contrast relative to the background, to further draw focus away from it and onto the actual content.
In general, I try to avoid being “clever”, or breaking the norm for HTML documents. As a result, my pages render quickly, print correctly out of the box, and the text reflows correctly when the user’s browser window is re-sized. In other words, this site behaves the way a website is expected to behave (unlike many heavier sites that rely on complex toolkits that try to resize and reflow content at runtime).
Technical Approach
The Makefile
The Makefile used is carefully written to be idiomatic Make. In particular,
aside from the site rule, every rule depends on actual files on disk, and
produces a file on disk. This means the site can be compiled safely in
parallel (care is taken to avoid race conditions while generating the index by
generating each li element in a separate .idx file and concatenating them
together later). This is a great boon while working on the site: using a
single make job, it takes 2.1 seconds to compile on my 4C/8T i7 processor.
Using make -j8, however, it takes 1.2 seconds. I imagine this will scale nicely
as more posts are added, and as I come to own systems with more CPU cores, as
the bottleneck appears to be the Pandoc calls to convert the Markdown files to
HTML, and Pandoc appears to be compute-bound rather than I/O-bound in this
context.
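The race-avoidance idea can be shown outside of Make. The sketch below uses background subshells as a stand-in for parallel make jobs (an assumption made purely for illustration; the file names and contents are made up). Each job writes only its own .idx file, and a single serial step assembles them afterwards, so no two jobs ever append to the same file at once.

```shell
#!/bin/sh
# Sketch only: simulates parallel jobs with background subshells.
workdir="$(mktemp -d)"

# Each "job" writes only its own .idx file, so no two jobs share an output.
for n in 1 2 3 ; do
    ( printf '<li>post %s</li>\n' "$n" > "$workdir/post$n.idx" ) &
done
wait    # block until every background job has finished

# A single serial step then assembles the pieces in a fixed order.
for f in $(ls "$workdir"/*.idx | sort -r) ; do
    cat "$f"
done > "$workdir/index_center.fragment"

result="$(cat "$workdir/index_center.fragment")"
printf '%s\n' "$result"
rm -r "$workdir"
```

Had each job appended directly to index_center.fragment instead, the order of the entries would depend on which job happened to finish first.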
The gist of the Makefile’s tasks, translated to English prose, follows:
- A list of every HTML file which could be generated from every Markdown file (*.md) is compiled into the HTMLFILES variable.
- Similarly, a list of every .idx file that could be generated from the set of every metadata file (*.meta) is generated and stored in IDXFILES; each such .idx file contains one <li> element to display one post on the front page of the site.
- The site requires that all HTML files be generated, plus the index.
- A .body.fragment file is generated for each post, containing its body.
- A .title.fragment file is generated for every post from its title and post date.
- An HTML file for a single post is generated by concatenating the globally shared header, the title fragment, the body fragment, and the output of mkfooter.sh.
- The Projects and About pages are generated by simply compiling their Markdown file to HTML and concatenating the result with the shared header and the output of mkfooter.sh. These are not posts, so no title is generated.
- For every metadata file, an index entry is generated in a .idx file for later inclusion in the main index.html page.
- Every index file is appended to the index_center fragment file, which contains the body of the <ul> element which is the posts list.
- The index itself (index.html) is generated by concatenating the globally shared header, the index header template, the list of posts, and the index footer template, as well as the shared footer.
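The first two steps are the least obvious, so here is a small sketch of how such target lists can be derived from the source files on disk. The post file names are invented, and the sed expressions are anchored so that only the trailing extension is rewritten.

```shell
#!/bin/sh
# Sketch only: the post file names are invented for illustration.
workdir="$(mktemp -d)"
touch "$workdir/2017-01-05_first.md"   "$workdir/2017-11-22_second.md"
touch "$workdir/2017-01-05_first.meta" "$workdir/2017-11-22_second.meta"

# Map every source file to the name of the target that would be built from it.
html_files="$(cd "$workdir" && ls *.md   | sort -r | sed 's/\.md$/.html/')"
idx_files="$(cd "$workdir" && ls *.meta | sort -r | sed 's/\.meta$/.idx/')"

printf 'HTML targets:\n%s\n' "$html_files"
printf 'Index targets:\n%s\n' "$idx_files"
rm -r "$workdir"
```

Listing these names as prerequisites is what lets Make work out which pattern rules to run for each post.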
Makefile
MDC=pandoc
MDFLAGS= -t html5 --mathml --no-highlight
HTMLFILES=$(shell ls *.md | sort -r | sed 's/\.md$$/.html/')
IDXFILES=$(shell ls *.meta | sort -r | sed 's/\.meta$$/.idx/')

site: $(HTMLFILES) index.html

%.body.fragment : %.md
	$(MDC) $(MDFLAGS) -i $< -o $@

%.title.fragment : %.meta meta2title.sh
	./meta2title.sh "$<" > "$@"

%.html : header.html.template %.title.fragment %.body.fragment
	cat $^ > "$@"
	./mkfooter.sh >> "$@"

projects.html: projects.md header.html.template
	$(MDC) $(MDFLAGS) -i $< -o $@.tmp
	cat header.html.template $@.tmp > $@
	./mkfooter.sh >> "$@"

about.html: about.md header.html.template
	$(MDC) $(MDFLAGS) -i $< -o $@.tmp
	cat header.html.template $@.tmp > $@
	./mkfooter.sh >> "$@"

%.idx : %.meta meta2idx.sh
	./meta2idx.sh "$<" > "$@"

index_center.fragment: $(IDXFILES)
	echo "" > index_center.fragment
	for f in $$(ls *.idx | sort -r) ; do cat "$$f" >> "index_center.fragment" ; done

index.html: header.html.template index_header.template index_center.fragment index_footer.template
	printf '<title>The Blog of Charles Daniels</title>\n' > "$@"
	cat $^ >> "$@"
	./mkfooter.sh >> "$@"

clean:
	-rm *.html
	-rm *.idx
	-rm *.fragment
	-rm *.tmp
meta2idx.sh
This script (included below) converts .meta TSV metadata files into <li>
elements suitable for inclusion in the site index.
The only parts of this script which are of particular interest are the bits
that extract the post title and date, which is accomplished by the command
grep '^title.*' < "$META_FILE" | cut -f 2 (in this case, we are extracting the
title field). This searches for a line that begins with the string literal
title, then extracts the second field therefrom (keep in mind that cut uses
\t as a delimiter by default).
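One caveat worth noting: the pattern ^title.* matches any line that merely begins with title, so a hypothetical extra key such as titlecase would be picked up as well. That is harmless with only two keys, but if more were ever added, comparing the whole first field is one alternative. The sketch below shows the difference; meta_get is a made-up helper and is not part of the site's scripts.

```shell
#!/bin/sh
# Sketch only: meta_get is a hypothetical helper, not part of the site's scripts.
# Usage: meta_get FILE KEY
meta_get() {
    # Split on tabs, print field 2 of the first line whose field 1 equals KEY.
    awk -F '\t' -v key="$2" '$1 == key { print $2; exit }' "$1"
}

meta_file="$(mktemp)"
printf 'titlecase\tyes\ntitle\tAn example title\ndate\t1700-01-01\n' > "$meta_file"

prefix_match="$(grep '^title.*' < "$meta_file" | cut -f 2)"   # matches two lines
exact_match="$(meta_get "$meta_file" title)"                   # matches one line

printf 'grep/cut:\n%s\n' "$prefix_match"
printf 'awk:\n%s\n' "$exact_match"
rm "$meta_file"
```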
For illustrative purposes, here is the output of meta2idx.sh for this post’s
metadata file:
<li class="postlisting">
<!--POST_TITLE='Building a Static Site in 44 Lines of Make', POST_DATE='2017-11-22', POST_FILE='.//2017-11-22_make-static-site.html'-->
<span class="postdate">2017-11-22</span>
<h2><div class="postlink"><a href=".//2017-11-22_make-static-site.html">Building a Static Site in 44 Lines of Make</a></div></h2>
</li>
meta2idx.sh
#!/bin/sh
# Convert a .meta file (tab-separated key/value lines) into one <li> element.

META_FILE="$1"

# Field 2 of the line starting with "title" (or "date") is the value.
POST_TITLE="$(grep '^title.*' < "$META_FILE" | cut -f 2)"
POST_DATE="$(grep '^date.*' < "$META_FILE" | cut -f 2)"
POSTS_ROOT="./"
POST_FILE="$POSTS_ROOT/$(basename "$1" .meta).html"

printf '\t<li class="postlisting">\n'
# Values are passed as printf arguments so a "%" in a title is printed literally.
printf "\t\t<!--POST_TITLE='%s', POST_DATE='%s', POST_FILE='%s'-->\n" "$POST_TITLE" "$POST_DATE" "$POST_FILE"
printf '\t\t<span class="postdate">%s</span>\n' "$POST_DATE"
printf '\t\t<h2><div class="postlink"><a href="%s">%s</a></div></h2>\n' "$POST_FILE" "$POST_TITLE"
printf "\t</li>\n\n"
meta2title.sh
This script produces the title and post date, which are injected into each
post’s HTML file. It also produces the HTML <title> element for each post. It
is essentially the same thing as meta2idx.sh, but with a different HTML
template.
For illustrative purposes, the output of meta2title.sh for this post follows:
<div class="posttitle">
<h1 class=posttitletext>Building a Static Site in 44 Lines of Make</h1>
<p class=postdatetext><small>Posted 2017-11-22</small></p>
</div>
<title>Building a Static Site in 44 Lines of Make</title><!--Generated on Wed Nov 22 22:31:21 EST 2017 --!>
meta2title.sh
#!/bin/sh
# Emit the title block and <title> element for one post from its .meta file.

META_FILE="$1"

POST_TITLE="$(grep '^title.*' < "$META_FILE" | cut -f 2)"
POST_DATE="$(grep '^date.*' < "$META_FILE" | cut -f 2)"
POSTS_ROOT="./"
POST_FILE="$POSTS_ROOT/$(basename "$1" .meta).html"

printf '<div class="posttitle">\n'
printf '\t<h1 class=posttitletext>%s</h1>\n' "$POST_TITLE"
printf '\t<p class=postdatetext><small>Posted %s</small></p>\n' "$POST_DATE"
printf '</div>\n'
printf '<title>%s</title>' "$POST_TITLE"
printf '<!--Generated on %s --!>' "$(date)"
mkfooter.sh
This is the simplest of the three scripts used to generate this site - it generates a copyright message with the current year, and includes the Creative Commons license I use for all of my posts. Its output only changes when the year does; it is included below for illustrative purposes.
<footer>
<div class="footertext">
<hr>
<small>Copyright 2017 Charles Daniels</small>
<p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</small></p>
</div>
</footer>
mkfooter.sh
#!/bin/sh
CURRENT_YEAR=$(date +%Y)
printf "<footer>\n"
printf '\t<div class="footertext">\n'
printf '\t\t<hr>\n'
printf '\t\t<small>Copyright %s Charles Daniels</small>\n' "$CURRENT_YEAR"
printf '\t\t<p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</small></p>\n'
printf '\t</div>\n'
printf '</footer>'
Conclusion
This methodology demonstrates that it is entirely tenable, and in fact relatively straightforward, to build a simple static site without complex dependencies.
I would thoroughly recommend this approach to anyone looking to run a small blog or website where complex CMS tools, themes, or dynamic content are not required.
There are, however, still a few sticking points that I will be working to clear up in the future, namely:
- Monospaced “literal” code blocks (<code> and <pre> for the HTML folks) do not wrap. Setting them to wrap could produce a result that cannot be safely copy-pasted without manually un-wrapping the lines, and looks weird besides. In the future, I would like to find a way to enable line wrapping and generate a visual indicator showing that the line has been wrapped, perhaps by inserting a U+21AA (“RIGHTWARDS ARROW WITH HOOK”) symbol (↪) at the beginning of any wrapped lines. This may involve pre-processing the Markdown files before feeding them to Pandoc.
- An RSS feed is not generated yet, which I believe Jekyll was handling for me previously. This can probably be fixed with an extra script to generate RSS entries from metadata files… a project for some future weekend.
- The index on the front page does not paginate. This is not an issue now, but as the quantity of posts on this site increases, this may become a problem. When this becomes a problem, I will fix it.
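To give a sense of how small the RSS script mentioned above might be, here is a rough sketch in the same style as meta2idx.sh. Everything in it is an assumption: meta2rss and xml_escape are made-up names, SITE_URL is a placeholder base URL, and the output covers only the title and link of a single item rather than a complete feed.

```shell
#!/bin/sh
# Sketch only: meta2rss is a hypothetical function, and SITE_URL is an assumed
# base URL; neither exists in the site's current tooling.
SITE_URL="https://example.com"

# Escape the characters that are special in XML text.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Usage: meta2rss FILE.meta -> prints one RSS <item> element.
meta2rss() {
    post_title="$(grep '^title' < "$1" | cut -f 2 | xml_escape)"
    post_file="$(basename "$1" .meta).html"
    printf '<item>\n'
    printf '\t<title>%s</title>\n' "$post_title"
    printf '\t<link>%s/%s</link>\n' "$SITE_URL" "$post_file"
    # pubDate is omitted: RSS expects an RFC 822 date, and converting the
    # date field portably is left out of this sketch.
    printf '</item>\n'
}

meta_file="$(mktemp -d)/2017-11-22_example.meta"
printf 'title\tMake & friends\ndate\t2017-11-22\n' > "$meta_file"
item="$(meta2rss "$meta_file")"
printf '%s\n' "$item"
rm -r "$(dirname "$meta_file")"
```

A real feed would also need the enclosing rss and channel elements, which could live in .template files like the existing header and footer.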