How much work is it to migrate 20 years of content from WordPress to Hugo? (TL;DR: quite a lot of work.)
No, I didn’t spend quality time with my 8th grade science teacher1.
After I published my previous post about static site generators, I began reading through the documentation for all the alternatives I’d outlined. Hugo seemed like the one that ticked the most boxes on my long list of requirements, so I decided to take a deeper dive into the world of Hugo.
The first thing to love about Hugo is that the static site generator is distributed as a single binary file. There is no need to keep any libraries, programming languages or other dependencies up to date. I love that, because I’m continuously fighting furious battles in the dependency war at work, and I don’t want to do the same in my spare time as well.
It’s also important to me that it’s as easy as humanly possible to convert my current WordPress site to whatever static site generator I decide to use. In Hugo’s case, there are several ways of doing this, and after some wrestling with my PHP settings, I managed to export 2.5 gigabytes’ worth of content from WordPress. From there, getting the site up and running on Hugo with all 20 years’ worth of posts took no more than the better part of an hour.
So now all I have to do is hit the publish button, right? Unfortunately, it’s not that easy.
Content Clean Up
The first big pile of work I have to go through in the process of migrating to Hugo isn’t caused by Hugo, but by WordPress. WordPress stores content as HTML, and I want to have my content in Markdown, a lightweight markup language, instead. The main reason for this is that every serious static site generator supports Markdown, which makes it less painful if I want to migrate from one to another.
The plugin I used to convert my WordPress content to Hugo was able to convert the majority of the HTML to Markdown. But some HTML elements, like figure, have no logical equivalent in Markdown. A lot of my WordPress plugins also saved post metadata in WordPress custom fields, and this ended up polluting the front matter.
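Cleaning that metadata out of the front matter is mostly a mechanical filtering job. Below is a minimal sketch of the idea. It assumes the exported files use YAML front matter delimited by `---` lines. The key prefixes in `UNWANTED_PREFIXES` are hypothetical examples, because the real list depends on which plugins were installed. Each top-level key that starts with one of those prefixes is dropped along with any indented or list lines that belong to it.

```python
import re

# Hypothetical prefixes of keys left behind by WordPress custom fields.
# The real list depends on which plugins the site used.
UNWANTED_PREFIXES = ("_edit_", "_wp_", "_yoast_")

# A top-level YAML key starts at column 0 and is followed by a colon.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z0-9_\-]+):")


def clean_front_matter(text: str) -> str:
    """Drop unwanted top-level keys (and the lines belonging to them) from YAML front matter."""
    lines = text.split("\n")
    if not lines or lines[0] != "---":
        return text  # No YAML front matter, so leave the file alone.
    try:
        end = lines.index("---", 1)
    except ValueError:
        return text  # Unterminated front matter: don't guess.

    kept = []
    skipping = False
    for line in lines[1:end]:
        match = TOP_LEVEL_KEY.match(line)
        if match:
            # Each new top-level key decides whether its entry is kept.
            skipping = match.group(1).startswith(UNWANTED_PREFIXES)
        # Continuation lines (indented values, "- item" lists) follow their key.
        if not skipping:
            kept.append(line)
    return "\n".join(["---", *kept, "---", *lines[end + 1:]])
```

This is a line-based filter, not a real YAML parser. It keeps the rest of the file byte-for-byte identical, but it would need more care for unusual YAML such as multi-document files.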
Most of the stuff I have to clean up in the Markdown files can be solved with regular expressions. But properly organizing all the images will require a bit more tinkering. WordPress keeps images in a separate, date-based folder structure, and Hugo prefers to have everything related to a Markdown file in the same folder. This means parsing through the image tags, finding the original image where WordPress stored it, moving it to the correct folder, and updating the image tag.
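The image shuffle described above can be sketched in a few lines. This version makes several assumptions:

- Each post already lives in its own folder as `index.md`, which is how Hugo page bundles work.
- Images are referenced with plain Markdown image syntax that points at `/wp-content/uploads/YYYY/MM/...`.
- The uploads from the export have been copied to a local folder, `uploads_root`, that keeps WordPress's date-based layout.

`bundle_images` is a hypothetical helper, not part of Hugo or of the export plugin.

```python
import re
import shutil
from pathlib import Path

# Matches Markdown images pointing at WordPress's date-based upload folders,
# e.g. ![alt](https://example.com/wp-content/uploads/2010/05/photo.jpg)
WP_IMAGE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"
    r"\((?P<url>[^)\s]*/wp-content/uploads/(?P<path>\d{4}/\d{2}/[^)\s]+))\)"
)


def bundle_images(markdown_file: Path, uploads_root: Path) -> list[str]:
    """Copy referenced WordPress uploads next to markdown_file and rewrite the links.

    Returns the upload paths that could not be found locally.
    """
    bundle_dir = markdown_file.parent
    text = markdown_file.read_text(encoding="utf-8")
    missing: list[str] = []

    def relocate(match: re.Match) -> str:
        source = uploads_root / match.group("path")
        if not source.is_file():
            missing.append(match.group("path"))
            return match.group(0)  # Leave the original link untouched.
        shutil.copy2(source, bundle_dir / source.name)
        # The image now sits in the same folder as the post, so a bare filename works.
        return f"![{match.group('alt')}]({source.name})"

    markdown_file.write_text(WP_IMAGE.sub(relocate, text), encoding="utf-8")
    return missing
```

The sketch copies files instead of moving them, so a failed run can be repeated. It also ignores image links that carry a title, and it does not handle two uploads from different months that share a filename, because the second copy would overwrite the first.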
Although this will be a lot of work, it’s something that I (irrationally) look forward to doing. I imagine it’s a bit like vacuuming a really dusty floor, which always feels surprisingly satisfying.
But Is Markdown The Right Choice?
The challenge with Markdown is that it’s very, very basic compared to HTML. Markdown had its initial release in 2004, while HTML5 came hot off the W3C press in 2014. There have been some changes to the Markdown specification since 2004 through the CommonMark initiative, but there have been no groundbreaking changes.
There is, for instance, no support for footnotes in the Markdown specification2. The CommonMark community has been discussing whether or not footnotes should be added since 2014! Fortunately, I’m not the only one who wants to use footnotes, and a de-facto standard way to write them in Markdown has emerged. Many Markdown processors, including Goldmark, which Hugo uses out of the box, support footnotes.
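The de facto syntax puts a reference like `[^1]` in the text and a definition like `[^1]: The note.` at the start of a line. This is the style Goldmark's footnote extension understands. During a migration it is easy to end up with a reference whose definition got lost, or the other way around. The sketch below uses a hypothetical helper, `unmatched_footnotes`, that finds both cases with two regular expressions. It is a rough check, not a full Markdown parser: it will also pick up footnote-like text inside code blocks.

```python
import re

# A definition: "[^label]:" at the start of a line.
FOOTNOTE_DEF = re.compile(r"^\[\^([^\]]+)\]:", re.MULTILINE)
# A reference: "[^label]" that is not immediately followed by a colon.
FOOTNOTE_REF = re.compile(r"\[\^([^\]]+)\](?!:)")


def unmatched_footnotes(markdown: str) -> tuple[set[str], set[str]]:
    """Return (references without a definition, definitions that are never referenced)."""
    definitions = set(FOOTNOTE_DEF.findall(markdown))
    references = set(FOOTNOTE_REF.findall(markdown))
    return references - definitions, definitions - references
```

For example, a post containing `note[^1]`, `other[^2]`, a `[^1]:` definition and an orphaned `[^3]:` definition would report `{"2"}` as undefined and `{"3"}` as unused.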
It’s also possible to use Hugo’s shortcodes to get around Markdown’s limitations. But that means polluting my content with markup specific to the static site generator I’m using, which is something I want to try to avoid.
At the end of the day, it might not matter that Markdown is very basic. It could force me to write less complicated stuff and really focus on the writing itself, rather than on everything surrounding it. And if there is something I really, really want to do that I can’t do in Markdown, I can write pure HTML in the Markdown files. Goldmark will pass it through without any processing, as long as I turn on its unsafe rendering option in the Hugo configuration. By default, Hugo tells Goldmark to omit raw HTML.
Hugo is amazingly powerful out of the box. But I’m beginning to think it might be too capable. There is a chance I’ll start to go overboard with Hugo’s features, like shortcodes, and end up entangling the content with proprietary Hugo stuff exactly the same way as I’ve done with WordPress.
No matter how I approach this, migrating my site from WordPress will be a mess. Despite that, it’s becoming more and more obvious to me that I want to replace WordPress with a static site generator. It’s also clear that it will be a lot of work, which will eat into the time I have for writing on this website.
So if you don’t hear much from me in the coming months, you know what’s up. Think of it as the site closing up shop for a while to do some long overdue renovations.