I decided to move from GitHub Pages to my own server with my own domain. I also switched from a commenting system based on GitHub issues to the remark42 comment engine.

Moving to a new server

So the main website and the origin is now on https://decovar.dev/, and https://retifrav.github.io/ will continue to exist as a mirror on GitHub Pages.

What’s wrong with GitHub Pages

There is (almost) nothing wrong with GitHub Pages really. It is an awesome service for hosting static websites (such as blogs) free of charge.

But as my blog is getting bigger, I started to think that I might be “abusing” GitHub’s hospitality by hosting relatively heavy resources (such as videos) on the platform.

Also, looking at the overall SJW-hysteria, I am now a bit concerned that some content in my blog can “offend” somehow yet another snowflake, and I wouldn’t like one day to find my blog repository being blocked all of a sudden. I mean, this nonsense alone about changing default branch from master to main was a good enough sign that one should not rely on GitHub solely.

In addition to that, having one’s own server provides one with many capabilities, such as having a backend and/or an API, full control over the web-server, visitors analytics and so on.

New server and domain

Since I’m now on my own without GitHub infrastructure, I needed a domain and a web-server.

For the domain I chose decovar.dev, and .dev mandates the use of HTTPS. I already covered that in the previous post.

The web-server (NGINX-config) part is not very complicated, as not much is needed for serving a static website:

server {
    listen 443 ssl;
    listen [::]:443 ssl;

    server_name decovar.dev www.decovar.dev;

    ssl_certificate /path/to/certs/$server_name/fullchain.pem;
    ssl_certificate_key /path/to/certs/$server_name/key.pem;

    charset utf-8;

    root /var/www/decovar.dev;

    index index.html;

    error_page 401 /401.html;
    error_page 403 /403.html;
    error_page 404 /404.html;

    location /admin {
        try_files $uri $uri/ =404;
        auth_basic "Speak, friend, and enter";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }

    location / {
        try_files $uri $uri/ =404;
    }
}

server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name decovar.dev www.decovar.dev;
    return 301 https://$server_name$request_uri;
}

This way I could set my own error pages for the 401, 403 and 404 codes. The reason for not having those templated with Hugo is simply the fact that Hugo only supports 404.

I also protected the /admin route with Basic authentication - to add password protection for GoAccess reports. They are not that secret, actually, so I might make them public later.

Another nice thing about having full control over the web-server is that you can set whatever HTTP headers and MIME types you want. In my case I wanted to set text/plain for my public PGP key (*.asc extension), and with NGINX this is done in /etc/nginx/mime.types:

...
text/plain txt asc;
...

To see the difference, try opening this and that link. By the way, Firefox complained about unknown encoding for a plain-text file, which is why I also needed to set the charset utf-8; directive in the website config.

GoAccess

Having full access to NGINX logs, it is now a grand time to ditch Google Analytics and to use GoAccess instead. I’ve already described this great visitors analytics tool, and here are some additions to that article.

My current host is provided by Oracle Cloud, and what I immediately noticed in NGINX access logs is lots of probing requests to all sorts of routes - that is from the very first day on a completely empty server with no content at all:

Probing requests in GoAccess report

So basically everything here is probes for environment files and common CMS routes (with different HTTP methods too). And the list of such requests goes on and on for several pages.

None (except for / and /index.html) of these routes belong to my website, and like I said, there was actually no website on the server. So my guess is that since Oracle Cloud (as well as AWS and Azure) is often rented by businesses, its IP ranges are more likely to get probed like this than hosts from other providers. For instance, I also rent servers from a couple of other (smaller) providers, and none of them get this amount of probing requests.

Naturally, such spamming completely messes up the analytics. The total number of unique visitors over the measured period (last two months) was 2918 - a value that normally I would be happy to see for my humble blog, but in this case this is obviously not a real number of actual visitors.

So what can be done here? There is a pretty simple solution - to blacklist these trash requests. For example, if we compose a text file requests-blacklisted.txt with something like this:

/vendor/phpunit/
/_ignition/
XDEBUG_SESSION_START
/solr/admin/
/index.php
/dnscfg.cgi

then here’s how this list can be used in the pipe of generating GoAccess report:

$ zcat -f /var/log/nginx/access.log* \
    | grep -v -f /path/to/requests-blacklisted.txt \
    | sudo -u www-data goaccess - -o /var/www/decovar.dev/admin/analytics.html

However, this way you risk accidentally blacklisting some routes which you don’t have now but might have in the future, so an even better solution would be to first whitelist all the known routes of your website and then optionally blacklist some sub-routes.

In my case, here’s what I whitelisted in requests-whitelisted.txt:

GET \/projects(\/|$)
GET \/stuff(\/|$)
GET \/about(\/|$)
GET \/blog(\/|$)
GET \/etc(\/|$)
GET \/top(\/|$)

I don’t see much point in counting requests with methods other than GET, especially for a static website, so I explicitly whitelisted only requests with GET method.

Just in case, if you don’t understand why my rules are like this, take a look at any request in NGINX access log, such as this one:

36.233.31.141 - - [30/May/2021:05:39:47 +0000] "GET /blog/2017/09/29/virtual-box-on-mac-os/images/virtualbox-host-only-network-adapter.png HTTP/1.1" 200 210740 "https://decovar.dev/blog/2017/09/29/virtual-box-on-mac-os/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36"

Admittedly, it’s not the best RegEx in the world for such purpose, but it seems to work.

I could’ve also added GET \/ HTTP and maybe also GET \/css\/ and GET \/js\/ rules, but they add pretty much useless noise in reports, so I haven’t.

Anyway, having added this list to the pipe, I almost completely eliminated trash requests from the report. Almost - because there were still some that did not belong to my website, so I modified the file with blacklisted requests like so:

\.(jar|jsp|php|asp|aspx|cgi|pl|cfm|htm|nsf)
GET \/blog\/skin\/?
GET \/blog\/bad397\/?
GET \/blog\/pivot\/?
GET \/blog\/guestbook_entries\/?
GET \/blog\/administrator\/?
GET \/blog\/core\/?
GET \/blog\/wp-links\/?
GET \/blog\/modules\/?
GET \/etc\/shadow\/?

And the updated pipe for generating GoAccess report now looks like this:

$ zcat -f /var/log/nginx/access.log* \
    | egrep -f /path/to/requests-whitelisted.txt \
    | egrep -v -f /path/to/requests-blacklisted.txt \
    | sudo -u www-data goaccess - -o /var/www/decovar.dev/admin/analytics.html

So first I apply the whitelist and then the blacklist. Also note the egrep here (it is the same as grep -E, which newer versions of GNU grep recommend instead) - this is to enable proper (extended) mode for regular expressions.
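To check how the two lists interact before running them over the real logs, here is a minimal sketch on a handful of made-up log lines. The IP address, timestamps and user agents are invented for illustration, the patterns are a subset of the ones above (written without the backslash before /, which grep does not need), and they are passed with -e instead of -f just to keep the example in one piece:

```shell
# Made-up access log lines: two real page requests and three that should be dropped.
sample_log() {
    cat <<'EOF'
192.0.2.10 - - [30/May/2021:05:39:47 +0000] "GET /blog/2017/09/29/virtual-box-on-mac-os/ HTTP/1.1" 200 210740 "-" "Mozilla/5.0"
192.0.2.10 - - [30/May/2021:05:40:01 +0000] "GET /blog/wp-links/index.php HTTP/1.1" 404 153 "-" "curl/7.68.0"
192.0.2.10 - - [30/May/2021:05:40:02 +0000] "POST /blog/ HTTP/1.1" 405 157 "-" "curl/7.68.0"
192.0.2.10 - - [30/May/2021:05:40:03 +0000] "GET /.env HTTP/1.1" 404 153 "-" "curl/7.68.0"
192.0.2.10 - - [30/May/2021:05:40:04 +0000] "GET /about/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
EOF
}

# Whitelist first (keep only GET requests to known sections),
# then blacklist (drop script extensions and known junk sub-routes).
filter_requests() {
    grep -E -e 'GET /blog(/|$)' -e 'GET /about(/|$)' \
        | grep -E -v -e '\.(jar|jsp|php|asp|aspx|cgi|pl|cfm|htm|nsf)' -e 'GET /blog/wp-links/?'
}

# Print only the request paths (7th whitespace-separated field) that survive both filters.
sample_log | filter_requests | awk '{ print $7 }'
# prints:
# /blog/2017/09/29/virtual-box-on-mac-os/
# /about/
```

Note that a request for /about without the trailing slash would not pass this whitelist, because in the log the path is followed by a space rather than by / or the end of the line.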

As a result, I now have a clean visitors analytics report:

Whitelisted requests in GoAccess report

And the number of unique visitors became 248, which is not as exciting, but is an actual meaningful value.

remark42 comments

Having moved out from GitHub, I can no longer use comments based on GitHub issues. At the very least because of CORS, but also because the whole point of moving to my own server was to become independent from 3rd-party infrastructure.

There are several commenting systems available for “in-house” hosting, but what really caught my eye was remark42. Right away I liked that:

  • it does not require a PostgreSQL/MySQL database
  • it is quite lightweight
  • it has good Markdown support
  • it supports anonymous users, OAuth and e-mail authentication
  • there is a voting system
  • there are RSS, e-mail and Telegram notifications
  • and there is even an API

So here’s how I added it to my website.

System user, config and systemd service

As usual, first you create a system user for it:

$ sudo adduser \
    --system \
    --group \
    --disabled-password \
    remark42

Then you get the binary and install it in the system:

$ cd /home/remark42/
$ wget https://github.com/umputun/remark42/releases/download/v1.8.1/remark42.linux-amd64.tar.gz
$ tar -xvf remark42.linux-amd64.tar.gz
$ sudo mv remark42.linux-amd64 /usr/local/bin/

The official documentation recommends running it in Docker, but obviously we won’t do so, and that complicates things a bit. The setup without Docker is actually not that hard, but I had to refer to this article a couple of times, as some things were not entirely clear.

To run remark42 you need to pass several mandatory parameters to it. If you’d like to provide non-default paths for things like the backup folder and avatars, that can also be done via parameters.

If you pass those via CLI, that might look like the following:

$ /usr/local/bin/remark42.linux-amd64 server --secret="SOME-SECRET" --url="https://comments.decovar.dev" \
    --site=decovar.dev --auth.anon --edit-time=10m \
    --store.bolt.path="/home/remark42/data" --backup="/home/remark42/backup" \
    --avatar.fs.path="/home/remark42/avatars" --avatar.bolt.file="/home/remark42/avatars.db" \
    --avatar.uri="/home/remark42/avatars" --image.fs.path="/home/remark42/img/pictures" \
    --image.fs.staging="/home/remark42/img/pictures.staging" \
    --image.bolt.file="/home/remark42/img/pictures.db" \
    --web-root="/home/remark42/web" --port=8765

Not very convenient, is it? Fortunately, you can put all this into a config/environment file:

$ sudo -u remark42 nano /home/remark42/remark42.conf
REMARK_URL=https://comments.decovar.dev
SECRET=SOME-SECRET
REMARK_PORT=8765
SITE=decovar.dev
AUTH_ANON=true
EDIT_TIME=10m
AUTH_GITHUB_CID=YOUR-AUTH-GITHUB-CID
AUTH_GITHUB_CSEC=YOUR-AUTH-GITHUB-CSEC
ADMIN_SHARED_ID=YOUR-ACCOUNT-ID
#ADMIN_PASSWD=SOME-PASSWORD

At that point I also decided not to customize paths and to go with the default ones, especially since they are resolved relative to the working folder anyway.

Unfortunately, there is no option to bind remark42 only to localhost (if you’d like to run it behind a reverse proxy), so its port is exposed to the internet by default, and you might want to add a blocking rule for it in your firewall. I asked about this in the remark42 repository, and what do you know, the developers responded almost immediately and added this option to the sources right away, so the next release should have it.

Having created the config, you can now add a systemd service like this:

$ sudo nano /etc/systemd/system/remark42.service
[Unit]
Description=remark42
After=network.target

[Service]
Type=simple
EnvironmentFile=/home/remark42/remark42.conf
WorkingDirectory=/home/remark42
ExecStart=/usr/local/bin/remark42.linux-amd64 server
Restart=always
RestartSec=10
SyslogIdentifier=remark42
User=remark42
Group=remark42

[Install]
WantedBy=multi-user.target

Don’t forget to enable and start it (sudo systemctl enable --now remark42.service).

NGINX config and certificate

Of course, we’ll be running remark42 behind NGINX as a reverse-proxy:

server {
    listen 443 ssl;
    listen [::]:443 ssl;

    server_name comments.decovar.dev;

    ssl_certificate /path/to/certs/decovar.dev/fullchain.pem;
    ssl_certificate_key /path/to/certs/decovar.dev/key.pem;

    location / {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection keep-alive;
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://127.0.0.1:8765/;
    }
}

server {
    listen 80;
    listen [::]:80;
    server_name comments.decovar.dev;
    return 301 https://$server_name$request_uri;
}

So the service runs on a subdomain (which means you’ll need to declare it at your registrar), and the certificate is the same one as for the main domain. Just in case, here’s how you can issue and install a Let’s Encrypt certificate for more than one domain:

$ acme.sh --issue -d decovar.dev -d www.decovar.dev -d comments.decovar.dev -w /var/www/decovar.dev
$ acme.sh --install-cert -d decovar.dev -d www.decovar.dev -d comments.decovar.dev \
    --key-file ~/certs/decovar.dev/key.pem  \
    --fullchain-file ~/certs/decovar.dev/fullchain.pem \
    --reloadcmd "sudo systemctl restart nginx.service"

To test if your remark42 setup is correct, the documentation suggests opening the /web/ route on your comments subdomain, such as https://comments.decovar.dev/web/. For me this page is failing (yet another artifact of running without Docker), although this particular problem should be fixed soon. At the same time, comments on the main website are working fine.

Adding remark42 to Hugo

To run the remark42 scripts, I created a /layouts/partials/comments.html partial:

<script>
    var remark_config = {
        host: "{{ .Site.Params.remark42url }}",
        site_id: "{{ .Site.Params.domain }}",
        components: ["embed"],
        url: "{{ .Permalink }}",
        max_shown_comments: 25,
        theme: "light",
        page_title: "{{ .Title }}",
        locale: "en",
        show_email_subscription: false,
        simple_view: false
    };
</script>
<script>!function(e,n){for(var o=0;o<e.length;o++){var r=n.createElement("script"),c=".js",d=n.head||n.body;"noModule"in r?(r.type="module",c=".mjs"):r.async=!0,r.defer=!0,r.src=remark_config.host+"/web/"+e[o]+c,d.appendChild(r)}}(remark_config.components||["embed"],document);</script>

It takes some parameters from config.toml:

[params]
    domain = "decovar.dev"
    remark42url = "https://comments.decovar.dev"

And also it conveniently makes use of Hugo variables such as .Permalink and .Title.

Then I included this partial in the /layouts/_default/single.html template and also added a <div> for loading the actual comments:

{{ define "main" }}
    <article class="post">
        <div>
            <h1 class="post-title">{{ .Title }}</h1>
        </div>
        <div class="post-content">
            {{ .Content }}
        </div>
    </article>
    {{ with .Params.comments }}
        <hr class="comments-divider"/>
        <div id="comments">
            <div id="remark42"></div>
        </div>
    {{ end }}
{{ end }}

{{ define "AddToBottom" }}
    {{ with .Params.comments }}
        {{ partial "comments.html" $ }}
    {{ end }}
{{ end }}

The AddToBottom block is defined in /layouts/_default/baseof.html:

        ...
        {{- block "AddToBottom" .}}{{- end }}
    </body>
</html>

This way the comments scripts will be added only on pages that have the comments parameter enabled in the front matter, like so:

---
title: "About me"
date: 2014-07-30 10:35:42 +0300
comments: true
---

But for pages in the blog section I’d like to have comments enabled by default, without adding this parameter to the front matter, so I added the scripts partial unconditionally in the /layouts/blog/single.html:

    ...
    <hr class="comments-divider">
    <div id="comments">
        <div id="remark42"></div>
    </div>
{{ end }}

{{ define "AddToBottom" }}
    {{ partial "comments.html" $ }}
{{ end }}

Styling

At the moment remark42 commenting form and comments tree are added to the page via iframe, so setting a custom CSS would not be a very trivial task. One possible workaround for that is overwriting or rather redirecting requests to CSS files.

But actually I am totally fine with the “stock” styles for both comment form and comments tree. What I did want to customize though was the latest comments widget, and that one is not an iframe but a regular div, so here it cometh in comments.scss:

div.remark42__last-comments {
    margin: 30px 0 20px;

    article.comment {
        padding: 0;
        margin: 0 0 10px;

        div.comment__body {
            padding: 15px;
            border: 1px dashed $color-dimmed;
            font-family: $font-primary;

            .comment__title {
                margin-bottom: 5px;
                color: $color-primary;
                font-size: 1.2em;
                font-weight: bold;
            }
            .comment__title > a {
                color: inherit;
            }

            .comment__info::after {
                content: ": ";
            }

            .comment__text {
                font-style: italic;
            }
        }
    }
    article.comment:last-of-type {
        margin-bottom: 0;
    }
}

And here are the before/after screenshots:

Styling remark42 last comments widget

Import from Disqus

Another nice thing about remark42 is that it supports importing comments from Disqus. In the very beginning I was using Disqus, before I switched to GitHub issue-based comments, and so it was nice to be able to move over at least the comments from Disqus.

To do so, go to Export page (but first try to find it yourself in the Disqus admin panel) and request an export. Some time later you will receive an e-mail with the link to your exported comments. Having the link, go to your host, download the archive, unpack and import to remark42:

$ cd /home/remark42/downloads
$ sudo -u remark42 wget https://media.disqus.com/uploads/exports/1/1/waspenterprises-2021-05-22T20:31:45.123456-all.xml.gz
$ sudo -u remark42 gunzip ./waspenterprises-2021-05-22T20:31:45.123456-all.xml.gz
$ sudo -u remark42 remark42.linux-amd64 import -p disqus \
    --secret SOME-SECRET --url https://comments.decovar.dev \
    --admin-passwd SOME-PASSWORD -s decovar.dev \
    -f ./waspenterprises-2021-05-22T20:31:45.123456-all.xml

For that to work you need to have ADMIN_PASSWD set in your remark42 config file. I didn’t see that in the import/remapping examples which I found on the internet, but as I recall, both commands complained about missing mandatory parameters such as --admin-passwd.

Now you need to remap the imported comments from the old page URLs to the new URLs on your domain. Create a file with mapping rules, like so:

$ sudo -u remark42 nano rules.txt
https://retifrav.github.io* https://decovar.dev*

And execute the remapping:

$ sudo -u remark42 remark42.linux-amd64 remap \
    --secret SOME-SECRET --url https://comments.decovar.dev \
    --admin-passwd SOME-PASSWORD -f ./rules.txt -s decovar.dev

Actually, it wasn’t obvious what rules exactly are expected here. I tried several combinations of rules and parameters for the remapping command, and eventually something worked. What exactly worked, I don’t know: to see whether it worked or not you need to restart the service, and I figured that out only after some time, so maybe the very first attempt was already successful and all further actions were redundant.
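One way to read such a rule is as a plain prefix replacement: the part before * on the left is swapped for the part before * on the right, and the rest of the URL is kept. The sketch below only illustrates that reading. remap_url is a hypothetical helper, not something remark42 provides, and the actual matching inside remark42 may differ:

```shell
# Hypothetical helper: apply one "old-prefix* new-prefix*" rule to a URL.
# $1 - old prefix (without the *), $2 - new prefix (without the *), $3 - URL to remap
remap_url() {
    case $3 in
        "$1"*)
            rest=${3#"$1"}            # the part of the URL after the old prefix
            printf '%s\n' "$2$rest"   # prefix matches: swap it, keep the rest
            ;;
        *)
            printf '%s\n' "$3"        # no match: leave the URL untouched
            ;;
    esac
}

remap_url "https://retifrav.github.io" "https://decovar.dev" \
    "https://retifrav.github.io/blog/2017/09/29/virtual-box-on-mac-os/"
# prints: https://decovar.dev/blog/2017/09/29/virtual-box-on-mac-os/
```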

After the remapping all the old Disqus comments magically appeared under the posts on my new website. Shortly after I went to Disqus admin panel and deleted all the comments from the old website there.

Authentication

You can enable several authentication methods for users, and you can also allow anonymous comments. I did the latter and also enabled GitHub authentication. I have no use for Facebook and Twitter, so I won’t enable those. The others I will think about; perhaps I will at least enable Microsoft too.

E-mail authentication is also supported, and most likely I will enable that one too.

Speaking about anonymous comments, at the moment there is no pre-moderation functionality, although it might be added soon enough.

API

There is a REST API, and to use it for administration purposes you need to be logged in via any of the authentication methods and to have your user ID added to the ADMIN_SHARED_ID parameter in the config.

Then you’ll need to get a JWT. You can take it with Web Developer Tools from any request on a page with comments:

remark42 JWT in Web Developer Tools

Now you can perform administrator actions. For example, if you’d like to delete a comment with COMMENT-ID from https://decovar.dev/some/page.html:

$ curl \
    -X "DELETE" "https://comments.decovar.dev/api/v1/admin/comment/COMMENT-ID?site=decovar.dev&url=https:%2F%2Fdecovar.dev%2Fsome%2Fpage.html" \
    -H 'X-JWT: HERE-GOES-JWT'
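The url parameter has to be percent-encoded, which is easy to get wrong by hand. Below is a small sketch of a helper that does the encoding and assembles the request URL. Both function names are made up for this example, the encoder only handles ASCII input, and the endpoint layout is simply copied from the command above:

```shell
# Hypothetical helper: percent-encode an ASCII string, keeping only
# unreserved characters (letters, digits, ".", "_", "~", "-") as they are.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything but the first character
        c=${s%"$rest"}       # the first character
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: build the admin URL for deleting a comment.
# $1 - comment ID, $2 - site ID, $3 - URL of the page with the comment
delete_comment_url() {
    printf 'https://comments.decovar.dev/api/v1/admin/comment/%s?site=%s&url=%s\n' \
        "$1" "$(urlencode "$2")" "$(urlencode "$3")"
}

delete_comment_url "COMMENT-ID" "decovar.dev" "https://decovar.dev/some/page.html"
# prints: https://comments.decovar.dev/api/v1/admin/comment/COMMENT-ID?site=decovar.dev&url=https%3A%2F%2Fdecovar.dev%2Fsome%2Fpage.html
```

The result can then be passed to curl -X "DELETE" together with the X-JWT header. The colon is encoded here as %3A, unlike in the hand-written URL above, but both forms decode to the same value.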

Telegram notifications

You can enable Telegram notifications in a minute or so. Get a token from BotFather (or reuse one of your existing bots) and add the following to the remark42 config:

NOTIFY_TYPE=telegram
NOTIFY_TELEGRAM_CHAN=YOUR-ID
NOTIFY_TELEGRAM_TOKEN=YOUR-BOT-TOKEN

Despite the documentation saying that NOTIFY_TELEGRAM_CHAN is for a channel ID, this value can actually be set to any of the following:

  • your own ID (numerical value, or perhaps @-name also works). In that case the bot will be sending notifications directly to you. It might be that you’ll need to send /start to the bot first, as bots cannot message users by default
  • channel ID. That can be either a private (numerical value) or a public (@-name) channel. Either way, the channel needs to be created first and the bot needs to be added to that channel as an admin
  • group chat ID (negative numerical value). The bot needs to be added to that group

Here’s an example of a notification sent directly to me:

remark42 Telegram notification

Canonical URL

Something I was curious about for quite some time: how search engines such as Google handle the same content published on several websites, and how one can specify which is the main website and which are just mirrors.

Apparently, this is what the canonical URL is (partly) for. So you just need to add it to <head> on your mirrors, pointing to the main website.

However, Google can still decide on its own, which page is the origin and which is a mirror, despite your explicit intention. Classic Google.

Either way, here’s what I did for the mirror on https://retifrav.github.io/. First I added the main website base URL as a parameter in config.toml:

[params]
    mainSite = "https://decovar.dev"

And then added canonical URL in /layouts/_default/baseof.html:

<link rel="canonical" href="{{ .Site.Params.mainSite }}{{ .RelPermalink }}" />
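To check that the tag actually ends up in the generated pages, something like the following can be used. canonical_of is a made-up helper that reads HTML on standard input and prints the href of the canonical link. It assumes the attributes appear in the same order as in the template above and that the quotes are not stripped by a minifier:

```shell
# Hypothetical helper: print the canonical URL found in HTML read from stdin.
canonical_of() {
    grep -o '<link rel="canonical" href="[^"]*"' \
        | sed 's/.*href="\([^"]*\)"/\1/'
}

# A tiny stand-in for a page rendered by Hugo.
printf '%s\n' '<head>' \
    '<link rel="canonical" href="https://decovar.dev/about/" />' \
    '</head>' | canonical_of
# prints: https://decovar.dev/about/
```

On the live mirror the same helper can be fed with the output of curl -s for any page URL.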

Plus I added this orange banner right under the page header block:

<div style="background-color:orange; padding:4px 16px; font-size:0.8em; text-align:center; font-style:italic;">
    On 19.05.2021 the website was moved to
    <a href="{{ .Site.Params.mainSite }}{{ .RelPermalink }}" style="color:black; font-weight:bold;">
        {{ .Site.Params.mainSite }}/</a>. This is just a mirror now.
</div>

Instead of adding canonical URL it might have been better to add a 301 redirect, but first of all I don’t have access to GitHub Pages web-server, and secondly it won’t hurt to have a mirror anyway.

It will be interesting to see a few months later how Google Analytics (which I kept enabled on the mirror) compares to the GoAccess reports on the main server.

But most importantly, I’m very curious about which of the websites will be showing up in search results. Hopefully, canonical URL will play its role, and all the new search queries will lead to the main website rather than to the mirror.