
Hetzner-based Proxmox Cluster on a Budget

In this massive guide I describe the process of installing a multi-server Proxmox cluster on a budget. This example uses two dedicated root servers hosted by Hetzner in Europe. I shopped for them at their server auction, saving those pesky setup fees.

Regardless of which server you choose, this will also help you save a lot compared to most cloud computing offerings. Even if you shop for cheap VPS offerings, it's hard to beat the cost/benefit ratio of simple hardware root servers.

Put a Proxmox cluster on top of those and you might just have all the cloud computing infrastructure your small / medium business or lab needs.

If you shop for used Hetzner servers, check their disk status right away (smartctl -a /dev/nvme0n1 and smartctl -a /dev/nvme1n1). Some of these disks have been hammered pretty hard and you may want to consider cancelling and finding another offer.

Of course the example will mostly be the same for any other hosting provider or even your home data center. Just skip / translate the Hetzner specific bits.

Goals

The goal is to set up a resilient Proxmox cluster with two nodes. The nodes are connected through WireGuard, allowing us to establish an open network between all hosts.

All public IPv4 traffic will be routed through the server's main IPv4 address. IPv4 subnets are a bit of a luxury these days and having two servers with two different addresses pretty much provides for my needs (e.g. mail server, public DNS, etc.).

So the VMs in this example will only have internal IP addresses and use NAT and, if needed, a reverse proxy.

Baseline

  • Server 1 node1.clusterbuster.net:
    • Main IP: 142.45.178.134
    • IPv6 Subnet: 2607:f8b0:4005:809::/64
    • Internal VM Subnet: 10.51.0.0/16
    • VPN IP: 10.50.0.1/16
    • RAM: 128 GB
    • HDD: 2 x 1 TB SSD
  • Server 2 node2.clusterbuster.net:
    • Main IP: 65.123.211.90
    • IPv6 Subnet: fe80:abcd:1234:5678::/64
    • Internal VM Subnet: 10.52.0.0/16
    • VPN IP: 10.50.0.2/16
    • RAM: 128 GB
    • HDD: 2 x 1 TB SSD

It is assumed that all outside settings (DNS records, etc.) are set up at this point. It is highly recommended to use a real FQDN for each host in order to allow for easy integration with Let's Encrypt using certbot.

If you're unsure if your chosen hostnames are working you can check them using DNS Checker or use the DNS Lookup of MX Toolbox.
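
If you prefer the terminal, a quick check with dig gives you the same answer; the hostnames below are the example FQDNs used throughout this guide:

# Check the A records of both nodes
dig +short A node1.clusterbuster.net
dig +short A node2.clusterbuster.net

# Optionally check the reverse records as well
dig +short -x 142.45.178.134
dig +short -x 65.123.211.90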

Hetzner OS Installation

Start your Hetzner servers in rescue mode for a script-based installation of the OS. That way we have the required control over how the OS is installed instead of having to make do with the default image configuration.

Once in rescue mode run installimage, select Debian (Official) and then choose the latest image. At the time of writing that was Debian-1205-bookworm-amd64-base.

What follows is an instance of mcedit allowing you to make a few required changes:

  • Set the correct HOSTNAME (use the FQDN)
  • Comment out the existing PART statements (should be 3 lines) and replace them with our custom partitioning:
PART swap swap 4G
PART /boot ext4 512M

PART lvm sys 100G
PART lvm pve all

LV sys root / ext4 all

This partitioning scheme reserves 4 GB of swap for the host itself, 512 MB for the host's /boot and assigns the rest to two separate LVM volume groups.

The sys LVM group is used for the system itself, pve will be assigned to Proxmox. The sys group is assigned in its entirety to the root filesystem (/).

If you're able to host bulky files like ISO files, disk images, etc. on external storage, the root filesystem can be much smaller, saving valuable SSD disk space.

Once these changes are made save them (F2) and exit the editor (F10). Confirm the warnings and the install script should now happily do its job and inform you once the installation is complete.

Reboot the system and log in again.

You may only be able to log in after deleting the host's entries in your ~/.ssh/known_hosts file as the key of the rescue system is no longer valid.

Once logged in set a new root password with passwd. You'll need this to log in to Proxmox.

Prepare servers

Welcome to your freshly set up server. Ahhh, that fresh server smell.

Before installing Proxmox, let's make sure we have a solid foundation.

Optionally, I like to install a few quality-of-life tools. You should do the same right away for your preferred tools as it makes life just so much easier. For example: apt update; apt -y install tmux neovim

SSH Keys

First we'll establish a password-less SSH connection between the servers. You may skip this process but it'll come in handy more often than not.

You may choose to generate a single SSH key for all servers but I'm going to create a set of keys for each server.

Generate a set of SSH keys on each server:

ssh-keygen -N '' -t ed25519 -C "root@`hostname -f`" -f ~/.ssh/id_`hostname -f`

Command explained:

  • -N '' sets an empty passphrase. Remove this if you prefer protecting the key with a passphrase.
  • -t ed25519 forces the use of the Ed25519 algorithm over RSA, which Debian still seems to use as a default.
  • -C just adds a comment. In our case that's root@FQDN
  • -f defines the file name. Instead of an anonymous name I prefer something like id_FQDN

As this example uses a non-standard key name we're going to set the SSH config to use the key as its default:

echo "Host *
    IdentityFile ~/.ssh/id_`hostname -f`" > ~/.ssh/config

Copy the content of each server's id_FQDN.pub file to ~/.ssh/authorized_keys on the other server and try to log in to each server from the other one.
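
If you don't want to copy and paste by hand, something along these lines works as well (a minimal sketch; it assumes password authentication is still enabled, which is the case until the hardening step below):

# Run on node1, then repeat in the opposite direction on node2
ssh-copy-id -i ~/.ssh/id_$(hostname -f).pub root@node2.clusterbuster.net

# Test the password-less login
ssh root@node2.clusterbuster.net hostname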

SSH improvements (optional)

SSH works fine out of the box. However, these root servers are under constant attack, so it's a good idea to tune your config for additional security.

echo "Port 63299
StrictModes yes
MaxAuthTries 2
PubkeyAuthentication yes
X11Forwarding no
TCPKeepAlive yes
LoginGraceTime 10s
PasswordAuthentication no
PermitEmptyPasswords no" > /etc/ssh/sshd_config.d/10_rg_custom.conf

systemctl restart sshd

Use with caution as this changes the SSH port to 63299 (or whatever you set). Update your ~/.ssh/config files to contain a Port entry for each host to keep things simple:

echo "Host node1
  Hostname node1.clusterbuster.net

Host node2
  Hostname node2.clusterbuster.net

Host node1.clusterbuster.net 142.45.178.134 node1
  Port 63299
  User root

Host node2.clusterbuster.net 65.123.211.90 node2
  Port 63299
  User root" >> ~/.ssh/config

Ideally you close SSH on the public IP completely allowing only users within the VPN (see below) to access it.

Network setup

Our guest systems will connect to a bridge device. This device is NATed to provide IPv4 internet access for all guests.

I've chosen two different subnets for the servers: 10.51.0.0/16 and 10.52.0.0/16 respectively. You can choose different networks but these are simple. Make sure to change the value of VM_BRIDGE_IP to reflect a correct value within your chosen subnet on each server.

# Install the bridge utils. They are not part of the default installation
apt -y install bridge-utils

# Set IP Forwarding
sed -i 's/#net.ipv4.ip_forward=1/net.ipv4.ip_forward=1/' /etc/sysctl.conf
sed -i 's/#net.ipv6.conf.all.forwarding=1/net.ipv6.conf.all.forwarding=1/' /etc/sysctl.conf

sysctl -p

sysctl net.ipv4.ip_forward
sysctl net.ipv6.conf.all.forwarding

####################################################################
# IP of the bridge itself. Should be within the subnet for your VMs
export VM_BRIDGE_IP=10.51.0.1/16

# Automatically get the name of the WAN device. Can be different (e.g. eno1, enp5s0). Set manually if your server has more than one!
export SRV_WAN_IF=`ls /sys/class/net/ | grep '^en'`

echo "
auto vmbr4
iface vmbr4 inet static
  address $VM_BRIDGE_IP
  bridge-ports none
  bridge-stp off
  bridge-fd 0
  post-up   iptables -t nat -A POSTROUTING -s '$VM_BRIDGE_IP' -o $SRV_WAN_IF -j MASQUERADE
  post-down iptables -t nat -D POSTROUTING -s '$VM_BRIDGE_IP' -o $SRV_WAN_IF -j MASQUERADE" >> /etc/network/interfaces

systemctl restart networking

Check your setup with ip a and you should see your bridge. Example:

...
4: vmbr4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 6e:35:a5:bf:c1:88 brd ff:ff:ff:ff:ff:ff
    inet 10.51.0.1/16 brd 10.51.255.255 scope global vmbr4
       valid_lft forever preferred_lft forever

VPN setup with WireGuard

In this step we're setting up a Wireguard VPN connection between the two servers.

This step is optional but recommended. It'll give you a lot more flexibility setting up your firewall and allows for some nice features like all VMs connecting to each other regardless of which node they might live on.

First install WireGuard and generate the keys required for it. Run this on each server:

# Install wireguard
apt -y install wireguard

# Generate the key pair (the private key is obviously a secret worth keeping ;)
umask 077; wg genkey | tee /etc/wireguard/privatekey | wg pubkey > /etc/wireguard/publickey

# Print the public key to the terminal
cat /etc/wireguard/publickey

Now configure the VPN interface. This is done on each server with different values. Set the correct IPs and subnets for each server.

# The VPN IP of your server
export WG_IFACE_IP="10.50.0.1/16"

# Hostname of the Peer (used to name the device)
export WG_PEER_HOSTNAME="node2"

# The public IP of your peer. So on Server 1 use the IP of Server 2
# and vice versa
export WG_PEER_IP="65.123.211.90"

# The subnets you allow to connect to and through this server
# This is set to allow the VPN IP of the opposite server (10.50.0.2/32) 
# and the subnet used by the VM bridge on that peer (10.52.0.0/16) 
# allowing guests to connect to each other across nodes. Change this
# depending on which subnet you configured on each server
export WG_ALLOW_IPS="10.50.0.2/32, 10.52.0.0/16"

# The public key of your peer (!). So if you are on server 1,
# put the public key of server 2 here and vice versa.
export WG_PEER_PUBKEY="KE+UQR4/08wiAhDAk4o6nJlA4m/T45OdiwWX6ymt4go="

# Wireguard Port:
export WG_PORT=51801


# The rest is just a template
echo "[Interface]
PrivateKey = `cat /etc/wireguard/privatekey`
Address = $WG_IFACE_IP             # The private VPN IP of this server
ListenPort = $WG_PORT              # The WireGuard port set above

# Peer's configuration (the other server)
[Peer]
PublicKey = $WG_PEER_PUBKEY
AllowedIPs = $WG_ALLOW_IPS
Endpoint = $WG_PEER_IP:$WG_PORT
PersistentKeepalive = 25" > /etc/wireguard/wg-$WG_PEER_HOSTNAME.conf

# Start the interface
wg-quick down wg-$WG_PEER_HOSTNAME
wg-quick up wg-$WG_PEER_HOSTNAME

# And test with a few pings
ping -i 0 -c 4 10.50.0.1
ping -i 0 -c 4 10.50.0.2

# Persist your changes
systemctl enable wg-quick@wg-$WG_PEER_HOSTNAME

As a result you should see a wg-HOSTNAME device on each server. The hostname is just cosmetic. You might as well use wg0 or whatever you like. I find this format quite handy when expanding to more peers in the VPN network.
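
To verify the tunnel beyond a simple ping, wg show lists the interface, the peer and, after the first packets, the latest handshake:

# Show interface status, peers, handshakes and transfer counters
wg show

# Or just the handshake times for this peer
wg show wg-$WG_PEER_HOSTNAME latest-handshakes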

Finally we're making sure that these servers translate their hostnames to their VPN IPs. This can be done with a proper DNS, a local dnsmasq instance or simply by modifying /etc/hosts which is good enough for our example:

echo "10.50.0.1   node1.clusterbuster.net node1
10.50.0.2   node2.clusterbuster.net node2" >> /etc/hosts

Consider using this procedure - with the appropriate changes - to connect your client system / network to the servers. That way you may lock these systems down completely (e.g. SSH, Proxmox UI, etc.), decreasing the attack surface.
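
To give you an idea, a client-side config might look roughly like this (a sketch with assumed values: the client gets 10.50.0.10 as its VPN IP, generates its own key pair and is added as an additional [Peer] on node1):

# /etc/wireguard/wg-cluster.conf on your workstation (hypothetical example)
[Interface]
PrivateKey = <the client's private key>
Address = 10.50.0.10/16

[Peer]
# node1 as the entry point into the cluster network
PublicKey = <public key of node1>
Endpoint = 142.45.178.134:51801
AllowedIPs = 10.50.0.0/16, 10.51.0.0/16, 10.52.0.0/16
PersistentKeepalive = 25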

Firewall setup

In this step we're using ufw to set up the local firewall on the server.

This step isn't required to just make things work, or if you use another type of setup to firewall your system. In any case it's recommended to have some firewall setup protecting your system. Ideally, use the VPN to block off all administrative access.

apt -y install ufw

# Allow your SSH connections
# >>> IMPORTANT: If you use another SSH port set it here <<<
ufw allow ssh

# Alternative SSH port (see SSH improvements above)
ufw allow 63299

# Allow HTTP/HTTPS
ufw allow 80/tcp
ufw allow 443/tcp

# Temporary allow Proxmox Management UI
# I strongly recommend to remove this and use a reverse
# proxy instead ... or better yet make this available through VPN only.
ufw allow 8006/tcp
ufw allow 5900:5999/tcp # NoVNC ports

# Allow VPN and inter-Guest traffic
# To increase security you might want to be a lot more
# specific. But for our experimental purposes this will do
ufw allow from 10.50.0.0/16
ufw allow from 10.51.0.0/16
ufw allow from 10.52.0.0/16

# Set IP forwarding for VM bridge
export VM_BRIDGE_IP=`ip -o -f inet addr show dev vmbr4 | awk '{print $4}'`
export SYS_NETWORK_IF=`ls /sys/class/net/ | grep '^en'`

# Use 10.52.0.0/16 on server 2
ufw route allow in on vmbr4 from 10.51.0.0/16 to any
ufw allow out on $SYS_NETWORK_IF to any

# Allow traffic through VPN

ufw allow in on `ls /sys/class/net/ | grep '^wg-'`
ufw allow out on `ls /sys/class/net/ | grep '^wg-'`

# Allow routing through VPN (required so that VMs can talk
# to each other across the cluster)
ufw route allow in on `ls /sys/class/net/ | grep '^wg-'`
ufw route allow out on `ls /sys/class/net/ | grep '^wg-'`

echo "# NAT settings for vmbr4
*nat
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s 10.$(get_hostname_bridge_subnet).0.0/16 -o $SYS_NETWORK_IF -j MASQUERADE
COMMIT
" >> /etc/ufw/before.rules

# Fix internet access for guests
ufw route allow in on vmbr4 out on $SYS_NETWORK_IF
ufw route allow out on $SYS_NETWORK_IF

# Allow routing between the bridge and wireguard so that guests can 
# communicate with one another
ufw route allow in on vmbr4 out on `ls /sys/class/net/ | grep '^wg-'`
ufw route allow in on `ls /sys/class/net/ | grep '^wg-'` out on vmbr4

# Setting some defaults
ufw default deny incoming
ufw default allow outgoing

# Enable with force (avoid interactive input)
ufw --force enable

In this example the Proxmox administration ports are open to the public. This is far from ideal and should only serve for testing purposes.

Later on in this tutorial we're going to set up an nginx reverse proxy to hide this interface. The reverse proxy is optional and may be set up in a different way all together (e.g. on a separate host). Hence I simply opened it for the purpose of demonstrating the setup. Please don't leave it like that. Bad idea.

Setting up the firewall is always one of the more involved parts of any networking setup. The rules above might not work for you. Consider them a general guideline rather than hard fact. In fact, I spent quite a bit of time refining this example to the needs of my final production environment, which is close but slightly different.
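
Once the firewall is enabled, double-check what you actually ended up with:

# List the active rules; numbered output makes later deletions easier
ufw status numbered

# Make sure the NAT block really landed in before.rules
grep -A4 '\*nat' /etc/ufw/before.rules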

Install Proxmox

Time for all these preparations to pay off. Installing Proxmox (see the Hetzner guide) is straightforward:

curl -o /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg http://download.proxmox.com/debian/proxmox-release-bookworm.gpg
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list

apt update
apt -y full-upgrade
apt -y install ntp
apt -y install proxmox-ve

When asked to configure Postfix choose Internet Site. This sets it up with a sane default. Setting up Postfix is, however, not part of this tutorial. As for the hostname just accept the default.

Proxmox is more than just a few packages added on top of your favorite Linux distro. It deeply embeds itself, bringing its own kernel. It's also meant to be used on top of Debian or, better yet, installed through their own Debian-based distribution.

Once the installation is complete reboot your servers. This is necessary to load the Proxmox kernel. Without that your Proxmox installation won't be too happy.

Welcome to Proxmox

After you reboot your servers check their kernels: uname -a. The result should contain pve in its version (e.g. 6.8.12-2-pve). This tells you that the Proxmox kernel is active.
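
For a quick sanity check of both the kernel and the installed packages:

# Kernel release should end in -pve
uname -r

# Prints the installed Proxmox VE version and the running kernel
pveversion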

You can reach the Proxmox UI on each of them. Pick one and go to port 8006 (https://<FQDN>:8006) to see it.

It's very likely that you'll encounter a certificate warning at this point. Don't worry about it at this stage. In our reverse proxy chapter this will be fixed using a Let's Encrypt certificate.

Log in as root with your system's root password. Yes ... I know, seems wrong but that's how Proxmox does its thing for now. However you may choose to improve your setup. Check out the Proxmox Wiki on User Management for that.

Reverse Proxy

In this example we're going to install an nginx reverse proxy on each server. This will allow us to close port 8006 to the public and use nginx in various ways to restrict access to the UI.

This is optional and might not be the best way of doing it in a production setup. I'd recommend the reverse proxy to be on another host or a VM/container entirely. Also while this allows you to increase security this setup will still expose your Proxmox access on a public network. Again limit it to VPN networks if you can.

Run this on at least one server you'd like to use as the cluster's main UI, or on all of them:

# Install nginx and Let's Encrypt certbot utilities
apt -y install nginx certbot python3-certbot-nginx

# Get rid of the default host
rm /etc/nginx/sites-enabled/default

# Set the proxmox config:
echo 'server {
    listen 80;
    server_name '`hostname -f`';

    location / {
        proxy_pass https://localhost:8006;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (for noVNC)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}' > /etc/nginx/sites-available/proxmox.conf

# Enable the site
cd /etc/nginx/sites-enabled/
ln -s ../sites-available/proxmox.conf

# Test your nginx setup (don't proceed if you got an error)
nginx -t

# Restart and persist
systemctl restart nginx
systemctl enable nginx

WARNING: You may be tempted to use this setup to access Proxmox at this point. Please don't. This connection is not encrypted. Your credentials won't be protected.

Go ahead with the certificate setup instead:

# Request new certificates for your host
# It is vital for the hostname to be valid (DNS) and available 
# publicly on port 80
certbot --nginx -d `hostname -f`

# At this point you might be queried to enter a few things. Please do.
# ...

# Next check your amended nginx config. Certbot makes changes to
# include certs, etc:
cat /etc/nginx/sites-available/proxmox.conf

The result should look something like this:

server {
    server_name node1.clusterbuster.net;

    location / {
        proxy_pass https://localhost:8006;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (for noVNC)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/node1.clusterbuster.net/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/node1.clusterbuster.net/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

}
server {
    if ($host = node1.clusterbuster.net) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


    listen 80;
    server_name node1.clusterbuster.net;
    return 404; # managed by Certbot


}

Let's make sure those changes are good and take them live:

# Test nginx config
nginx -t

# Go live :)
systemctl restart nginx

Now you can access your Proxmox UI through nginx: https://<FQDN>/
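
A quick check from your workstation confirms that the proxy answers and serves the Let's Encrypt certificate (I use node1's FQDN here; use whichever host you configured):

# Expect an HTTP 200 (or 401 once basic auth is enabled further below)
curl -sS -o /dev/null -w '%{http_code}\n' https://node1.clusterbuster.net/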

Use Let's Encrypt cert with Proxmox directly

Our reverse proxy uses a nice clean certificate but Proxmox itself still uses a self-signed one on port 8006. This can be a problem later on when setting up the cluster.

As we already have those certificates just add them:

# Set the new cert on each host to be used by proxmox directly
pvenode cert set /etc/letsencrypt/live/`hostname -f`/fullchain.pem /etc/letsencrypt/live/`hostname -f`/privkey.pem

# Restart the proxmox UI
systemctl restart pveproxy

When you check Port 8006 now, the certificate error should be gone.

As these certificates will be renewed over time, we also want to make sure Proxmox is informed about those updates by setting a deploy hook:

echo '#!/bin/bash
pvenode cert set --force 1 /etc/letsencrypt/live/`hostname -f`/fullchain.pem /etc/letsencrypt/live/`hostname -f`/privkey.pem
systemctl restart pveproxy' > /etc/letsencrypt/renewal-hooks/deploy/pve_cert_update.sh

chmod +x /etc/letsencrypt/renewal-hooks/deploy/pve_cert_update.sh
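
To test the renewal path, a dry run verifies the certbot side; since deploy hooks are usually skipped during dry runs, it doesn't hurt to execute the hook once by hand as well:

# Simulate a renewal without touching the real certificates
certbot renew --dry-run

# Run the deploy hook once manually to confirm it works
bash /etc/letsencrypt/renewal-hooks/deploy/pve_cert_update.sh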

Disable public access to 8006

We've left port 8006 open for a while now. It's a good idea to close it:

ufw delete allow 8006/tcp

Additional protection through nginx

Leaving your Proxmox instance exposed is not a great idea. If you have to access it through a public network you can use nginx to add additional protection. Here's a very simple way of doing it with HTTP Basic Auth:

# Install some CLI utils (not apache, don't worry)
apt -y install apache2-utils

# Create a new htpasswd file
htpasswd -c /etc/nginx/.htpasswd choose_a_username

# Optional: add more users -> drop the `-c`, it would overwrite the existing file
htpasswd /etc/nginx/.htpasswd choose_another_username

Edit /etc/nginx/sites-available/proxmox.conf:

server {
    server_name node1.clusterbuster.net;

    location / {
        auth_basic "Don't touch my Proxmox!";
        auth_basic_user_file /etc/nginx/.htpasswd;

        proxy_pass https://localhost:8006;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (for noVNC)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/node1.clusterbuster.net/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/node1.clusterbuster.net/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

}
server {
    if ($host = node1.clusterbuster.net) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


    listen 80;
    server_name node1.clusterbuster.net;
    return 404; # managed by Certbot
}

The key change here is inside location / {: the two auth_basic lines added at the top.

Once done check and apply:

nginx -t
systemctl restart nginx

Now your Proxmox setup is hidden away from prying eyes through the awesome power of HTTP Basic Auth ;)

Of course you can extend this in many ways but this covers the basics.
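
One obvious extension, given the VPN we set up earlier, is to restrict the location to internal addresses only (a sketch; combine it with or use it instead of the basic auth as you see fit):

    location / {
        # Only allow clients coming in through the VPN or the VM subnets
        allow 10.50.0.0/16;
        allow 10.51.0.0/16;
        allow 10.52.0.0/16;
        deny all;

        auth_basic "Don't touch my Proxmox!";
        auth_basic_user_file /etc/nginx/.htpasswd;

        # ... the rest of the proxy_pass block stays unchanged
    }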

LVM-thin pool as Proxmox storage

Proxmox supports various storage backends. I encourage you to explore them and choose the one that most fits your needs.

This example will use LVM-thin as it's a massive improvement over plain file-system storage, yet very easy to set up and, most of all, self-contained. We're on a budget here after all :)

Set up the pool:

# Create the volume
lvcreate -l +99%FREE -n data pve

# Convert to LVM-thin
# This will issue a large warning. No worries, it's OK.
lvconvert --type thin-pool pve/data

Once done, the easiest way to add the storage is through the UI: click on Datacenter > Storage > Add and select LVM-Thin. In the dialog select the data volume in the pve volume group and give it a catchy name.

That's it. You can now use this storage volume with your guest systems.
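
If you'd rather stay on the command line, the same can be done with pvesm (the storage name local-thin is just an example I picked):

# Register the thin pool as a Proxmox storage backend
pvesm add lvmthin local-thin --vgname pve --thinpool data --content rootdir,images

# Verify that the new storage shows up and is active
pvesm status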

DHCP server

We're going to use a local DHCP server on our bridge vmbr4 to assign IPs to our guests. This is not strictly required but it does make life easier.

apt -y install isc-dhcp-server

# Setting the interface (Debian's isc-dhcp-server uses INTERFACESv4 for the IPv4 daemon)
echo 'INTERFACESv4="vmbr4"' >> /etc/default/isc-dhcp-server

export VM_BRIDGE_NET=10.51.0.0
export VM_BRIDGE_NETMASK=255.255.0.0
export DHCP_START=10.51.0.100
export DHCP_STOP=10.51.10.254
export DHCP_GATEWAY=10.51.0.1
export DHCP_DNS_SERVERS="185.12.64.2, 185.12.64.1"

echo "subnet $VM_BRIDGE_NET netmask $VM_BRIDGE_NETMASK {
    range $DHCP_START $DHCP_STOP;
    option routers $DHCP_GATEWAY;
    option domain-name-servers $DHCP_DNS_SERVERS;
}" >> /etc/dhcp/dhcpd.conf

systemctl restart isc-dhcp-server.service

Set all values depending on the server and your requirements. I use 10.51.0.0/16 for server 1 and 10.52.0.0/16 for server 2.
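
For completeness, on server 2 the exports from above would look like this in my example:

export VM_BRIDGE_NET=10.52.0.0
export VM_BRIDGE_NETMASK=255.255.0.0
export DHCP_START=10.52.0.100
export DHCP_STOP=10.52.10.254
export DHCP_GATEWAY=10.52.0.1
export DHCP_DNS_SERVERS="185.12.64.2, 185.12.64.1"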

Setting up Cluster mode

So far our two servers have managed things on their own. Time to combine them into a single cluster. Thanks to all the hard work put into the preparations, this will be a breeze.

At this point I'm assuming you've set up a VPN connection between the servers, your firewall is permissive on your VPN network and you've also set up proper certificates for your hosts. The latter part may be important as I ran into hostname verification errors using nothing but the self-signed certs.

# Set up your cluster on one of the servers. I prefer Server 1
# but it shouldn't matter.
# Your cluster name is limited to 15 chars
pvecm create clusterbuster

# Check the status
pvecm status

To add the second node (Server 2 in my case) I run this on the node I'd like to add:

# Add the node by adding it to your first machine
# Note: For this to work the API ports, etc. must be exposed.
# In our VPN setup /etc/hosts was modified to ensure local
# resolution to go through the VPN tunnel instead of the public IPs
pvecm add node1.clusterbuster.net

Adding a node asks for the root password again. This is the password of server 1 in this case. This is needed to authenticate against the API.

And that's all there is to it. Your second node should now show up in the dashboard.
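
You can also verify the cluster state from either node on the command line:

# The cluster should be quorate and list both members
pvecm status

# Compact view of all nodes and their votes
pvecm nodes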

DNS matters

In my experiments I came across issues with this setup not using the internal VPN connection but the public IPs. To fix this:

  • Set up your firewall ahead of time so that unwanted connections fail instead of silently going over the public IPs
  • Make sure your DNS entries for internal use are solid. I found that /etc/hosts might not be enough. You can do so by prioritizing your internal DNS infrastructure for the cluster or by using dnsmasq on your system (a sketch follows below).
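
A minimal dnsmasq-based sketch (my assumption: each node runs its own dnsmasq, which serves the /etc/hosts entries we added in the WireGuard chapter and forwards everything else to Hetzner's resolvers):

# Install dnsmasq; it reads /etc/hosts by default, so the VPN
# entries from earlier are served automatically
apt -y install dnsmasq

# Forward everything else to public resolvers instead of /etc/resolv.conf
echo "no-resolv
server=185.12.64.2
server=185.12.64.1" > /etc/dnsmasq.d/10-cluster.conf

systemctl restart dnsmasq

# Point the host itself at its local resolver
echo "nameserver 127.0.0.1" > /etc/resolv.conf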

Further steps

This gives you a good baseline cluster setup you could improve upon. Here are some recommendations for things not included in this already huge article:

  • Set up a strong DNS with a dynamic DNS zone that the DHCP servers on each node are able to update. This is especially useful when migrating guest systems around and you want them to keep their names regardless.
  • Set up external storage like NFS. This might require another server and of course depends on your requirements. Using network storage however makes migration easier.
  • Integrate your IPv6 subnet Hetzner gives you for free :)
  • Connect external clients to the cluster via WireGuard. That way you can also make the reverse proxy respond to internal connections only.
  • Add nodes with other hosters in other countries.

These are just some of the ideas. With a solid cluster and networking setup you can easily and relatively cheaply run a self-hosted cloud infrastructure across borders and hosters. There's no vendor lock-in and you control your entire stack.

Dedicated root servers cost between 30 and 200 EUR a month, easily beating most VPS offerings. Especially when you aim to run a lot of VMs or LXC containers, this option will really help and save you a lot of money. Plus you learn a thing or two in the process. Have fun! :)