About Talos, the Kubernetes distro

If you have ever installed a Kubernetes cluster from scratch, you already know the burden that comes with the process. You need to deploy the VMs, configure them, secure the cluster as much as possible and tune the OS so it behaves less like a general purpose machine and more like a reliable home for Kubernetes.

Since most of us use IaC tools nowadays (and we do, right?), I often wondered whether there was a Linux distribution specifically optimized for running Kubernetes workloads, something that could at least save some time and maybe prevent me from accidentally summoning the debugging gods at 2 AM.

The answer is yes: Talos, the self-described Kubernetes Operating System.

Talos is described as a Linux distribution built for running Kubernetes. It offers several interesting advantages. It is secure by design because it includes only the bare minimum needed to run Kubernetes, it is immutable and it embraces an infrastructure as code approach that makes cluster state predictable and versionable.

And here is a great bonus: Talos is a Certified Kubernetes distribution.

What really makes it different from traditional Linux distributions is that the OS is managed exclusively through an API. This means there is no SSH access and no package manager to tinker with. Every cluster upgrade, OS upgrade, node addition or removal and maintenance activity is performed using the Talos API.

Does it sound overcomplicated at first sight? Maybe 😅 but Talos provides a very convenient CLI named talosctl, which helps a lot with day-to-day node management.
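
To give a feel for that workflow, here are the kinds of operations that would normally happen over SSH but become talosctl calls instead (illustrative examples; the node IPs, image reference and versions are placeholders):

BASH
# upgrade the OS on one node to a newer Talos release
talosctl upgrade --nodes 10.0.0.2 --image ghcr.io/siderolabs/installer:v1.11.5

# upgrade Kubernetes across the whole cluster
talosctl upgrade-k8s --to 1.33.6

# wipe a node before removing it from the cluster
talosctl reset --nodes 10.0.0.3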

Let’s try Talos

The Docker way

The easiest and fastest way to get started with Talos is to run it in Docker. The process is very similar to what I explained in this blog post.

Prerequisites

Before we start playing with Talos, make sure you have:

- docker installed and running
- kubectl
- talosctl

To install the latest release of kubectl, run:

BASH
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
mkdir -p ~/.local/bin
mv ./kubectl ~/.local/bin/kubectl
# and then append (or prepend) ~/.local/bin to $PATH
kubectl version --client # this tests if kubectl binary is working

To install the latest release of talosctl, run:

BASH
curl -sL https://talos.dev/install | sh

At the end, you should have docker, kubectl and talosctl available and working. Now we can begin.
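
A quick sanity check before moving on (the exact versions will differ on your machine):

BASH
docker version --format '{{.Server.Version}}'
kubectl version --client
talosctl version --client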

To create a cluster, run:

BASH
talosctl cluster create

This command deploys a two-node cluster with one control plane node and one worker. If you want to change the number of nodes, use the --workers and --controlplanes options.
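
For example, to get a three control plane, two worker cluster instead (the numbers here are just an illustration):

BASH
talosctl cluster create --controlplanes 3 --workers 2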

Be aware that this operation may take a few moments.

This is the expected output:

BASH
test@test:~$ talosctl cluster create
validating CIDR and reserving IPs
generating PKI and tokens
creating state directory in "/home/test/.talos/clusters/talos-default"
creating network talos-default
creating controlplane nodes
creating worker nodes
renamed talosconfig context "talos-default" -> "talos-default-1"
waiting for API
bootstrapping cluster
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: OK
waiting for apid to be ready: OK
waiting for all nodes memory sizes: OK
waiting for all nodes disk sizes: OK
waiting for no diagnostics: OK
waiting for kubelet to be healthy: OK
waiting for all nodes to finish boot sequence: OK
waiting for all k8s nodes to report: OK
waiting for all control plane static pods to be running: OK
waiting for all control plane components to be ready: OK
waiting for all k8s nodes to report ready: OK
waiting for kube-proxy to report ready: OK
waiting for coredns to report ready: OK
waiting for all k8s nodes to report schedulable: OK

merging kubeconfig into "/home/test/.kube/config"
PROVISIONER           docker
NAME                  talos-default
NETWORK NAME          talos-default
NETWORK CIDR          10.5.0.0/24
NETWORK GATEWAY       10.5.0.1
NETWORK MTU           1500
KUBERNETES ENDPOINT   https://127.0.0.1:43271

NODES:

NAME                            TYPE           IP         CPU    RAM      DISK
/talos-default-controlplane-1   controlplane   10.5.0.2   2.00   2.1 GB   -
/talos-default-worker-1         worker         10.5.0.3   2.00   2.1 GB   -

From the output we can highlight a few useful details:

- the provisioner is docker and the cluster name is talos-default
- the cluster lives on the 10.5.0.0/24 network, with the control plane node at 10.5.0.2 and the worker at 10.5.0.3
- the Kubernetes API endpoint is exposed locally at https://127.0.0.1:43271
- the kubeconfig has been merged into ~/.kube/config, so kubectl works right away

Now we can explore Talos, starting with its dashboard. The dashboard is read only and displays logs, OS status, Kubernetes version and the status of essential components for each node.

To access the dashboard of the control plane node:

BASH
talosctl dashboard --nodes 10.5.0.2

You can repeat the same for any node.

Dashboard showing the control plane node information

Let us also verify that kubectl is working correctly:

BASH
test@test:~$ kubectl get nodes -o wide
NAME                           STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE          KERNEL-VERSION      CONTAINER-RUNTIME
talos-default-controlplane-1   Ready    control-plane   30m   v1.33.6   10.5.0.2      <none>        Talos (v1.10.8)   6.14.0-35-generic   containerd://2.0.7
talos-default-worker-1         Ready    <none>          30m   v1.33.6   10.5.0.3      <none>        Talos (v1.10.8)   6.14.0-35-generic   containerd://2.0.7

Everything looks good. Let us deploy a simple nginx workload to confirm that pods can start:

BASH
test@test:~$ kubectl create deployment frontend --image=nginxinc/nginx-unprivileged:trixie-perl --replicas=2
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false [...]
deployment.apps/frontend created

test@test:~$ kubectl get po -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP           NODE                     NOMINATED NODE   READINESS GATES
frontend-74bf6897cc-r9kx4   1/1     Running   0          14m   10.244.1.6   talos-default-worker-1   <none>           <none>
frontend-74bf6897cc-xvcpg   1/1     Running   0          14m   10.244.1.7   talos-default-worker-1   <none>           <none>

Great, everything works.

But let us be honest. Running Talos in Docker is fun, but it is not really where Talos shines. It does not feel too different from tools like kind. Talos is designed for VMs and bare metal.

So let us destroy the cluster:

BASH
talosctl cluster destroy

Time to unleash the true Talos potential!

The VM way

Now we can properly test Talos in a realistic scenario by deploying a VM. If you prefer, you can install the ISO on bare metal as well.

I downloaded the ISO onto my Proxmox server using this link:

https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/v1.11.5/metal-amd64.iso

You can check the latest supported versions here: https://docs.siderolabs.com/talos/v1.11/platform-specific-installations/virtualized-platforms/proxmox

Proxmox iso tab showing Talos iso file
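
If you prefer the command line over the web UI, something like this should also work from the Proxmox host shell (a sketch; /var/lib/vz/template/iso is the default local ISO storage path and may differ in your setup):

BASH
cd /var/lib/vz/template/iso
wget -O talos-v1.11.5-metal-amd64.iso \
  "https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/v1.11.5/metal-amd64.iso"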

I created a VM with 4 GB RAM, 1 socket, 4 cores and 15 GB disk space.

Proxmox VM hardware tab

When you start the VM, watch the console. The IP address will be printed automatically if DHCP is available.

Console output at first startup

At this point Talos is running from the live ISO. Before using it, we need to install it on disk, which requires:

- generating the Talos configuration files
- applying the control plane configuration to the node, which triggers the installation
- bootstrapping etcd
- generating a kubeconfig so we can use kubectl

Let us start with configuration.

Generate Talos configuration files

Using the IP address of the VM (in my case 192.168.30.165), run:

BASH
export CONTROL_PLANE_IP="192.168.30.165"
talosctl gen config talos-proxmox-cluster https://$CONTROL_PLANE_IP:6443 --output-dir talos_pxe_cluster

This generates three files: controlplane.yaml, worker.yaml and talosconfig. Since we have only one VM, we need only controlplane.yaml and talosconfig.
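
A quick look at the output directory should confirm that:

BASH
ls talos_pxe_cluster
# controlplane.yaml  talosconfig  worker.yaml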

Install Talos

Before installing, verify what disk the VM uses so that Talos installs to the correct device:

BASH
talosctl get disks --insecure --nodes $CONTROL_PLANE_IP

In my setup the disk is /dev/sda, which matches the default configuration in controlplane.yaml.
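
If your disk has a different name (for example /dev/vda), update the machine.install.disk field in controlplane.yaml before applying it. A simple grep is enough to see what is currently set (assuming the output directory from the previous step):

BASH
grep -A 3 'install:' talos_pxe_cluster/controlplane.yaml
# look for the "disk:" line, which should read /dev/sda by default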

We can proceed with installation:

BASH
cp talos_pxe_cluster/talosconfig ~/.talos/config
talosctl config endpoint $CONTROL_PLANE_IP
talosctl config node $CONTROL_PLANE_IP
talosctl apply-config --insecure --nodes $CONTROL_PLANE_IP --file talos_pxe_cluster/controlplane.yaml

There is no output, so watch the VM console. The VM will reboot automatically. Once it comes back online, check the dashboard:

BASH
talosctl dashboard

At some point you will see etcd listed as down. That is expected. When you see that, bootstrap the cluster:

BASH
talosctl bootstrap

Monitor the dashboard until the STAGE field shows Running and the READY field shows True, as shown in the image below:

Talos Dashboard showing kubernetes is ready

Now talosctl is configured to target this node by default.

Next we need:

- a kubeconfig file, so we can use kubectl against the cluster
- to allow workloads on the control plane node, since it is the only node we have

Generate kubeconfig file

This part is easy:

BASH
talosctl kubeconfig ~/.kube/config_proxmox_cluster

If you manage multiple clusters, you can make kubectl load all kubeconfig files automatically:

BASH
export KUBECONFIG=$(find $HOME/.kube/config* -type f | tr '\n' ':')
echo $KUBECONFIG
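
With several files on the KUBECONFIG path you can list and switch contexts as usual (plain kubectl, nothing Talos specific):

BASH
kubectl config get-contexts
kubectl config use-context <context-name>   # pick the context from the list above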

Now let us check our cluster:

BASH
test@test:~$ kubectl get nodes
NAME            STATUS   ROLES           AGE   VERSION
talos-nz5-coc   Ready    control-plane   16h   v1.33.6

test@test:~$ kubectl get po -A
NAMESPACE     NAME                                    READY   STATUS    RESTARTS      AGE
kube-system   coredns-78d87fb69b-mvtzd                1/1     Running   0             16h
kube-system   coredns-78d87fb69b-p2thg                1/1     Running   0             16h
kube-system   kube-apiserver-talos-nz5-coc            1/1     Running   0             16h
kube-system   kube-controller-manager-talos-nz5-coc   1/1     Running   2 (16h ago)   16h
kube-system   kube-flannel-v2275                      1/1     Running   0             16h
kube-system   kube-proxy-ljkr8                        1/1     Running   0             16h
kube-system   kube-scheduler-talos-nz5-coc            1/1     Running   2 (16h ago)   16h

Allow workloads on the control plane node

You can do this with kubectl:

BASH
kubectl taint nodes <node-name> node-role.kubernetes.io/control-plane:NoSchedule-

Or you can do it the Talos way. Talos gives two options:

- editing the machine config interactively with talosctl edit machineconfig
- applying a patch with talosctl patch machineconfig

For simplicity I used the interactive editor:

BASH
talosctl edit machineconfig -n $CONTROL_PLANE_IP

Find and enable:

YAML
  # # Allows running workload on control-plane nodes.
  allowSchedulingOnControlPlanes: true

Save and close the editor. Talos will apply the changes immediately:

BASH
$ talosctl edit machineconfig -n $CONTROL_PLANE_IP
Applied configuration without a reboot
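
Alternatively, the same change can be applied without opening an editor, using a patch. A sketch with a JSON patch (assuming talosctl is already pointed at the node):

BASH
talosctl patch machineconfig -n $CONTROL_PLANE_IP \
  --patch '[{"op": "add", "path": "/cluster/allowSchedulingOnControlPlanes", "value": true}]'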

Now deploy a test workload:

BASH
kubectl create deployment frontend --image=nginxinc/nginx-unprivileged:trixie-perl --replicas=2

And check:

BASH
kubectl get po -o wide

Perfect. At this stage we have a fully functional Talos VM running Kubernetes.

But I know you want more. So let us go deeper.

A more realistic scenario

In a real production environment, control plane and worker nodes are separate and control plane nodes do not run regular workloads. In this case I decided to keep things small and deploy just 3 VMs, all of them control plane nodes.

Here is the plan:

- deploy 3 Talos VMs on Proxmox, all control plane nodes, each with a static IP plus a shared VIP for the Kubernetes API
- use Cilium as the CNI, replacing kube-proxy
- use Longhorn for persistent storage, backed by a dedicated second disk on each VM

If you need guidance on VM setup, refer to the previous section. This time each VM should have two disks of 10 GB each: one for Talos and one for Longhorn.

Once the VMs are powered on and running the Talos ISO, we can proceed.

Create patch files

Create a directory:

BASH
mkdir talos-infra
cd talos-infra

Cilium preparation patch

Name the file patch-cilium.yaml:

YAML
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true

If you want to customize pod or service CIDRs, you can extend it:

YAML
cluster:
  network:
    cni:
      name: none
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/12
  proxy:
    disabled: true

DNS patch

patch-dns.yaml:

YAML
machine:
  network:
    nameservers:
      - 9.9.9.9
      - 1.1.1.1

NTP patch

patch-ntp.yaml:

YAML
machine:
  time:
    servers:
      - 0.it.pool.ntp.org
      - 1.it.pool.ntp.org
      - 2.it.pool.ntp.org

Allow workloads on the control plane

patch-allow-workflow-on-master.yaml:

YAML
cluster:
  allowSchedulingOnControlPlanes: true

Longhorn dependencies patch

Create schematic.yaml:

YAML
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/util-linux-tools

Create userVolumeConfig.yaml:

YAML
apiVersion: v1alpha1
kind: UserVolumeConfig
name: longhorn
provisioning:
  diskSelector:
    match: disk.size > 10u * GiB
  grow: false
  minSize: 5GB

Create patch-longhorn-extramount.yaml:

YAML
machine:
  kubelet:
    extraMounts:
      - destination: /var/mnt/longhorn
        type: bind
        source: /var/mnt/longhorn
        options:
          - bind
          - rshared
          - rw

Network patches

We need three network patches, one for each node. Each node will receive a static IP, and all three will share a VIP.

I decided to reserve 4 IP addresses in my homelab, dedicated to:

- 192.168.30.165 for talos1
- 192.168.30.166 for talos2
- 192.168.30.167 for talos3
- 192.168.30.168 as the shared VIP

patch-talos1-network.yaml:

YAML
machine:
  network:
    hostname: talos1
    interfaces:
      - interface: ens18
        dhcp: false
        addresses:
          - 192.168.30.165/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.30.1
        vip:
          ip: 192.168.30.168

patch-talos2-network.yaml:

YAML
machine:
  network:
    hostname: talos2
    interfaces:
      - interface: ens18
        dhcp: false
        addresses:
          - 192.168.30.166/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.30.1
        vip:
          ip: 192.168.30.168

patch-talos3-network.yaml:

YAML
machine:
  network:
    hostname: talos3
    interfaces:
      - interface: ens18
        dhcp: false
        addresses:
          - 192.168.30.167/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.30.1
        vip:
          ip: 192.168.30.168

Install the VMs

Power on the VMs and take note of the temporary DHCP-assigned IPs. You will use these to apply the configuration.

Check the disk layout:

BASH
export NODE1_TMP_IP="192.168.30.140" # assigned by DHCP
export NODE2_TMP_IP="192.168.30.155" # assigned by DHCP
export NODE3_TMP_IP="192.168.30.212" # assigned by DHCP

talosctl get disks --insecure --nodes $NODE1_TMP_IP
talosctl get disks --insecure --nodes $NODE2_TMP_IP
talosctl get disks --insecure --nodes $NODE3_TMP_IP

Make sure the disk names match what you expect for Talos installation and Longhorn.

Generate the final config file

Export required variables:

BASH
export CLUSTER_NAME="proxmox-talos-cluster1"
export TALOS_VERSION="v1.10.8"

# these are the IPs we want to give to the VMs at the end
export NODE1_IP="192.168.30.165"
export NODE2_IP="192.168.30.166"
export NODE3_IP="192.168.30.167"
export K8S_VIP="192.168.30.168"

At this point we need to talk about a file we created in the previous step: the schematic.yaml. Schematics in Talos act as an extension mechanism that allows you to add additional system tools to the Talos installation image. Since Talos nodes are immutable and do not include a package manager, schematics provide the supported way to bundle extra utilities into the OS at build time.

After creating the schematic.yaml file, you must upload it to the Talos Image Factory service. The factory processes the schematic and returns a unique image ID. You then pass this ID to the configuration generation command via the --install-image flag, so Talos builds and installs a custom installer image that includes the extensions you defined.

Upload the schematic:

BASH
IMAGE_ID=$(curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics | jq -r ".id")

Generate the base Talos config:

BASH
talosctl gen config $CLUSTER_NAME https://$K8S_VIP:6443 \
  --install-disk /dev/sda \
  --config-patch @patch-allow-workflow-on-master.yaml \
  --config-patch @patch-cilium.yaml \
  --config-patch @patch-dns.yaml \
  --config-patch @patch-longhorn-extramount.yaml \
  --config-patch @userVolumeConfig.yaml \
  --config-patch @patch-ntp.yaml \
  --talos-version $TALOS_VERSION \
  --install-image factory.talos.dev/installer/$IMAGE_ID:$TALOS_VERSION \
  --output output_files

Create per node config files:

BASH
talosctl machineconfig patch output_files/controlplane.yaml \
  --patch @patch-talos1-network.yaml \
  --output output_files/talos1.yaml

talosctl machineconfig patch output_files/controlplane.yaml \
  --patch @patch-talos2-network.yaml \
  --output output_files/talos2.yaml

talosctl machineconfig patch output_files/controlplane.yaml \
  --patch @patch-talos3-network.yaml \
  --output output_files/talos3.yaml

Apply the configuration:

BASH
talosctl apply-config --insecure --nodes $NODE1_TMP_IP -f output_files/talos1.yaml
talosctl apply-config --insecure --nodes $NODE2_TMP_IP -f output_files/talos2.yaml
talosctl apply-config --insecure --nodes $NODE3_TMP_IP -f output_files/talos3.yaml

Watch the VM consoles until they complete installation and reboot. When the first node reaches the point where you see a message like this:

BASH
[...] service "etcd" to be "up"

it means we are ready to bootstrap the cluster:

BASH
cp output_files/talosconfig ~/.talos/config
talosctl bootstrap -n $NODE1_IP -e $NODE1_IP

Check the dashboard:

BASH
talosctl dashboard -n $NODE1_IP -e $NODE1_IP

Now generate a kubeconfig:

BASH
talosctl kubeconfig ~/.kube/config -n $NODE1_IP -e $NODE1_IP
kubectl get nodes

The nodes will appear as NotReady because no CNI is installed yet.

Install Cilium

You need helm installed. See: https://helm.sh/docs/intro/install/

Then:

BASH
helm repo add cilium https://helm.cilium.io/ 
helm repo update
helm install \
    cilium \
    cilium/cilium \
    --version 1.18.4 \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set kubeProxyReplacement=true \
    --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set cgroup.autoMount.enabled=false \
    --set cgroup.hostRoot=/sys/fs/cgroup \
    --set gatewayAPI.enabled=true \
    --set gatewayAPI.enableAlpn=true \
    --set gatewayAPI.enableAppProtocol=true \
    --set k8sServiceHost=localhost \
    --set k8sServicePort=7445

Wait until all pods become ready:

BASH
kubectl get po -A -w

Configure talosctl with VIP and node list:

BASH
talosctl config endpoint $K8S_VIP 
talosctl config node $NODE1_IP $NODE2_IP $NODE3_IP

Now you can fully administer the cluster with talosctl. Check the talosctl reference documentation to learn more.
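
A few everyday commands worth knowing (illustrative, not exhaustive):

BASH
talosctl health          # overall cluster health check
talosctl services        # status of Talos system services on each node
talosctl dmesg           # kernel logs from the nodes
talosctl get members     # cluster members as discovered by Talos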

Install Longhorn

Create the namespace:

BASH
kubectl create ns longhorn-system && kubectl label namespace longhorn-system pod-security.kubernetes.io/enforce=privileged

Install:

BASH
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.10.1/deploy/longhorn.yaml

Monitor the pod status until all of them are up and running:

BASH
kubectl get po -n longhorn-system -o wide -w

Expose the UI to create volumes and manage your storage:

BASH
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80

Access it in the browser at http://localhost:8080.

Longhorn Dashboard

At this point we have successfully:

- deployed a three node, highly available Talos cluster on Proxmox, with static IPs and a shared VIP
- installed Cilium as the CNI, with kube-proxy replacement enabled
- installed Longhorn for persistent storage, backed by a dedicated disk on each node

There are many more features to explore in Talos, but this post is already quite dense and I do not want to risk melting your brain. Talos is powerful, but it requires a shift in how you troubleshoot and operate clusters. The immutability model and lack of SSH access mean you need to embrace the API driven workflow fully. Once you do, you gain a secure, consistent and reproducible Kubernetes platform that fits perfectly into modern DevOps practices.

The smaller the attack surface, the harder it is for trouble to sneak in. That alone already brings peace of mind.

I hope you enjoyed this guide and that it helps you get started with Talos.

Happy testing and may your clusters always behave nicely, even on Mondays!
