Infrastructure

A cloud for a cool person (me)

NOTE: This section is under construction!

A 90s-looking "under construction" banner

The contents of this page are incomplete and subject to change. Check back later for a more complete version!


This project represents my unified effort to manage software configuration and deployment across all my machines.

Configuration Management Tools

I use the following tools:

  • NixOS to manage machines at the bottom layer of my infrastructure, whether it’s a cloud-provided VPS or a bare metal machine at home.
  • Ansible to deploy changes to the astrid.tech backend, though I intend to migrate away from it to a pure NixOS setup.
  • Cloudflare for DNS on my various domain names, and Terraform to manage those DNS records (see the sketch below).
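
To give a flavor of the Terraform side, here is a minimal sketch of a Cloudflare record resource. The zone ID, record name, and IP here are placeholders, not my actual configuration:

```hcl
terraform {
  required_providers {
    cloudflare = {
      source = "cloudflare/cloudflare"
    }
  }
}

# Placeholder zone ID; the real value comes from the Cloudflare dashboard.
variable "zone_id" {
  type = string
}

# A hypothetical A record; the name and IP are illustrative only.
resource "cloudflare_record" "example" {
  zone_id = var.zone_id
  name    = "example"
  type    = "A"
  value   = "203.0.113.10"
}
```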

In the past, I used Kubernetes to deploy end-user services. I intend to do so again, but I still need to research how to declaratively integrate a Kubernetes cluster into my setup.

Current Infrastructure Setup

Network Topology

I’m currently at college, but I’ve brought my homelab with me!

My apartment

My homelab at my apartment in SLO consists of a cascaded router setup. That is, we have a router for the rest of the house, and I have a router specifically for my room. Just like Texas with ERCOT! I sure hope I don’t end up like Texas with ERCOT…

The Texas power grid compared to the rest of US/Canada.

The reason I do this is so that I don’t accidentally break the rest of the LAN with my shenanigans. In other words, I expect that I’ll end up like Texas, but I’m trying to prevent the problems from reaching everyone else.

Planned segments

My homelab at my home home back in the bay is currently shut off. However, next time I get there, I’ll be setting up a Raspberry Pi jump server there so I can pretend my homelab is multi-site.

Additionally, I plan on setting up a WireGuard VPN soon, with an Oracle Cloud VPS as the primary connection endpoint.
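
As a sketch of what that might look like, here is a minimal WireGuard peer config for one of my machines. All keys, addresses, and hostnames below are placeholders:

```ini
# /etc/wireguard/wg0.conf on a roaming machine (all values are placeholders)
[Interface]
PrivateKey = <this-machines-private-key>
Address = 10.100.0.2/24

[Peer]
# The Oracle Cloud VPS acting as the always-reachable endpoint
PublicKey = <vps-public-key>
Endpoint = vps.example.com:51820
AllowedIPs = 10.100.0.0/24
PersistentKeepalive = 25
```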

Personal Computers

BANANA

This is my usually-stay-at-home laptop with the following specs:

  • Hostname: banana.id.astrid.tech
  • Model: Lenovo Legion Y530-15ICH-1060
  • OS: Arch Linux/Windows 10 Dual Boot
  • CPU: Intel i5-8300H (4 cores, 8 threads)
  • RAM: 32GiB
  • GPU: NVIDIA GeForce GTX 1060 Mobile
  • Monitors: 1920x1080, 2560x1440, 3840x2160

The dotfiles in ~astrid are managed using Nix home-manager.
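
For the unfamiliar, a stripped-down home.nix looks something like this (a sketch, not my actual config):

```nix
{ pkgs, ... }:

{
  home.username = "astrid";
  home.homeDirectory = "/home/astrid";
  home.stateVersion = "21.05";

  # Per-user packages, installed declaratively.
  home.packages = with pkgs; [ ripgrep htop ];

  # Dotfiles become Nix options instead of files copied around by hand.
  programs.git = {
    enable = true;
    userName = "astrid";
  };
}
```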

Cracktop

Cracktop is my travel laptop that I bring to and from school. It was my old laptop from high school.

  • Hostname: cracktop-pc.id.astrid.tech
  • Model: HP Pavilion 13 x360
  • OS: NixOS Unstable
  • CPU: Intel i5-6300U (2 cores, 4 threads)
  • RAM: 8GiB
  • Monitors: 1920x1080

There are a few reasons why I use it despite its cracked screen:

  • It’s a lot lighter than BANANA, which reduces the load in my backpack.
  • It serves as a testing ground for managing a whole system with NixOS.
  • Campus has a bike theft problem, so I wouldn’t be surprised if it had a device theft problem as well. If I lose this machine, I won’t be too sad, and with the cracked screen, no one would want to steal it.

Workload Servers

Bongus

This server was an absolute steal I got off of eBay for $200.

  • Hostname: bongus-hv.id.astrid.tech
  • Model: HP ProLiant DL380P Gen8
  • OS: NixOS Unstable
  • CPU: 2x Intel Xeon (2x8 phys. core, 2x16 virt. core)
  • RAM: 96GiB

Unfortunately, it eats a lot of power, so I only turn it on sporadically, when I need to heat my room.

Jump Servers

Jump servers are low-power SBCs that are always on and can be used to send Wake-on-LAN packets to other machines. However, once I set up the WireGuard VPN, I can relegate them to a Wake-on-LAN-plus-other-stuff role. Currently, I have only one jump server: jonathan-js, a Raspberry Pi 3.
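
Sending the wake-up call is simple enough that a jump server barely needs anything installed: a magic packet is just 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, broadcast over UDP. Here is a minimal Python sketch (the MAC address is made up):

```python
import socket

def send_magic_packet(mac: str, broadcast: str = "255.255.255.255") -> None:
    """Wake a machine: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    payload = bytes.fromhex("FF" * 6 + mac.replace(":", "") * 16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, (broadcast, 9))  # port 9 is the discard port

# Hypothetical MAC of a sleeping machine on the LAN.
send_magic_packet("aa:bb:cc:dd:ee:ff")
```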

Oracle Cloud

I have 2 Always Free VPSes in Oracle Cloud. I run the astrid.tech backend on one, and I’m planning on using the other as a VPN lighthouse.

History

This is the history of my attempts at system administration.

v0 - Early Forays

In late 2017, I was interested in trading Bitcoin. But not just going about it in a boring way; I wanted to algorithmically trade it, and use computers to calculate the optimal trading strategy. But I didn’t want to calculate that strategy the normal way, either; I wanted to write a genetic algorithm to do it. And it didn’t stop there: I wanted to run it on multiple computers to do parallel processing.

So, I spent $50 on a pair of Dell Optiplexes from eBay, hooked them into my LAN, installed Ubuntu Server on them, and tried to set up Apache Spark on them. What could possibly go wrong?

The issue is, at the time, I was more of a programmer, and I saw deployment as an implementation detail that could be sorted out later. So, I did it the very quick and dirty way (wget the zip, call it via SSH) without really learning about modern best practices.

Well, what ended up happening was that I never got any of it working. I didn’t know about tools like Ansible, Docker, or Nix at the time, and deploying in the prod environment over SSH was every bit as frustrating and tedious as I thought it would be. Additionally, it turns out you can’t make a competitive modern algotrader relying only on technical indicators, so that failed too. However, this experiment did set the stage for my future DevOps endeavors.

v1 - VPS Deployment

In late December 2020, I was writing the backend for astrid.tech, and I came across the problem of “how do I deploy this?” That’s when I learned Docker. My VPS also had excess capacity, so I hosted some other stuff besides my backend using Docker.

I consider this v1 of my homelab because it was actually functional for a while. Although the service declarations were in a sort of modular Docker Compose architecture, they were all updated manually by SSHing in and essentially running `git pull && docker-compose up`. Here is the last version of the configs before I incorporated it into the rest of my monorepo.
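
Each stack was a small Compose file along these lines (a sketch; the image name and port are illustrative, not my real configs):

```yaml
version: "3"
services:
  backend:
    image: ghcr.io/example/astrid-tech-backend:latest  # placeholder image
    restart: unless-stopped
    ports:
      - "8000:8000"
    env_file: .env
```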

v2 - On-Premises Cloud

Running a budgeting application like Firefly III in a public cloud alongside my website backend (a prime target for hacking!) seemed like a bit of a liability. So, I wanted to move that to a private, on-site cloud composed of multiple computers. It seemed awkward to manually allocate services to specific ones, so that led me to learn Kubernetes.

It was mostly set up manually, with a few Ansible scripts to semi-automate tasks like upgrading software, and a few Terraform configs to create databases. Here is what the infra repo looked like by the time I set up v2.

On-Site Hardware

Here is a listing of hardware I worked with:

| Name | Model | Arch | Processor | Cores | RAM (GB) | Role |
|---|---|---|---|---|---|---|
| crappertop[^4] | Acer Aspire E1-510 | amd64 | Pentium N3520 | 4 | 4 | Proxmox: k3s, nfs |
| cracktop[^3] | HP Pavilion x360 13 | amd64 | i5-6200U | 4 | 8 | Proxmox: k3s |
| thonkpad[^2] | ThinkPad T420 | amd64 | i5-2520M | 4 | 8 | Proxmox: k3s, db, nfs |
| zerg-1[^1] | Raspberry Pi 3B+ | arm | BCM2837B0 | 4 | 1 | k3s |
| zerg-2[^1] | Raspberry Pi 3B | arm | BCM2837 | 4 | 1 | k3s |
| zerg-3[^1] | Raspberry Pi 2B+ | arm | BCM2836 | 4 | 1 | k3s |
| zerg-4[^1] | Orange Pi One | arm | sun8iw7p1 | 4 | 0.5 | k3s |
| Total | --- | --- | --- | 28 | 23.5 | Wasting my Time |
Raspberry Pis involved in the cluster.
Hello from CyberCthulu

Public Cloud

I continued to use public cloud resources. I ran a Minecraft server on Contabo for a time, and I continued to run parts of the v1 stack on oci-1.

| Name | Provider | Price ($/mo) | Arch | Processor | Cores | RAM (GB) | Role |
|---|---|---|---|---|---|---|---|
| contabo | Contabo | 7.90 | amd64 | Xeon something | 4 | 8 | Docker Compose |
| oci-1 | Oracle | 0 | amd64 | Xeon something | 1 | 1 | Docker Compose |
| oci-2 | Oracle | 0 | amd64 | Xeon something | 1 | 1 | Docker Compose |
| Total | --- | 7.90 | --- | --- | 6 | 10 | Wasting my Time |

Infrastructure Services

These services were deployed on the on-site hardware.

| Name | Description | Deployed on |
|---|---|---|
| Proxmox | An open-source Type 1 hypervisor OS | Bare metal |
| K3s | A lightweight Kubernetes distribution that won’t eat up most of the resources on a Raspberry Pi or craptop | VM, bare-metal Raspberry Pis |
| MySQL/MariaDB | Database | LXC |
| Postgres | Database | LXC |
| Docker Compose | Multi-container application stack, useful for servers dedicated to a single purpose | Bare metal |
| NFS | File storage for specific Kubernetes services | LXC |
| Samba | File and OS image storage | LXC |

End-User Services

These were the services I actually ran, plus some I planned to run but never got around to in v2.

| Name | Status | Description | Deployed on |
|---|---|---|---|
| OctoPrint | Deployed | 3D printer sender with web UI | Bare metal |
| Firefly III | Deployed | Budgeting app | k3s |
| Printer Image Snapper | Deployed | Periodically takes pictures of my 3D printer and uploads them to the internet | k3s |
| D&D Dokuwiki | Deployed | A wiki for worldbuilding my D&D campaign | Docker Compose |
| Trilium Notes | Deployed | Personal wiki/note-taking for school and more | k3s |
| Apache Spark | Planned | Big data processing engine | k3s |
| Deluge | Planned | Torrenting server | k3s |
| Jupyter Notebook | Planned | Interactive code notebooks | k3s |
| Bookstack | Planned | “Internal” wiki for documenting this thing | k3s |
| ELabFTW | Planned | Lab notes | k3s |
| NextCloud | Planned | Personal cloud | k3s |

Of course, in Kubernetes, every service gets its own Ingress. However, these were internal services, so it seemed like somewhat of an antipattern to add A records pointing to 192.168.xxx.xxx in Cloudflare. My solution was just to add janky entries to my /etc/hosts and “fix it later” (which I eventually did, in v3):

```
192.168.1.xxx firefly.astrid.tech grafana.astrid.tech prometheus.astrid.tech ...
```
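
For reference, each of those hostnames came from an Ingress roughly like this one (the names and port are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: firefly
spec:
  rules:
    - host: firefly.astrid.tech # resolves only via the /etc/hosts hack
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: firefly-iii # placeholder service name
                port:
                  number: 8080
```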

Monitoring Services

And finally, these were the services I used to monitor all of the above.

| Name | Type | Description | Deployed on |
|---|---|---|---|
| Fluent Bit | Logs | Reads logs from each node’s containers | k3s, Docker Compose |
| Fluentd | Logs | Centrally parses and processes logs | k3s |
| Loki | Logs | Stores and indexes logs | k3s |
| Prometheus | Metrics | Stores and indexes metrics | k3s |
| Grafana | Visualization | Graphs and visualizes everything! | k3s |
Look at this graaaaaaaph, every time I look it makes me laugh

Reflection

This was my first foray into Kubernetes, and into homelabbing, so despite all the learning curves, I think I did great here! However, there was a lot I could have done better with this setup.

  • Ephemeral Kubernetes volumes. I didn’t have centralized storage or any sane storage management, so whenever a Kubernetes pod died, it lost all its data.
  • Mixed-architecture clustering is hard. You may notice I had both ARM and x86 machines; many Docker images only support one architecture. It’s very hard to make this work, and I do not recommend it.
  • Low-end machines could not support virtualization. It was a stupid idea to run Proxmox on badtop with its Pentium and 4GB RAM.
  • No domain controller. I wanted to set up FreeIPA, but I didn’t have the resources to do it.

v3 - Kubernetes-Focused Cloud

I decided to tear down my homelab and start anew, to fix all the issues I had with v2. This included reinstalling Proxmox, as well.

This time, I had a similar stack, but with a few critical differences, making my Kubernetes setup less painful.

On-Site Hardware

I dropped all the ARM machines; mixed-architecture clustering is too hard. Additionally, for some machines, I installed k3s on bare metal.

| Name | Model | Arch | Processor | Cores | RAM (GB) | Role |
|---|---|---|---|---|---|---|
| badtop | Acer Aspire E1-510 | amd64 | Pentium N3520 | 4 | 4 | Bare metal: k3s |
| cracktop | HP Pavilion x360 13 | amd64 | i5-6200U | 4 | 8 | Proxmox: k3s |
| thonkpad | ThinkPad T420 | amd64 | i5-2520M | 4 | 8 | Proxmox: k3s, FreeIPA |
| deskrap | Dell Optiplex 360 | amd64 | Intel Q???? | 4 | 3 | Bare metal: k3s |

Infrastructure Services

Note that this time I actually managed DNS! This was done by having External DNS update FreeIPA’s server with Kubernetes Ingress entries. See this post for more information.

| Name | Description | Deployed on |
|---|---|---|
| Proxmox | An open-source Type 1 hypervisor OS | Bare metal |
| K3s | A lightweight Kubernetes distribution | VM, bare metal |
| FreeIPA | An all-in-one package for Identity, Policy, and Audit | VM |
| Longhorn | A distributed storage solution for Kubernetes | Kubernetes |
| KubeDB | A Kubernetes operator that manages databases | Kubernetes |
| External DNS | Adds Kubernetes Ingress entries to Cloudflare and FreeIPA DNS | Kubernetes |

FreeIPA managed the s.astrid.tech and p.astrid.tech namespaces, where s stands for service and p stands for private. I registered FreeIPA clients in the p namespace, and internal services in the s namespace (like longhorn.s.astrid.tech, firefly.s.astrid.tech, grafana.s.astrid.tech…).
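
Mechanically, this kind of integration boils down to pointing External DNS’s rfc2136 provider at FreeIPA’s embedded BIND. A rough sketch of the relevant container flags (the host is a placeholder, and the TSIG/auth flags are omitted):

```yaml
# Abridged External DNS container args (sketch; auth details omitted)
args:
  - --source=ingress
  - --provider=rfc2136
  - --rfc2136-host=ipa.example.com # placeholder FreeIPA server
  - --rfc2136-zone=s.astrid.tech
  - --domain-filter=s.astrid.tech
```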

End-User Services

| Name | Status | Description | Deployed on |
|---|---|---|---|
| OctoPrint | Deployed | 3D printer sender with web UI | Bare metal |
| Firefly III | Deployed | Budgeting app | k3s |
| Printer Image Snapper | Deployed | Periodically takes pictures of my 3D printer and uploads them to the internet | k3s |
| ELabFTW | Deployed | Lab notes | k3s |
| Homer | Deployed | Homepage linking together all of my services | k3s |
| D&D Dokuwiki | Deployed | A wiki for worldbuilding my D&D campaign | k3s |
| Matrix Homeserver | Planned | Self-hosted chat app | k3s |
| Jellyfin | Planned | Media server | k3s |
| Samba | Planned | A fileshare for all my files; it would connect to Jellyfin | LXC |
| Deluge | Planned | Torrenting server | k3s |
| Trilium Notes | Planned | Personal wiki/note-taking for school and more | k3s |
| Jupyter Notebook | Planned | Interactive code notebooks | k3s |
| Bookstack | Planned | “Internal” wiki for documenting this thing | k3s |
| NextCloud | Planned | Personal cloud | k3s |
| Apache Spark | Planned | Big data processing engine | k3s |

Monitoring

I used the exact same stack as in v2, with minimal modifications.

Reflection

There was a lot to like about v3:

  • Kubernetes storage was a lot simpler now that I had Longhorn.
  • I ended up deploying more services, and deploying them was a lot nicer.

However, there were some bad things as well:

  • You may notice that I dropped a lot of services; I planned to deploy them but never did.
  • The FreeIPA VM ate a lot of RAM, which cut into the capacity I had left for my services.
  • Most of this was still set up manually. The Kubernetes cluster, for example, was bootstrapped by hand using k3sup (see the sketch after this list).
  • My Oracle Cloud VPS was still running the v1 deployment!
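
For reference, the manual bootstrap looked roughly like this (the IPs and user are placeholders):

```sh
# Install the k3s server on one node, then join an agent to it.
k3sup install --ip 192.168.1.50 --user astrid
k3sup join --ip 192.168.1.51 --server-ip 192.168.1.50 --user astrid
```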

v4 - Attempts at Fully-Automated Deployment

I tore down my homelab again. This time, I wanted to automate as much as possible. My primary goals were:

  • to be able to set up every machine by executing a single command, once the OSes are installed on the bare metal
  • to create a more GitOps-like workflow, where everything deploys automatically whenever I push my configs to main

Experiments with Ansible-based automated infrastructure bootstrapping

My first attempts revolved around a core idea: what if I had a central Ansible playbook that would do everything for me?
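
Concretely, the idea was a single entrypoint playbook along these lines (the roles and layout are hypothetical, not my actual repo):

```yaml
# site.yml - one `ansible-playbook site.yml` to rule them all
- name: Bootstrap every machine from a fresh OS install
  hosts: all
  become: true
  roles:
    - common      # users, SSH keys, base packages
    - k3s         # Kubernetes cluster bootstrap
    - monitoring  # log and metric shippers
```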

To reduce configuration drift, I also attempted to set up Packer. The rest of this section is coming soon…

NixOS!

I don’t exactly remember when I first found out about Nix, but it was sometime last year. It seemed like an interesting concept, but I didn’t use it as anything more than a package manager with unstable/bleeding-edge packages. At some point, I wanted to distrohop BANANA (it was on Ubuntu at the time). Arch Linux and NixOS were my top two candidates to hop to. Unfortunately, I had an extremely weird partitioning scheme involving a Windows dual-boot, ext4, and a strange LVM topology, and I couldn’t figure out how to configure NixOS to work with it at the time. Additionally, I didn’t want to spend much time learning the Nix language at that moment because I was lazy, and I was more interested in having a functional[^4] computer again. I ended up installing Arch, and it seems to mostly work!

However, while researching how to automatically bootstrap and update my cluster, I met Vika in the IndieWeb IRC. She told me about her Nix-based setup, and I realized that NixOS was perfect for what I was trying to do!

So, I turned my infra repo into a Nix flake, installed NixOS on Bongus, and that leads us to my current setup.
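
The flake’s skeleton looks something like this (stripped down; the module path is illustrative):

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }: {
    # One nixosConfiguration per machine; running
    # `nixos-rebuild switch --flake .#bongus-hv` on the host applies it.
    nixosConfigurations.bongus-hv = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./hosts/bongus-hv/configuration.nix ];
    };
  };
}
```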


[^1]: Zerg rushing is essentially the “quantity-over-quality” tactic in strategy games: you send many expendable units against the enemy to hopefully overwhelm them.
[^2]: :thinking: https://emoji.gg/assets/emoji/thonk.png
[^3]: This is my old laptop. I pulled it out of my backpack one day and saw the screen completely cracked for no good reason whatsoever.
[^4]: My mom complained about it being really slow, and even with Linux, it’s still slow. Thus, it’s worse than crap.
