21 posts tagged with "kubernetes"

Your AI Agent Should Read Your Notes Before Answering

· 11 min read

RAG Architecture

I have 395 notes in Obsidian and ~2,800 memories from past AI sessions. My AI agent knows none of it unless I paste it in manually. And I can only paste what I remember to paste — which defeats the point of having a knowledge store.

So I built a hook that fires before every prompt, searches a unified vector store, and injects the relevant context before the agent thinks. R2R as the RAG backend, pgvector for storage, Ollama for embeddings, all on a homelab Kubernetes cluster. Total latency: under 100ms. Zero tokens consumed in the retrieval path.
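The retrieval path can be sketched in a few lines. This is a stand-in, not the actual hook: the real pipeline calls R2R's search API against pgvector with Ollama embeddings, while here `embed` and the in-memory `store` are hypothetical placeholders that show the shape of the logic.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def inject_context(prompt, embed, store, k=3, min_score=0.35):
    """Pre-prompt hook: embed the prompt, rank stored chunks,
    and prepend the top matches before the agent sees anything."""
    q = embed(prompt)
    scored = sorted(((cosine(q, vec), text) for text, vec in store),
                    key=lambda s: s[0], reverse=True)
    hits = [text for score, text in scored[:k] if score >= min_score]
    if not hits:
        return prompt  # nothing relevant; pass the prompt through untouched
    context = "\n".join(f"- {h}" for h in hits)
    return f"Relevant notes from your knowledge store:\n{context}\n\n{prompt}"
```

The `min_score` cutoff is what keeps the hook quiet when nothing matches; without it, every prompt drags in the least-irrelevant note.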

The value isn't the search itself. It's what surfaces when you stop choosing what's relevant. Ask about a Kubernetes pattern and a six-month-old Obsidian note appears. Debug a tmux issue and a memory from a previous session shows up with the exact fix. The agent answers from your accumulated knowledge, not just its training data.

Your Platform Already Has AI Users

· 7 min read

Your Platform Already Has AI Users

Platform engineering matured around one thesis: reduce cognitive load for developers. The CNCF Platforms White Paper codified the pattern. Internal developer portals, golden paths, self-service infrastructure — all designed so human teams ship faster without drowning in complexity.

AI agents now consume the same infrastructure. They clone repos, call CI/CD APIs, create pull requests, query issue trackers. But they don't read your Confluence wiki or file support tickets. They parse tool schemas and call structured APIs.
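"Parsing tool schemas" means something concrete: instead of prose documentation, an agent reads a machine-readable description of each capability. The shape below follows MCP's tool-definition convention (a name, a description, and a JSON Schema for inputs); the tool itself is hypothetical.

```json
{
  "name": "create_pull_request",
  "description": "Open a PR against the given repository.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "repo": { "type": "string" },
      "title": { "type": "string" },
      "base": { "type": "string", "default": "main" }
    },
    "required": ["repo", "title"]
  }
}
```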

The bottlenecks shift. Humans hit cognitive overload: too many tools, tribal knowledge, context switching. Agents hit context window limits: too many tokens, irrelevant information, unstructured access. Same problem class, different constraint.

The CNCF platform engineering model transfers directly. Here's the mapping, dimension by dimension.

Give Your AI Agents a Brain That Survives a Reboot

· 12 min read

Agent Memory Architecture

AI agents learn things about you as you work. Your coding style, your infrastructure quirks, which tools you prefer, how you name things. That context accumulates across dozens of sessions. But it evaporates when the session ends or when you switch tools. Say you use Claude Code in the terminal, Cursor in the IDE, and a web-based MCP client on your phone: three tools, three isolated memory silos. You can share static knowledge with files, but the context that accumulates from your interactions has nowhere to go.

What if the memory layer was separate from the tools entirely? Redis Agent Memory Server does exactly that. It's an open-source memory service that speaks MCP over HTTP. CLI tools (Claude Code, Codex, Gemini CLI), IDE tools (Cursor, Windsurf, Copilot), and any web-based MCP client all connect to the same store. Because the server runs over HTTP, you can expose it beyond the homelab with Cloudflare Tunnels or similar, so tools outside your network connect too. The memory lives on your infrastructure, not theirs.

The server provides two tiers: working memory (session-scoped context that auto-promotes to long-term) and long-term memory (persistent, semantically searchable). It exposes both a REST API and an MCP interface.
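The two-tier design is easy to picture as a toy model. This is illustrative only, not the Redis Agent Memory Server API: the real service runs over REST/MCP and does semantic search, while the class names and the substring search here are stand-ins.

```python
import time

class MemoryServer:
    """Toy model of the two-tier design: session-scoped working memory
    that promotes entries into a persistent, searchable long-term store."""

    def __init__(self):
        self.working = {}      # session_id -> list of (timestamp, text)
        self.long_term = []    # persistent entries, shared by every client

    def remember(self, session_id, text):
        """Record context in working memory for the current session."""
        self.working.setdefault(session_id, []).append((time.time(), text))

    def end_session(self, session_id):
        """Auto-promote working memory to long-term when the session closes."""
        for _, text in self.working.pop(session_id, []):
            self.long_term.append(text)

    def search(self, query):
        """Stand-in for semantic search: naive keyword match."""
        q = query.lower()
        return [t for t in self.long_term if q in t.lower()]
```

The point of the decoupling is visible in the data layout: every client, whether CLI, IDE, or web, talks to the same `long_term` store, so what one tool learns, all of them can retrieve.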

I deployed it on my homelab Kubernetes cluster. Four pods, one PVC, zero API costs. Here's how.

What If Your Docker Was Also Your Kubernetes?

· 5 min read

vind

Local Kubernetes development has rough edges. kind and minikube make cluster creation trivial, but common workflows need extra setup.

Exposing a LoadBalancer service means configuring MetalLB or tunneling through ngrok. Testing a new image means running kind load docker-image every iteration.

vind encapsulates these workflows. Kubernetes nodes run as Docker containers sharing the host daemon, so images pull straight from the local cache. A built-in LoadBalancer controller assigns real IPs without additional configuration.

Your Kubernetes Cluster is Unbalanced (And the Scheduler Won't Fix It)

· 6 min read

Descheduler

Photo by Paul Hanaoka on Unsplash

The Kubernetes scheduler is lazy. It places pods when they're created, picks the best node at that moment, and never thinks about it again. Weeks later, one node is at 75% memory while another sits at 40%. The scheduler doesn't care; its job was done the moment the pod started.

Descheduler fixes this. It runs every few minutes, finds imbalanced nodes, and evicts pods so the scheduler gets another chance to place them better. Set it up once and the cluster rebalances itself automatically.
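For reference, the core of that setup is a `LowNodeUtilization` policy along these lines. The thresholds are illustrative, not a recommendation; check the Descheduler docs for the current `v1alpha2` schema before applying.

```yaml
apiVersion: "descheduler/v1alpha2"
kind: DeschedulerPolicy
profiles:
  - name: default
    pluginConfig:
      - name: LowNodeUtilization
        args:
          # nodes below ALL of these are "underutilized"
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          # nodes above ANY of these donate evictable pods
          targetThresholds:
            cpu: 50
            memory: 50
            pods: 50
    plugins:
      balance:
        enabled:
          - LowNodeUtilization
```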

Here's how to configure it for a homelab. Takes about 20 minutes to get right.

MinIO is Dead. Garage S3 is Better Anyway (Homelab Migration Guide)

· 9 min read

Garage S3 Migration

Photo by todd kent on Unsplash

MinIO Community Edition is dead. On December 3, 2025, MinIO Inc. announced maintenance mode: no new features, no PR reviews, no Docker images, no RPM/DEB packages. Critical security fixes only "on a case-by-case basis."

This didn't come out of nowhere. Back in May 2025, they gutted the console, the GUI that made MinIO actually usable. What's left is a glorified file browser. User management, policies, replication config? Moved to the paid AIStor product. The whole thing is open source cosplay now: the repo exists, but it's just a funnel to their commercial offering.

The r/selfhosted and Hacker News threads are worth reading. Thousands of Helm charts and CI/CD pipelines that depend on minio/minio images are now broken. Bitnami has stopped its MinIO builds too.

Time to migrate. Honestly? For a homelab, Garage is the better choice anyway. 50MB footprint vs 500MB+. Written in Rust. Built-in static web hosting. Actively maintained. MinIO's collapse just forced me to make the switch I should have made earlier.
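To give a sense of how small the surface is: a single-node Garage deployment needs roughly this much config, mounted into the pod as `garage.toml`. Field names follow Garage's documented config format, but every value below is a placeholder; consult the current Garage reference before copying, since the schema has shifted between releases.

```toml
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"

# single homelab node; raise for real redundancy
replication_factor = 1

rpc_bind_addr = "[::]:3901"
rpc_secret = "<generate with: openssl rand -hex 32>"

[s3_api]
s3_region = "garage"
api_bind_addr = "[::]:3900"
root_domain = ".s3.garage.local"

# the built-in static web hosting mentioned above
[s3_web]
bind_addr = "[::]:3902"
root_domain = ".web.garage.local"
```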

Here's how to set up Garage on Kubernetes. Takes about 15 minutes.

From Physical Servers to vCluster: Understanding Kubernetes Multi-Tenancy

· 14 min read

The computing abstraction ladder showing progression from physical hardware to virtual Kubernetes clusters

TL;DR: This post traces the evolution of computing abstractions from physical servers to virtual Kubernetes clusters.

Each layer solved a real problem while creating new challenges: physical computers led to VMs (stranded resources), VMs to containers (OS overhead), containers to Kubernetes (orchestration complexity), and Kubernetes to virtual clusters (multi-tenancy isolation).

vCluster enables teams to run fully functional virtual Kubernetes clusters inside existing infrastructure—providing control plane isolation that namespaces cannot match.

Last week I returned from KCD UK where I led a workshop introducing people to vCluster (try it yourself here). At the workshop and at our booth, we fielded dozens of questions from people with wildly different backgrounds.

Some attendees had deep Kubernetes expertise but had never heard of virtual clusters. Others worked with containers daily and were just starting to explore the orchestration layer.

The most common question? "What is vCluster, and why would I need it?"

The short answer: vCluster is an open-source solution that enables teams to run virtual Kubernetes clusters inside existing infrastructure. These virtual clusters are Certified Kubernetes Distributions that provide strong workload isolation while running as nested environments on top of another Kubernetes cluster.

But to understand why that matters—and whether you need it—you need to see how we got here.

Stop AI from Hallucinating Your Kubernetes YAML

· 8 min read

AI and Kubernetes Configuration

Building a Deterministic vCluster Validation MCP Server to Ground AI in Real Schemas

You ask an AI to generate a Kubernetes manifest, Helm chart values, or Ansible playbook. It responds instantly with clean, well-formatted YAML. You apply it. Nothing works.

This isn't a bug—it's AI hallucination. The AI knows YAML syntax but hallucinates config options that don't exist, mixes incompatible versions, or confidently suggests deprecated fields. It generates what looks right based on patterns, not what is right according to actual schemas.
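The fix the post builds toward is deterministic validation: check generated config against a real schema instead of trusting the model's pattern-matching. A toy version of that check is below; the schema fields are illustrative stand-ins, not vCluster's actual values schema, which the real MCP server validates against.

```python
# Toy schema: each leaf maps a field to its expected type;
# nested dicts describe nested config objects.
SCHEMA = {
    "sync": {"toHost": {"ingresses": {"enabled": bool}}},
    "controlPlane": {"distro": {"k8s": {"enabled": bool}}},
}

def validate(config, schema, path=""):
    """Walk generated config against a known schema; collect every
    field the model hallucinated or typed wrongly."""
    errors = []
    for key, value in config.items():
        where = f"{path}.{key}" if path else key
        if key not in schema:
            errors.append(f"unknown field: {where}")
        elif isinstance(schema[key], dict):
            if isinstance(value, dict):
                errors.extend(validate(value, schema[key], where))
            else:
                errors.append(f"expected object at {where}")
        elif not isinstance(value, schema[key]):
            errors.append(f"wrong type at {where}")
    return errors
```

The key property is that the answer comes from the schema, not the model: a hallucinated field like `ingress` instead of `ingresses` is caught mechanically, every time, before anything is applied.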

How to Run Multiple GPU KAI Schedulers in Kubernetes Using vCluster

· 7 min read

Kubernetes Clusters

Photo by Growtika on Unsplash

In today's cloud-native landscape, GPU workloads are becoming increasingly critical. From training large language models to running inference APIs, organizations are investing heavily in GPU infrastructure. But with this investment comes a challenge: how do you safely test and deploy new GPU schedulers without risking your entire production environment?

Related talks: Watch my SREDay Paris Q4 2025 talk on this topic. Also presenting at Conf42 Kube Native 2025. Check the talks page for more details.