Build Your Own Kubernetes Co-Pilot: Harness AI for Reliable Cluster Management

· 8 min read

kubernetes-copilot

Have you ever felt frustrated by nonsensical AI outputs and hallucinations? If so, this post is for you, whether you are new to Kubernetes or a seasoned user who wants to explore how AI can help manage Kubernetes resources more reliably.

What are AI hallucinations?

In a nutshell, AI hallucination occurs when a large language model (LLM) generates misleading or incorrect information in response to a prompt. This can happen due to various factors such as insufficient or flawed training data, overfitting, unrecognized idioms or slang, and adversarial inputs. These hallucinations manifest when the AI, aiming to produce coherent responses, makes errors that range from subtle factual inaccuracies to nonsensical or surreal outputs, similar to how humans might perceive patterns in random visuals.

In the context of Kubernetes, these aren't just minor nuisances; they can lead to significant operational blunders. In this blog, we explore how to enhance the reliability of AI responses, mitigate the risks of hallucinations, and manage Kubernetes resources using AI.

How can AI be helpful in managing Kubernetes resources?

Before we start exploring the technical setup, let's answer the question: how can AI be helpful in managing Kubernetes resources? Imagine an AI assistant that can help you create, fix, and validate Kubernetes resources in a conversational manner. You might ask it to create a new deployment, fix a broken service, or validate a YAML file. If you are learning Kubernetes, this assistant can be a great learning tool to help you explore the cluster and clarify Kubernetes concepts.

Kubernetes helps manage cloud applications, but its YAML configurations can be tricky. When working with AI tooling, we've all faced those moments when AI tools, designed to ease this burden, instead contribute to it by generating nonsensical outputs; a phenomenon we refer to as "AI hallucinations".

Problem Statement

Let's state the issues we face with AI in the context of Kubernetes:

  • 🤖 AI faces issues with consistency and reliability when dealing with large YAML files.
  • 🧠 AIs can have "hallucinations," generating illogical outputs that become more problematic as the input size increases.
  • 📈 This inconsistency makes working with AI models non-deterministic and error-prone.

Goals

Our main goal is to increase reliability and consistency in AI responses. We use two main techniques to achieve this:

  • 🛠️ Function calling to bind API routes as tools available for the AI Assistant to communicate with a Kubernetes cluster
  • 🔍 Internet search APIs to provide accurate and relevant information about Kubernetes

Implementation Plan

The following steps outline the plan to achieve our goals:

  • 💼 Use Flowise to implement the logic flow so that the AI Assistant can help with managing and troubleshooting a Kubernetes cluster on our behalf.
  • 🛠️ Create a simple Flask API that exposes functions for the AI Assistant to enable it to interact with the Kubernetes cluster.
  • 💻 Use function calling to bind the API routes as tools available for the AI Assistant which enables communication with a local Kind cluster with Kubernetes running.
  • 💬 Test the AI Assistant with various scenarios to ensure it can handle different Kubernetes configurations and provide accurate responses.

Assistant in Action

To follow along, you can clone the repository from GitHub, install prerequisites and follow the instructions.

Step 1: Setup the AI Assistant

In Flowise, create a new assistant. Notice that I'm using OpenAI's latest model, but for testing purposes you can select less powerful models or any open source model. The quality of responses will be affected, but it will still work.

Here are the instructions that the assistant will follow:

You are a helpful Kubernetes Assistant specializing in helping build, fix, and validate YAML files for various Kubernetes resources.
Start by greeting the user and introducing yourself as a helpful and friendly Kubernetes Assistant.

If the user asks for help with creating or validating yaml files, do the following:

- if the files are correct, proceed with the next steps; if not, propose fixes and correct the file yourself
- if user asks for information about the kubernetes cluster use the get_config function and provide relevant information
- ask the user to submit one yaml file at a time or create one yaml file yourself if the user asks you to create one
- send the YAML content and only the YAML content to the create_yaml function
- immediately after use the tool cleanup_events to clean any old events
- ask the user if they would like to see the validation results and inform them that it takes some time for the resources to be installed on the cluster
- if the user responds yes, use the tool check_events to see if everything is correct
- if the validation passes, ask the user if they want to submit another YAML file
- if the validation fails, propose a new corrected YAML to the user and ask if the user would like to submit it for validation
- repeat the whole process with new YAML files

Your secondary function is to assist the user in finding information related to Kubernetes. Example categories:

- for questions about kubernetes concepts such as pods, deployments, secrets, etc, use brave search API on https://kubernetes.io/docs/concepts/
- for generic Kubernetes questions use brave search API on kubernetes docs: https://kubernetes.io/docs/home/
- for questions regarding kubernetes releases and features use brave search API on kubernetes releases documentation: https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG. If you are asked for details about a specific release, select one of the releases, otherwise use the latest stable release.

Step 2: Flask API

The server.py file defines API routes that wrap the kubectl commands.

ℹ️ The Flask server is a naive implementation for demonstration purposes only. In a real-life scenario, we wouldn't call kubectl directly from the server but would rather use a client library such as kubernetes or client-go.
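
As a rough illustration of what wrapping kubectl can look like, here is a minimal Python sketch. It is not the repository's server.py: the mapping of route names to kubectl arguments and the run_action helper are assumptions made for this example, and the Flask routing layer is left out so the snippet has no third-party dependencies.

```python
import subprocess
from typing import Callable, Dict, List, Optional

# Hypothetical mapping of route names to kubectl invocations (an assumption,
# not taken from the original server.py).
KUBECTL_COMMANDS: Dict[str, List[str]] = {
    "get_config": ["kubectl", "config", "view", "--minify", "-o", "json"],
    "create_yaml": ["kubectl", "apply", "-f", "-"],
    "check_events": ["kubectl", "get", "events", "--sort-by=.lastTimestamp"],
    "cleanup_events": ["kubectl", "delete", "events", "--all"],
}


def build_kubectl_command(action: str) -> List[str]:
    """Return the kubectl argv for a known action, or raise ValueError."""
    if action not in KUBECTL_COMMANDS:
        raise ValueError(f"unknown action: {action}")
    return list(KUBECTL_COMMANDS[action])


def run_action(
    action: str,
    stdin_text: Optional[str] = None,
    runner: Callable = subprocess.run,
) -> dict:
    """Run the kubectl command for an action and return a JSON-friendly dict.

    YAML for create_yaml is passed on stdin, so nothing is written to disk.
    The runner is injectable so the function can be tested without a cluster.
    """
    argv = build_kubectl_command(action)
    result = runner(argv, input=stdin_text, capture_output=True, text=True, check=False)
    return {
        "ok": result.returncode == 0,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
```

Under these assumptions, each API route would simply call run_action with its own action name and return the resulting dictionary as JSON.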

Step 3: Expose local URL to the internet

In order to enable the OpenAI assistant to use the functions, we must expose the locally running Flask server to the internet. A nice tool for this is ngrok. You can download it from here and follow the instructions to expose the local URL.

Step 4: Function calling

Now we can create functions for each API route. Those are:

  • get_config - returns the current Kubernetes configuration
  • create_yaml - creates a new Kubernetes resource from a YAML file
  • check_events - checks the status of the Kubernetes resources

For each of those routes we create a function that calls the API and returns the response. Here is what the function looks like in Flowise:

function-in-flowise
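
The screenshot shows the tool as defined in Flowise. To make the idea concrete outside of Flowise, here is the same "call the route, return the response" logic as a small Python sketch. The base URL, the yaml_content field name, and the choice of GET versus POST are assumptions for illustration, not details taken from the original setup.

```python
import json
import urllib.request
from typing import Callable, Optional

# Hypothetical public URL of the tunnel in front of the local API server.
BASE_URL = "https://example.ngrok.app"


def build_request(route: str, yaml_content: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for read-only routes, or a JSON POST when YAML is sent."""
    url = f"{BASE_URL}/{route}"
    if yaml_content is None:
        return urllib.request.Request(url, method="GET")
    # The "yaml_content" key is an assumed payload field name.
    body = json.dumps({"yaml_content": yaml_content}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(
    route: str,
    yaml_content: Optional[str] = None,
    opener: Callable = urllib.request.urlopen,
) -> str:
    """Call an API route and return the raw response text for the assistant."""
    request = build_request(route, yaml_content)
    with opener(request) as response:
        return response.read().decode("utf-8")
```

Returning the raw response text keeps the tool simple: the model receives exactly what the cluster reported and decides how to present it.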

Step 5: Use brave search API

The secondary function of our assistant is to assist the user in finding information related to Kubernetes. We can use the Brave Search API to achieve this.
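
For readers who want to see what such a search request involves, here is a hedged Python sketch that only builds the request. It assumes the Brave web search endpoint and the X-Subscription-Token header, and it narrows results with a site: operator in the query. In Flowise the search is configured as a tool rather than coded by hand, so treat the function name and parameters as illustrative.

```python
import urllib.parse
import urllib.request

# Brave web search endpoint (assumed here; check the Brave API docs for your plan).
BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_brave_request(question: str, site: str, api_key: str) -> urllib.request.Request:
    """Build a search request limited to one documentation site.

    The site restriction relies on the "site:" search operator, which is an
    assumption of this sketch rather than a dedicated API parameter.
    """
    query = urllib.parse.urlencode({"q": f"site:{site} {question}"})
    return urllib.request.Request(
        f"{BRAVE_ENDPOINT}?{query}",
        headers={
            "Accept": "application/json",
            "X-Subscription-Token": api_key,  # API key is sent as a header
        },
    )
```

Calling build_brave_request("what is a pod", "kubernetes.io", api_key) would produce a request scoped to the Kubernetes documentation, which mirrors the categories listed in the assistant instructions.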

Step 6: Testing

Now that we have the whole flow available, let's test the assistant.

flow

Let's start by asking what is the cluster we are running on:

what-cluster

Here the assistant used the get_config function to get the current Kubernetes configuration and correctly identified the cluster.

Now let's ask the assistant to create a new nginx based ingress:

nginx-deployment

Notice how the assistant correctly selected the create_yaml function to create the ingress and then used the check_events function after asking if we would like to see the output. It's also interesting that it has found a different event that was not related to the nginx ingress and classified it as unrelated to our request.

Now, let's submit a broken deployment and see if the assistant can fix it:

broken-nginx

In this case we have submitted a broken deployment and the assistant has correctly identified the issue and even proposed a fix.

Lastly, let's check if the assistant can help us understand some Kubernetes concepts:

concepts-search

Here the assistant has used the brave search API to find information about the Kubernetes resource model and provided a link to the source.

Closing Thoughts

We have successfully demonstrated that using function calling and carefully crafted prompt instructions, we can increase the reliability and usefulness of AI assistants in managing Kubernetes resources. This approach can be further extended to other use cases and AI models.

Here are a few use cases where this approach can be useful:

  • 🤖 improved learning experience
  • 📈 help increase Kubernetes adoption
  • 🌐 virtual Kubernetes assistant

Next Steps

Give it a try, build your own AI powered Kubernetes management today:

  • Clone the Repository: Visit GitHub to get the necessary files.
  • Set Up Your Assistant: Follow the instructions, set up the prerequisites, and start building your Kubernetes Co-Pilot.
  • Engage with the Community: Share your experiences and solutions. The setup is very much a proof of concept and can be improved in many ways.

Thanks for taking the time to read this post. I hope you found it interesting and informative.

🔗 Connect with me on LinkedIn

🌐 Visit my blogs on Medium

Crossplane resources in Neovim

· 2 min read

In the realm of text editors, Neovim stands out for its extensibility, especially for developers working with Kubernetes. The telescope-crossplane.nvim extension bridges the gap between Neovim's editing capabilities and Kubernetes resource management. This tutorial outlines the prerequisites, installation, and setup processes for integrating telescope-crossplane.nvim into Neovim, providing an efficient way to manage Kubernetes resources.

Prerequisites

Some familiarity with Crossplane is assumed, and you need the following installed:

  • Neovim version 0.9.0 or higher.
  • The telescope.nvim plugin
  • kubectl

Installation

Installation of telescope-crossplane.nvim can be achieved through various plugin managers. A popular choice is packer.nvim. To install, include the following in the Neovim configuration file (init.lua):

use {
  "Piotr1215/telescope-crossplane.nvim",
  requires = { { "nvim-telescope/telescope.nvim" } },
  config = function()
    require("telescope").load_extension("telescope-crossplane")
  end,
}

Setup and Usage

Once installed, telescope-crossplane.nvim offers two commands that enhance Kubernetes management:

  • :Telescope telescope-crossplane crossplane_managed for managing Crossplane resources.
  • :Telescope telescope-crossplane crossplane_resources for a broader view of Kubernetes resources.

These commands can be executed directly in Neovim, bringing Kubernetes resource management into the editor.

This integration significantly reduces context switching, as developers can view, edit, and manage Kubernetes resources without leaving their coding environment.

Benefits of Integration

Integrating telescope-crossplane.nvim with Neovim offers several advantages:

  • Streamlines Kubernetes workflows by bringing kubectl functionalities into Neovim.
  • Enhances productivity by reducing the need to switch between terminal and editor.
  • Offers a unified interface for code and Kubernetes resource management.

Conclusion

Neovim is a very extensible editor, Lua is easy to learn, and plugins are not that difficult to write. There might be some learning at the beginning, but it’s well worth it.

The workflow of editing Crossplane resources (or any Kubernetes resources for that matter) is a very common one: deleting finalizers, adding or removing annotations, and so on. It’s all about staying in the flow and not leaving your main development environment.

Development with AI: the GAG Stack

· 5 min read

gag-stack

Introduction

Are developers going to be replaced by AI? What is the future of software development? Those questions are asked again and again as the software development landscape is evolving rapidly.

Viewpoints are polarized and generate heated debates and discussions. There is enough debate to fill a book, but in this article, I would like to explore practical applications of AI in software development. We are operating under the assumption that AI is here to stay and evolve, but at the end of the day, it is a tool that can be used to enhance our capabilities.

The GAG Stack

The GAG Stack is a bit of a tongue-in-cheek term that I came up with to describe a workflow that I have been experimenting with. It stands for GPT Pilot, Aider, and GitHub Copilot. These are three AI tools that exemplify well the stages of software development.

Communication and collaboration between people is at the heart of software development. For as long as this stays the case, AI tools will be used to help us model this process. This is what it could look like using the GAG Stack:

gag-stack-flow

We will still have to gather requirements, design, refine, test, retest, fix bugs, debug and deploy. The paradigm doesn't change much; the tools, however, do. They have evolved to help us with the process.

Example Workflow

Let's take a look at how the GAG Stack could be used in practice. We will use a simple example of building a to-do list app.

Setup the environment

I'm using Neovim and Linux for my development workflow; yours might be different. Refer to the installation instructions of each tool to set it up on your machine.

For me, gpt-pilot runs via docker-compose, aider is installed via pip, and GitHub Copilot runs as a Neovim plugin.

Design and Refinement

We start by gathering requirements for our to-do list app. We want to have a simple app that allows us to add, remove and edit tasks. Let's start by providing this concept to GPT Pilot.

The Docker image has only Node installed, so we are going to use it. It should be simple to add new tools to the image or use a local setup. Here is the initial prompt for a simple todo app:

gpt-pilot-prompt

The main value of this tool is the ability to refine and iterate on the design. GPT Pilot will ask for specifications and generate an initial scaffolding:

architecture-questions

As a result of this back and forth, GPT Pilot generated the app in a local folder (mounted via a volume in docker-compose):

~/gpt-pilot-workspace/minimal-todo-app🔒 [ v16.15.1]
➜ tree -L 3 -I node_modules
.
├── app.js
├── package.json
└── package-lock.json

0 directories, 3 files

After a few iterations, we have a simple app running:

app-running

with the following code:

// Require Express and Body-parser modules
const express = require("express");
const bodyParser = require("body-parser");

// Initialize a new Express application
const app = express();

// Configure the application to use Body-parser's JSON and urlencoded middleware
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: false }));

// Start the server
const port = process.env.PORT || 3002;

app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});

Feature development

Now we can use Aider to help us with the development of the app. Aider is a development accelerator that can help with code modifications and features development.

Aider interface:

➜ aider
Aider v0.27.0
Model: gpt-4-1106-preview using udiff edit format
Git repo: .git with 4 files
Repo-map: using 1024 tokens
Use /help to see in-chat commands, run with --help to see cmd line args

Now we can generate a feature for adding a new TODO item:

add-todo

We can keep iterating by adding new features and testing. For example:

iteration-features

Code Iteration

Finally, we can use GitHub Copilot to help us with the code iteration. GitHub Copilot is an autocompletion aid that can provide suggestions.

For example, here I want to log the GET request to the console, so I start typing:

autocompletion

And get autocomplete suggestions:

Conclusion

Obviously, the GAG stack is not the only set of tools, and the ones I've chosen might or might not have something to do with the resulting acronym. There is Devin, which claims to be the first AI software engineer, and an open-source equivalent, Devina. There is Codeium, a free Copilot alternative. There are many other tools in this category, and the landscape is evolving rapidly.

Keen readers might have noticed that the underlying models used are OpenAI's GPT-3 and GPT-4. However, this is not a requirement. The tools can work with both local and remote models, paid and free. The choice of the model is up to the user.

So, are developers going to be replaced by AI? Are doomers or accelerationists right?

dommers-optimists

I think the answer is more nuanced. AI tools are here to stay, and they will be used to enhance our capabilities. The GAG stack is just one example of how AI can be utilized to assist us with software development.

As long as software development relies on human communication and creative collaboration, we will be talking about augmenting software development with AI rather than replacing it.

5 Common Pitfalls in IaC

· 6 min read

Image by Elchinator from Pixabay

5 common pitfalls in Infrastructure as Code

Introduction

Modern, cloud-native infrastructure can be created and destroyed within minutes. It can be scaled up and down depending on load and usage patterns.

GitOpsify Cloud Infrastructure with Crossplane and Flux

· 9 min read

In this article we are going to learn how to automate the provisioning of cloud resources via Crossplane and combine it with GitOps practices.

You will most benefit from this blog if you are a Platform or DevOps Engineer, Infrastructure Architect or Operations Specialist.

If you are new to GitOps, read more about it in my blog GitOps with Kubernetes

Let's set the stage by imagining the following context. We are working as part of a Platform Team in a large organization. Our goal is to help Development Teams onboard and get up to speed with using our Cloud Infrastructure. Here are a few base requirements:

  • Platform Team doesn't have resources to deal with every request individually, so there must be a very high degree of automation
  • Company policy is to adopt the principle of least privilege. We should expose the cloud resources only when needed with the lowest permissions necessary.
  • Developers are not interested in managing cloud, they should only consume cloud resources without even needing to login to a cloud console.
  • New Teams should get their own set of cloud resources when on-boarding to the Platform.
  • It should be easy to provision new cloud resources on demand.

Initial Architecture

The requirements lead us to an initial architecture proposal with the following high-level solution strategy.

  • create template repositories for various types of workloads (using Backstage Software Templates would be helpful)
  • once a new Team is onboarded and creates its first repository from a template, it will trigger a CI pipeline and deploy common infrastructure components by adding the repository as a Source to the Flux infrastructure repo
  • once a Team wants to create more cloud infrastructure, they can place the Crossplane claim YAMLs in the designated folder in their repository
  • adjustments to this process are easily implemented using Crossplane Compositions

In a real-world scenario we would also manage Crossplane using Flux, but for demo purposes we are focusing only on the application level.

The developer experience should be similar to this:

TeamBootstrap

Tools and Implementation

Knowing the requirements and initial architecture, we can start selecting the tools. For our example, the tools we will use are Flux and Crossplane.

We are going to use Flux as a GitOps engine, but the same could be achieved with ArgoCD or Rancher Fleet.

Let's look at the architecture and use cases that both tools support.

Flux Architecture Overview

Flux exposes several components in the form of Kubernetes CRDs and controllers that help express a workflow using the GitOps model. Below is a short description of the 3 major components, each of which has corresponding CRDs.

flux-architecture source: https://github.com/fluxcd/flux2

  1. Source Controller: Its main role is to provide a standardized API for managing the sources of Kubernetes deployments: Git and Helm repositories.

    apiVersion: source.toolkit.fluxcd.io/v1beta1
    kind: GitRepository
    metadata:
      name: podinfo
      namespace: default
    spec:
      interval: 1m
      url: https://github.com/stefanprodan/podinfo
  2. Kustomize Controller: This is the CD part of the workflow. Where source controllers specify sources for data, this controller specifies what artifacts to run from a repository.

    This controller can work with kustomization files, but also with plain Kubernetes manifests.

    apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
    kind: Kustomization
    metadata:
      name: webapp
      namespace: apps
    spec:
      interval: 5m
      path: "./deploy"
      sourceRef:
        kind: GitRepository
        name: webapp
        namespace: shared
  3. Helm Controller: This operator helps manage Helm chart releases containing Kubernetes manifests and deploys them onto a cluster.

    apiVersion: helm.toolkit.fluxcd.io/v2beta1
    kind: HelmRelease
    metadata:
      name: backend
      namespace: default
    spec:
      interval: 5m
      chart:
        spec:
          chart: podinfo
          version: ">=4.0.0 <5.0.0"
          sourceRef:
            kind: HelmRepository
            name: podinfo
            namespace: default
          interval: 1m
      upgrade:
        remediation:
          remediateLastFailure: true
      test:
        enable: true
      values:
        service:
          grpcService: backend
        resources:
          requests:
            cpu: 100m
            memory: 64Mi

Crossplane Architecture Overview

Let’s look at what the Crossplane component model looks like. A word of warning: if you are new to Kubernetes this might be overwhelming, but there is value in making an effort to understand it. The diagram below shows the Crossplane component model and its basic interactions.

Crossplane-architecture Source: Author based on Crossplane.io

Learn more about Crossplane in my blog "Infrastructure as Code: the next big shift is here"

Demo

If you want to follow along with the demo, clone this repository. It contains all the scripts needed to run the demo code.

Prerequisites

In this demo, we are going to show how to use Flux and Crossplane to provision an EC2 instance directly from a new GitHub repository. This simulates a new team onboarding to our Platform.

To follow along, you will need AWS CLI configured on your local machine.

Once you obtain credentials, configure the default profile for the AWS CLI following this tutorial.

You will need the following installed locally:

  • Docker Desktop or other container run time
  • WSL2 if using Windows
  • kubectl

If you are running on a Mac, use make setup_mac instead of make.

Run make in the root folder of the project. This will:

  • Install kind (Kubernetes IN Docker) if not already installed
  • Create kind cluster called crossplane-cluster and swap context to it
  • Install crossplane using helm
  • Install crossplane CLI if not already installed
  • Install flux CLI if not already installed
  • Install AWS provider on the cluster
  • Create a temporary file with AWS credentials based on default CLI profile
  • Create a secret with AWS credentials in crossplane-system namespace
  • Configure AWS provider to use the secret for provisioning the infrastructure
  • Remove the temporary file with credentials so it's not accidentally checked in the repository
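
To illustrate the credential-related steps in the list above, here is a small Python sketch of how the default CLI profile could be turned into the credentials text stored in the secret. This is not the demo's Makefile logic; the function name is hypothetical, and the INI-style output layout is an assumption based on the format commonly used for the Crossplane AWS provider secret.

```python
import configparser


def render_provider_credentials(credentials_ini: str, profile: str = "default") -> str:
    """Extract one profile from AWS CLI credentials text and render it
    as a single [default] section for the provider secret (assumed layout)."""
    # Disable interpolation so special characters in keys are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_ini)
    section = parser[profile]  # raises KeyError if the profile is missing
    return (
        "[default]\n"
        f"aws_access_key_id = {section['aws_access_key_id']}\n"
        f"aws_secret_access_key = {section['aws_secret_access_key']}\n"
    )
```

Rendering the text in memory, rather than writing a temporary file, would also avoid the risk the last step guards against, namely credentials being checked into the repository by accident.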

Following tools need to be installed manually

IMPORTANT: The demo code will create a small EC2 Instance in the eu-central-1 region. The instance and underlying infrastructure will be removed as part of the demo, but please make sure all the resources were successfully removed and, in case of any disruptions in the demo flow, be ready to remove the resources manually.

Setup Flux Repository

  • create a new kind cluster with make, this will install Crossplane with AWS provider and configure secret to access selected AWS account

    Flux CLI was installed as part of the Makefile script, but optionally you can configure shell completion for the CLI: . <(flux completion zsh)

    Refer to the Flux documentation page for more installation options

  • create access token in GitHub with full repo permissions. github-token
  • export variables for your GitHub user and the newly created token
    • export GITHUB_TOKEN=<token copied from GitHub>
    • export GITHUB_USER=<your user name>
  • use flux to bootstrap a new GitHub repository so flux can manage itself and underlying infrastructure

    Flux will look for GITHUB_USER and GITHUB_TOKEN variables and once found will create a private repository on GitHub where Flux infrastructure will be tracked.

    flux bootstrap github \
    --owner=${GITHUB_USER} \
    --repository=flux-infra \
    --path=clusters/crossplane-cluster \
    --personal

Setup Crossplane EC2 Composition

Now we will install a Crossplane Composition that defines what cloud resources to create when someone asks for an EC2 claim.

  • setup Crossplane composition and definition for creating EC2 instances

    • kubectl crossplane install configuration piotrzan/crossplane-ec2-instance:v1
  • fork repository with the EC2 claims

    • gh repo fork https://github.com/Piotr1215/crossplane-ec2

      answer YES when prompted whether to clone the repository

Clone Flux Infra Repository

  • clone the flux infra repository created in your personal repos

    git clone git@github.com:${GITHUB_USER}/flux-infra.git

    cd flux-infra

Add Source

  • add source repository to tell Flux what to observe and synchronize

    Flux will register this repository and every 30 seconds check for changes.

  • execute below command in the flux-infra repository, it will add a Git Source

    flux create source git crossplane-demo \
    --url=https://github.com/${GITHUB_USER}/crossplane-ec2.git \
    --branch=master \
    --interval=30s \
    --export > clusters/crossplane-cluster/demo-source.yaml
  • the previous command created a file in clusters/crossplane-cluster sub folder, commit the file

    • git add .
    • git commit -m "Adding Source Repository"
    • git push
  • execute kubectl get gitrepositories.source.toolkit.fluxcd.io -A to see active Git Repositories sources in Flux

Create Flux Kustomization

  • setup watch on the AWS managed resources, for now there should be none

    watch kubectl get managed

  • create Flux Kustomization to watch for specific folder in the repository with the Crossplane EC2 claim

    flux create kustomization crossplane-demo \
    --target-namespace=default \
    --source=crossplane-demo \
    --path="./ec2-claim" \
    --prune=true \
    --interval=1m \
    --export > clusters/crossplane-cluster/crossplane-demo.yaml
    • git add .
    • git commit -m "Adding EC2 Instance"
    • git push
  • after a minute or so you should see a new EC2 Instance being synchronized with Crossplane and resources in AWS

new-instance-sync

Let's take a step back and make sure we understand all the resources and repositories used.

repos

The first repository we have created is what Flux uses to manage itself on the cluster as well as other repositories. In order to tell Flux about a repository with Crossplane EC2 claims, we have created a GitSource YAML file that points to the HTTPS address of the repository with the EC2 claims.

The EC2 claims repository contains a folder where plain Kubernetes manifest files are located. In order to tell Flux what files to observe, we have created a Kustomization and linked it with the GitSource via its name. The Kustomization points to the folder containing K8s manifests.
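
The link between the two resources can be made explicit with a short Python sketch. The dictionaries below mirror the shape of the exported Flux manifests from the earlier commands, trimmed to the relevant fields; the helper function name is hypothetical and only illustrates the name-based reference.

```python
def kustomization_uses_source(kustomization: dict, source: dict) -> bool:
    """Return True when the Kustomization's sourceRef points at the given source.

    The reference matches on kind and name, which is how the two manifests
    are tied together in this demo.
    """
    ref = kustomization["spec"]["sourceRef"]
    return ref["kind"] == source["kind"] and ref["name"] == source["metadata"]["name"]


# Trimmed-down versions of the two exported manifests (illustrative only).
git_source = {
    "kind": "GitRepository",
    "metadata": {"name": "crossplane-demo"},
    "spec": {"interval": "30s", "ref": {"branch": "master"}},
}
kustomization = {
    "kind": "Kustomization",
    "metadata": {"name": "crossplane-demo"},
    "spec": {
        "path": "./ec2-claim",  # folder with the plain Kubernetes manifests
        "prune": True,
        "sourceRef": {"kind": "GitRepository", "name": "crossplane-demo"},
    },
}

linked = kustomization_uses_source(kustomization, git_source)
```

If the names drift apart, for example after renaming the source, the Kustomization has nothing to read from, which is why both commands in the demo use crossplane-demo.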

Cleanup

  • to cleanup the EC2 Instance and underlying infrastructure, remove the claim-aws.yaml demo from the crossplane-ec2 repository
    • rm ec2-claim/claim-aws.yaml
    • git add .
    • git commit -m "EC2 instance removed"
    • git push
  • after a commit or timer lapse Flux will synchronize and Crossplane will pick up removed artefact and delete cloud resources

    the ec2-claim folder must be present in the repo after the claim yaml is removed, otherwise Flux cannot reconcile

Manual Cleanup

In case you cannot use the repository, it's possible to clean up the resources by deleting them from Flux.

  • deleting the Flux kustomization with flux delete kustomization crossplane-demo will remove all the resources from the cluster and AWS
  • to clean up the EC2 Instance and underlying infrastructure, remove the EC2 claim from the cluster: kubectl delete VirtualMachineInstance sample-ec2

Cluster Cleanup

  • wait until watch kubectl get managed output doesn't contain any AWS resources
  • delete the cluster with make cleanup
  • optionally remove the flux-infra repository
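
The first step, waiting until no managed resources remain, can also be scripted instead of watched by eye. Below is a minimal Python sketch under stated assumptions: the function names, timeout, and poll interval are illustrative, and the listing relies on kubectl get managed -o name returning one resource per line.

```python
import subprocess
import time
from typing import Callable, List


def list_managed_resources() -> List[str]:
    """Return the names of Crossplane managed resources currently on the cluster."""
    result = subprocess.run(
        ["kubectl", "get", "managed", "-o", "name"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def wait_until_no_managed(
    list_resources: Callable[[], List[str]] = list_managed_resources,
    timeout_seconds: float = 600,
    poll_seconds: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no managed resources remain; return False on timeout."""
    waited = 0.0
    while True:
        if not list_resources():
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

Only when this returns True would it be safe to continue with make cleanup; a False result means some cloud resources may still exist and should be checked manually.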

Summary

GitOps with Flux or Argo CD and Crossplane offers a very powerful and flexible model for Platform builders. In this demo, we have focused on the application side, with Kubernetes clusters deployed some other way, for example with Crossplane, Fleet, or Cluster API.

What we achieved on top of using Crossplane’s Resource Model is that we no longer interact with kubectl directly to manage resources, but rather delegate this activity to Flux. Crossplane still runs on the cluster and reconciles all resources. In other words, we've moved the API surface from kubectl to Git.

Infrastructure as Code - the next big shift is here

· 9 min read

Photo by Ben on Unsplash

Introduction

In this blog, we will look at the evolution of software infrastructure: provisioning, delivery and maintenance.

If you are interested in modern DevOps and SRE practices, this article is for you.

Infrastructure as Code (IaC) is a common pattern where virtualized infrastructure and auxiliary services can be managed using configuration expressed in almost any language, usually hosted in a source code repository.

Every once in a while, the software industry is shaped by significant events called paradigm shifts. Here are a few such events that made Infrastructure as Code what it is today: