
After running Ansible playbooks directly from my workstation for a while, I decided it was time to get a proper AWX instance running in my homelab k3s cluster. This post covers the full journey — from Flux manifests to a custom Execution Environment with Cisco collections — including every gotcha I hit along the way.
## The Stack

- k3s multi-node cluster (amd64 NUC nodes)
- Flux v2.8.1 for GitOps
- Cilium CNI with Gateway API
- cert-manager for TLS
- Forgejo self-hosted git and container registry
- AWX 24.6.1 via the awx-operator Helm chart
## Repository Structure

I use a base/overlay pattern in my GitOps repo. AWX lives under `apps/`:

```
apps/
├── base/
│   └── awx/
│       ├── namespace.yaml
│       ├── helmrepo.yaml
│       ├── helmrelease.yaml
│       └── kustomization.yaml
└── athena/                   # cluster-specific overlay
    └── awx/
        ├── awx.yaml          # AWX CR (the actual instance)
        ├── certificate.yaml  # cert-manager TLS
        ├── gateway.yaml      # Cilium Gateway API
        ├── httproute.yaml    # HTTPRoute
        └── kustomization.yaml
```
## Base Manifests

### Namespace

```yaml
# apps/base/awx/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: awx
```
### HelmRepository

```yaml
# apps/base/awx/helmrepo.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: awx-operator
  namespace: awx
spec:
  url: https://ansible-community.github.io/awx-operator-helm
  interval: 24h
```
**Gotcha #1:** The correct Helm repository URL is `https://ansible-community.github.io/awx-operator-helm` (the `ansible-community` GitHub org), not `https://ansible.github.io/awx-operator/`. The old URL no longer works.
### HelmRelease

```yaml
# apps/base/awx/helmrelease.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: awx-operator
  namespace: awx
spec:
  interval: 1h
  chart:
    spec:
      chart: awx-operator
      version: "3.2.1"
      sourceRef:
        kind: HelmRepository
        name: awx-operator
        namespace: awx
      interval: 12h
```
**Gotcha #2:** The chart version (`3.2.1`) is not the same as the AWX operator version (`2.19.1`). The Helm chart and the operator have separate versioning. Always check the chart releases for the correct chart version.
### Kustomization

```yaml
# apps/base/awx/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - helmrepo.yaml
  - helmrelease.yaml
```
Note the AWX CR (awx.yaml) is not in the base — it lives in the cluster overlay. This is intentional to avoid a race condition where Helm tries to create the AWX custom resource before the CRD is fully registered on first deploy.
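One way to make that ordering explicit, rather than relying on retries, is to split the deploy into two Flux Kustomizations and have the instance wait on the operator. A sketch (the Kustomization names and paths here are hypothetical; adapt them to your repo):

```yaml
# Hypothetical flux-system Kustomization for the AWX instance overlay
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: awx-instance
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/athena/awx
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  # Hold off applying the AWX CR until the operator Kustomization is ready,
  # which guarantees the CRD is registered first
  dependsOn:
    - name: awx-operator
```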
## Cluster Overlay (athena)

### AWX Instance

```yaml
# apps/athena/awx/awx.yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  service_type: ClusterIP
  admin_user: admin
  # Password auto-generated into secret: awx-admin-password
  web_resource_requirements:
    limits:
      cpu: 1000m
      memory: 2Gi
    requests:
      cpu: 200m
      memory: 512Mi
  task_resource_requirements:
    limits:
      cpu: 2000m
      memory: 2Gi
    requests:
      cpu: 200m
      memory: 512Mi
  ee_resource_requirements:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 128Mi
  postgres_resource_requirements:
    limits:
      cpu: 1000m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  postgres_storage_requirements:
    requests:
      storage: 8Gi
  postgres_storage_class: local-path
  projects_persistence: true
  projects_storage_class: local-path
  projects_storage_size: 8Gi
  projects_storage_access_mode: ReadWriteOnce
  control_plane_ee_image: quay.io/ansible/awx-ee:latest
  web_replicas: 1
  task_replicas: 1
```
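The CR above lets the operator generate the admin password for you. If you would rather pin it yourself, the operator also accepts an `admin_password_secret` field in the spec; a minimal sketch (the password value is a placeholder, and the secret must exist before the operator reconciles):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: awx-admin-password
  namespace: awx
stringData:
  password: change-me  # placeholder; use a real secret value
```

Then add `admin_password_secret: awx-admin-password` under `spec:` in the AWX CR.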
### TLS Certificate

```yaml
# apps/athena/awx/certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: awx-uclab-tls
  namespace: awx
spec:
  secretName: awx-uclab-tls
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
    - awx.uclab.dev
```
### Gateway and HTTPRoute

Since my cluster uses Cilium with the Gateway API instead of a traditional ingress controller, I expose AWX with a Gateway and HTTPRoute instead of an Ingress resource:

```yaml
# apps/athena/awx/gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: awx
  namespace: awx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  gatewayClassName: cilium
  listeners:
    - hostname: awx.uclab.dev
      name: awx-uclab-dev-http
      port: 80
      protocol: HTTP
    - hostname: awx.uclab.dev
      name: awx-uclab-dev-https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: awx-uclab-tls
      allowedRoutes:
        namespaces:
          from: All
```
```yaml
# apps/athena/awx/httproute.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: awx
  namespace: awx
spec:
  hostnames:
    - awx.uclab.dev
  parentRefs:
    - name: awx
  rules:
    - backendRefs:
        - name: awx-service
          port: 80
      matches:
        - path:
            type: PathPrefix
            value: /
```
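With both an HTTP and an HTTPS listener on the Gateway, you can also turn the port-80 side into a pure redirect using the standard Gateway API `RequestRedirect` filter. A sketch (the route name is mine; it attaches only to the HTTP listener via `sectionName`):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: awx-https-redirect  # hypothetical name
  namespace: awx
spec:
  hostnames:
    - awx.uclab.dev
  parentRefs:
    - name: awx
      sectionName: awx-uclab-dev-http  # the Gateway's HTTP listener
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
```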
### Overlay Kustomization

```yaml
# apps/athena/awx/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: awx
resources:
  - ../../base/awx/
  - awx.yaml
  - certificate.yaml
  - gateway.yaml
  - httproute.yaml
```
## Deploying and Watching the Rollout

Commit everything, push, and watch Flux do its thing:

```sh
flux reconcile kustomization apps --with-source

# Watch pods come up — takes 5-10 min
kubectl get pods -n awx -w
```
The startup sequence is:

1. `awx-postgres-15-0` starts first
2. The `awx-migration-*` job runs DB migrations
3. `awx-web` comes up
4. `awx-task` waits in `Init:0/3` until migrations finish — this is normal, don't panic
Once everything is Running, grab the admin password:

```sh
kubectl get secret awx-admin-password -n awx \
  -o jsonpath="{.data.password}" | base64 --decode && echo
```
## Custom Execution Environment with Cisco Collections

The default `awx-ee` image doesn't include Cisco collections or `ansible-pylibssh`. Running a Cisco playbook out of the box gives you:

```
ERROR! couldn't resolve module/action 'cisco.ios.ios_vlans'
WARNING: ansible-pylibssh not installed, falling back to paramiko
```

The fix is to build a custom Execution Environment.
### The Build Files

`execution-environment.yml` — this is the key file, and it took several attempts to get right:

```yaml
version: 3
images:
  base_image:
    name: quay.io/ansible/creator-ee:latest  # not awx-ee!
options:
  package_manager_path: /usr/bin/microdnf  # creator-ee uses microdnf, not dnf
dependencies:
  ansible_core:
    package_pip: ansible-core>=2.20.0
  ansible_runner:
    package_pip: ansible-runner
  galaxy: requirements.yml
  python: requirements.txt
  system: bindep.txt
```
**Gotcha #3:** Use `quay.io/ansible/creator-ee` as the base, not `quay.io/ansible/awx-ee`. The `awx-ee` base triggers a `check_ansible` failure in EE version 3 builds. `creator-ee` is the correct base for building custom EEs.

**Gotcha #4:** `creator-ee` uses `microdnf`, not `dnf`. Without `package_manager_path: /usr/bin/microdnf`, system dependencies silently fail to install.
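If you want a missing collection to fail the image build rather than a job run later, the version 3 schema also supports `additional_build_steps`. A sketch appended to `execution-environment.yml` (the grep target is just an example):

```yaml
additional_build_steps:
  append_final:
    # Fail the build if the collection didn't land in the image
    - RUN ansible-galaxy collection list | grep cisco.ios
```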
`requirements.yml`:

```yaml
---
collections:
  - name: cisco.ios
  - name: cisco.nxos
  - name: cisco.iosxr
  - name: ansible.netcommon
```

`requirements.txt`:

```
ansible-pylibssh
```

`bindep.txt`:

```
gcc [compile]
python3-devel [compile]
libssh-devel [platform:rpm compile]
```
### Building the Image

Since I'm on Apple Silicon (arm64) but my k3s nodes are amd64, I can't use `ansible-builder build` directly — it doesn't support `--platform`. The trick is to use `ansible-builder create` to generate the context, then hand off to `docker buildx`:

```sh
# Generate the Containerfile and build context
ansible-builder create

# Build for amd64 and push directly to Forgejo
docker buildx build \
  --platform linux/amd64 \
  --no-cache \
  -f context/Containerfile \
  -t forgejo.uclab.dev/affragak/awx-ee-cisco:latest \
  --push \
  context/
```
**Gotcha #5:** `ansible-builder` generates a `Containerfile`, not a `Dockerfile`. Pass `-f context/Containerfile` explicitly to `docker buildx`.

**Gotcha #6:** If you just `docker push` after `ansible-builder build`, you'll hit `exec format error` in AWX because the image is arm64 but the nodes are amd64. Always build with `--platform linux/amd64` for amd64 clusters.
## Registering the EE in AWX

1. Add a Container Registry credential (Credentials → Add → Container Registry) pointing to `forgejo.uclab.dev` with your Forgejo username and password/token.
2. Add the Execution Environment (Administration → Execution Environments → Add):

   | Field | Value |
   |---|---|
   | Name | `awx-ee-cisco` |
   | Image | `forgejo.uclab.dev/affragak/awx-ee-cisco:latest` |
   | Pull | Always |
   | Registry Credential | your Forgejo credential |

3. Assign it to your Job Template (Templates → Edit → Execution Environment → `awx-ee-cisco`).
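If you'd rather script this than click through the UI, the same registration exists in the AWX REST API at `POST /api/v2/execution_environments/`. A sketch of the request body (field names mirror the UI; `credential` is the numeric ID of the registry credential, shown here with an example value):

```json
{
  "name": "awx-ee-cisco",
  "image": "forgejo.uclab.dev/affragak/awx-ee-cisco:latest",
  "pull": "always",
  "credential": 2
}
```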
## Source Control with Forgejo SSH

For AWX to clone your playbook repos from Forgejo, create a Source Control credential with SSH:

```sh
# Generate a dedicated key pair
ssh-keygen -t ed25519 -C "awx@mycluster" -f ~/.ssh/awx_forgejo -N ""

# Get the Forgejo host key (to avoid host verification errors)
ssh-keyscan forgejo.uclab.dev
```

In Forgejo, add `awx_forgejo.pub` as a Deploy Key on the repo (Repo → Settings → Deploy Keys).

In AWX (Credentials → Add → Source Control):

- SCM Private Key: contents of `~/.ssh/awx_forgejo`
- SCM Host Key: output from `ssh-keyscan`

Use the SSH URL format in your AWX Project: `git@forgejo.uclab.dev:youruser/yourrepo.git`
## Gotcha Summary

| # | Problem | Fix |
|---|---|---|
| 1 | Wrong Helm repo URL | Use `ansible-community.github.io/awx-operator-helm` |
| 2 | Chart version ≠ operator version | Check chart releases separately |
| 3 | `check_ansible` failure in EE build | Use `creator-ee` as base, not `awx-ee` |
| 4 | System deps fail silently | Add `package_manager_path: /usr/bin/microdnf` |
| 5 | `docker buildx` can't find Dockerfile | Use `-f context/Containerfile` |
| 6 | `exec format error` in AWX | Build with `--platform linux/amd64` on Apple Silicon |
| 7 | `awx-task` stuck in `Init:0/3` | Normal — it waits for DB migrations to finish |
| 8 | AWX CR not created on first deploy | Keep AWX CR in overlay, separate from HelmRelease |
AWX is now fully running at https://awx.uclab.dev, managing Cisco IOS, NX-OS and IOS-XR playbooks across my lab network. The custom EE approach is the right long-term solution — collections and dependencies are baked in, no re-downloading on every job run.