
After running Ansible playbooks directly from my workstation for a while, I decided it was time to get a proper AWX instance running in my homelab k3s cluster. This post covers the full journey — from Flux manifests to a custom Execution Environment with Cisco collections — including every gotcha I hit along the way.
## The Stack

- k3s multi-node cluster (amd64 NUC nodes)
- Flux v2.8.1 for GitOps
- Cilium CNI with Gateway API
- cert-manager for TLS
- Forgejo self-hosted git and container registry
- AWX 24.6.1 via the awx-operator Helm chart
## Repository Structure

I use a base/overlay pattern in my GitOps repo. AWX lives under `apps/`:

```
apps/
├── base/
│   └── awx/
│       ├── namespace.yaml
│       ├── helmrepo.yaml
│       ├── helmrelease.yaml
│       └── kustomization.yaml
└── athena/                   # cluster-specific overlay
    └── awx/
        ├── awx.yaml          # AWX CR (the actual instance)
        ├── certificate.yaml  # cert-manager TLS
        ├── gateway.yaml      # Cilium Gateway API
        ├── httproute.yaml    # HTTPRoute
        └── kustomization.yaml
```
## Base Manifests

### Namespace

```yaml
# apps/base/awx/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: awx
```
### HelmRepository

```yaml
# apps/base/awx/helmrepo.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: awx-operator
  namespace: awx
spec:
  url: https://ansible-community.github.io/awx-operator-helm
  interval: 24h
```
**Gotcha #1:** The correct Helm repository URL is `https://ansible-community.github.io/awx-operator-helm` (the `ansible-community` GitHub org), not `https://ansible.github.io/awx-operator/`. The old URL no longer works.
### HelmRelease

```yaml
# apps/base/awx/helmrelease.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: awx-operator
  namespace: awx
spec:
  interval: 1h
  chart:
    spec:
      chart: awx-operator
      version: "3.2.1"
      sourceRef:
        kind: HelmRepository
        name: awx-operator
        namespace: awx
      interval: 12h
```
**Gotcha #2:** The chart version (`3.2.1`) is not the same as the AWX operator version (`2.19.1`). The Helm chart and the operator have separate versioning. Always check the chart releases for the correct chart version.
### Kustomization

```yaml
# apps/base/awx/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - helmrepo.yaml
  - helmrelease.yaml
```
Note the AWX CR (awx.yaml) is not in the base — it lives in the cluster overlay. This is intentional to avoid a race condition where Helm tries to create the AWX custom resource before the CRD is fully registered on first deploy.
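One way to make that ordering explicit, rather than relying on retries, is to split the deploy into two Flux Kustomizations and have the instance wait on the operator. A sketch (the Kustomization names and paths here are hypothetical; adapt them to your repo):

```yaml
# Hypothetical flux-system Kustomization for the AWX instance overlay
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: awx-instance
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/athena/awx
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  # Hold off applying the AWX CR until the operator Kustomization is ready,
  # which guarantees the CRD is registered first
  dependsOn:
    - name: awx-operator
```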
## Cluster Overlay (athena)

### AWX Instance

```yaml
# apps/athena/awx/awx.yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  service_type: ClusterIP
  admin_user: admin
  # Password auto-generated into secret: awx-admin-password
  web_resource_requirements:
    limits:
      cpu: 1000m
      memory: 2Gi
    requests:
      cpu: 200m
      memory: 512Mi
  task_resource_requirements:
    limits:
      cpu: 2000m
      memory: 2Gi
    requests:
      cpu: 200m
      memory: 512Mi
  ee_resource_requirements:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 128Mi
  postgres_resource_requirements:
    limits:
      cpu: 1000m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  postgres_storage_requirements:
    requests:
      storage: 8Gi
  postgres_storage_class: local-path
  projects_persistence: true
  projects_storage_class: local-path
  projects_storage_size: 8Gi
  projects_storage_access_mode: ReadWriteOnce
  control_plane_ee_image: quay.io/ansible/awx-ee:latest
  web_replicas: 1
  task_replicas: 1
```
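The CR above lets the operator generate the admin password for you. If you would rather pin it yourself, the operator also accepts an `admin_password_secret` field in the spec; a minimal sketch (the password value is a placeholder, and the secret must exist before the operator reconciles):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: awx-admin-password
  namespace: awx
stringData:
  password: change-me  # placeholder; use a real secret value
```

Then add `admin_password_secret: awx-admin-password` under `spec:` in the AWX CR.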
### TLS Certificate

```yaml
# apps/athena/awx/certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: awx-uclab-tls
  namespace: awx
spec:
  secretName: awx-uclab-tls
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
    - awx.uclab.dev
```
### Gateway and HTTPRoute

Since my cluster uses Cilium with the Gateway API instead of a traditional ingress controller, I expose AWX with a Gateway and HTTPRoute instead of an Ingress resource:

```yaml
# apps/athena/awx/gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: awx
  namespace: awx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
spec:
  gatewayClassName: cilium
  listeners:
    - hostname: awx.uclab.dev
      name: awx-uclab-dev-http
      port: 80
      protocol: HTTP
    - hostname: awx.uclab.dev
      name: awx-uclab-dev-https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: awx-uclab-tls
      allowedRoutes:
        namespaces:
          from: All
```
```yaml
# apps/athena/awx/httproute.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: awx
  namespace: awx
spec:
  hostnames:
    - awx.uclab.dev
  parentRefs:
    - name: awx
  rules:
    - backendRefs:
        - name: awx-service
          port: 80
      matches:
        - path:
            type: PathPrefix
            value: /
```
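With both an HTTP and an HTTPS listener on the Gateway, you can also turn the port-80 side into a pure redirect using the standard Gateway API `RequestRedirect` filter. A sketch (the route name is mine; it attaches only to the HTTP listener via `sectionName`):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: awx-https-redirect  # hypothetical name
  namespace: awx
spec:
  hostnames:
    - awx.uclab.dev
  parentRefs:
    - name: awx
      sectionName: awx-uclab-dev-http  # the Gateway's HTTP listener
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
```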
### Overlay Kustomization

```yaml
# apps/athena/awx/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: awx
resources:
  - ../../base/awx/
  - awx.yaml
  - certificate.yaml
  - gateway.yaml
  - httproute.yaml
```
## Deploying and Watching the Rollout

Commit everything, push, and watch Flux do its thing:

```sh
flux reconcile kustomization apps --with-source

# Watch pods come up — takes 5-10 min
kubectl get pods -n awx -w
```
The startup sequence is:

1. `awx-postgres-15-0` starts first
2. The `awx-migration-*` job runs DB migrations
3. `awx-web` comes up
4. `awx-task` waits in `Init:0/3` until migrations finish — this is normal, don't panic
Once everything is Running, grab the admin password:

```sh
kubectl get secret awx-admin-password -n awx \
  -o jsonpath="{.data.password}" | base64 --decode && echo
```
## Custom Execution Environment with Cisco Collections

The default `awx-ee` image doesn't include Cisco collections or `ansible-pylibssh`. Running a Cisco playbook out of the box gives you:

```
ERROR! couldn't resolve module/action 'cisco.ios.ios_vlans'
WARNING: ansible-pylibssh not installed, falling back to paramiko
```

The fix is to build a custom Execution Environment.
### The Build Files

`execution-environment.yml` — this is the key file, and it took several attempts to get right:

```yaml
version: 3
images:
  base_image:
    name: quay.io/ansible/creator-ee:latest  # not awx-ee!
options:
  package_manager_path: /usr/bin/microdnf  # creator-ee uses microdnf, not dnf
dependencies:
  ansible_core:
    package_pip: ansible-core>=2.20.0
  ansible_runner:
    package_pip: ansible-runner
  galaxy: requirements.yml
  python: requirements.txt
  system: bindep.txt
```
**Gotcha #3:** Use `quay.io/ansible/creator-ee` as the base, not `quay.io/ansible/awx-ee`. The `awx-ee` base triggers a `check_ansible` failure in EE version 3 builds. `creator-ee` is the correct base for building custom EEs.

**Gotcha #4:** `creator-ee` uses `microdnf`, not `dnf`. Without `package_manager_path: /usr/bin/microdnf`, system dependencies silently fail to install.
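If you want a missing collection to fail the image build rather than a job run later, the version 3 schema also supports `additional_build_steps`. A sketch appended to `execution-environment.yml` (the grep target is just an example):

```yaml
additional_build_steps:
  append_final:
    # Fail the build if the collection didn't land in the image
    - RUN ansible-galaxy collection list | grep cisco.ios
```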
`requirements.yml`:

```yaml
---
collections:
  - name: cisco.ios
  - name: cisco.nxos
  - name: cisco.iosxr
  - name: ansible.netcommon
```

`requirements.txt`:

```
ansible-pylibssh
```

`bindep.txt`:

```
gcc [compile]
python3-devel [compile]
libssh-devel [platform:rpm compile]
```
### Building the Image

Since I'm on Apple Silicon (arm64) but my k3s nodes are amd64, I can't use `ansible-builder build` directly — it doesn't support `--platform`. The trick is to use `ansible-builder create` to generate the context, then hand off to `docker buildx`:

```sh
# Generate the Containerfile and build context
ansible-builder create

# Build for amd64 and push directly to Forgejo
docker buildx build \
  --platform linux/amd64 \
  --no-cache \
  -f context/Containerfile \
  -t forgejo.uclab.dev/affragak/awx-ee-cisco:latest \
  --push \
  context/
```
**Gotcha #5:** `ansible-builder` generates a `Containerfile`, not a `Dockerfile`. Pass `-f context/Containerfile` explicitly to `docker buildx`.

**Gotcha #6:** If you just `docker push` after `ansible-builder build`, you'll hit `exec format error` in AWX because the image is arm64 but the nodes are amd64. Always build with `--platform linux/amd64` for amd64 clusters.
## Registering the EE in AWX

1. Add a Container Registry credential (Credentials → Add → Container Registry) pointing to `forgejo.uclab.dev` with your Forgejo username and password/token.
2. Add the Execution Environment (Administration → Execution Environments → Add):

   | Field | Value |
   |---|---|
   | Name | `awx-ee-cisco` |
   | Image | `forgejo.uclab.dev/affragak/awx-ee-cisco:latest` |
   | Pull | Always |
   | Registry Credential | your Forgejo credential |

3. Assign it to your Job Template (Templates → Edit → Execution Environment → `awx-ee-cisco`).
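If you'd rather script this than click through the UI, the same registration exists in the AWX REST API at `POST /api/v2/execution_environments/`. A sketch of the request body (field names mirror the UI; `credential` is the numeric ID of the registry credential, shown here with an example value):

```json
{
  "name": "awx-ee-cisco",
  "image": "forgejo.uclab.dev/affragak/awx-ee-cisco:latest",
  "pull": "always",
  "credential": 2
}
```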
## Source Control with Forgejo SSH

For AWX to clone your playbook repos from Forgejo, create a Source Control credential with SSH:

```sh
# Generate a dedicated key pair
ssh-keygen -t ed25519 -C "awx@mycluster" -f ~/.ssh/awx_forgejo -N ""

# Get the Forgejo host key (to avoid host verification errors)
ssh-keyscan forgejo.uclab.dev
```

In Forgejo, add `awx_forgejo.pub` as a Deploy Key on the repo (Repo → Settings → Deploy Keys).

In AWX (Credentials → Add → Source Control):

- SCM Private Key: contents of `~/.ssh/awx_forgejo`
- SCM Host Key: output from `ssh-keyscan`

Use the SSH URL format in your AWX Project: `git@forgejo.uclab.dev:youruser/yourrepo.git`
## Gotcha Summary

| # | Problem | Fix |
|---|---|---|
| 1 | Wrong Helm repo URL | Use `ansible-community.github.io/awx-operator-helm` |
| 2 | Chart version ≠ operator version | Check chart releases separately |
| 3 | `check_ansible` failure in EE build | Use `creator-ee` as base, not `awx-ee` |
| 4 | System deps fail silently | Add `package_manager_path: /usr/bin/microdnf` |
| 5 | `docker buildx` can't find Dockerfile | Use `-f context/Containerfile` |
| 6 | `exec format error` in AWX | Build with `--platform linux/amd64` on Apple Silicon |
| 7 | `awx-task` stuck in `Init:0/3` | Normal — it waits for DB migrations to finish |
| 8 | AWX CR not created on first deploy | Keep AWX CR in overlay, separate from HelmRelease |
AWX is now fully running at https://awx.uclab.dev, managing Cisco IOS, NX-OS and IOS-XR playbooks across my lab network. The custom EE approach is the right long-term solution — collections and dependencies are baked in, no re-downloading on every job run.