AWX in Practice

In my previous post, I got AWX running on k3s with a custom Execution Environment for Cisco collections. Time to actually use it for something practical: a self-service VLAN provisioning job that lets anyone on the team provision a VLAN on specific switches by filling in a form — no CLI, no SSH, no risk of typos in config mode.

This post covers building it end to end, including the surprising number of gotchas I hit with surveys, host targeting, and variable scoping.


The Goal

A one-click (well, one-form) workflow where the operator:

  1. Opens AWX
  2. Fills in three fields: VLAN ID, VLAN Name, which switches
  3. Hits Launch

AWX does the rest — validates, provisions, verifies — and logs who did what and when.


The Inventory

My master inventory in AWX has two groups:

switches
  ├── ios-switch-1
  └── ios-switch-2

routers
  ├── ios-router-1
  └── ios-router-2
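
For testing the playbook outside AWX, the same two groups could be written as a static YAML inventory. This is only a sketch: the file name inventory.yaml is hypothetical, and connection settings (addresses, network OS, credentials) are left out because this post does not give them.

```yaml
# inventory.yaml (hypothetical file name), mirroring the AWX groups above
all:
  children:
    switches:
      hosts:
        ios-switch-1:
        ios-switch-2:
    routers:
      hosts:
        ios-router-1:
        ios-router-2:
```

Keeping the group names identical to the AWX inventory means the same host patterns (switches, ios-switch-1, and so on) work in both places.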

The Playbook

# vlan-provision.yaml
---
- name: Provision VLAN on Cisco switches
  hosts: "{{ limit }}"
  gather_facts: false

  vars:
    vlan_id: "{{ survey_vlan_id | default(0) | int }}"
    vlan_name: "{{ survey_vlan_name | default('') }}"

  tasks:
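    - name: Validate VLAN ID range
      ansible.builtin.assert:
        that:
          # Cast again at the point of use: on some ansible-core versions
          # a templated var can arrive here as a string
          - vlan_id | int >= 2
          - vlan_id | int <= 4094
          - vlan_id | int not in [1002, 1003, 1004, 1005]
        fail_msg: "VLAN ID is invalid or reserved (must be 2-4094, excluding 1002-1005)"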

    - name: Create VLAN
      cisco.ios.ios_vlans:
        config:
          - vlan_id: "{{ vlan_id }}"
            name: "{{ vlan_name }}"
            state: active
        state: merged

    - name: Verify VLAN was created
      cisco.ios.ios_vlans:
        state: gathered
      register: vlan_verify

    - name: Assert VLAN exists
      ansible.builtin.assert:
        # Compare as integers so the match does not depend on how vlan_id was templated
        that: >
          vlan_verify.gathered
          | selectattr('vlan_id', 'equalto', vlan_id | int)
          | list | length == 1
        fail_msg: "VLAN was not found after provisioning"

    - name: Show result
      ansible.builtin.debug:
        msg: "VLAN {{ vlan_id }} ({{ vlan_name }}) successfully provisioned on {{ inventory_hostname }}"

Two things worth noting here before we get to the gotchas:

  • hosts: "{{ limit }}" — the host targeting comes from the survey, not hardcoded. More on this below.
  • | default(0) and | default('') on the vars: these keep vlan_id and vlan_name from resolving to undefined when a survey answer is missing. More on this too.

AWX Setup

Job Template

Templates → Add → Job Template

Field                  Value
Name                   vlan-provisioning
Inventory              master
Project                your project
Playbook               vlan-provision.yaml
Execution Environment  awx-ee-cisco
Credentials            your switch Machine credential
Limit                  (empty, prompt on launch: unchecked)

Survey

Templates → vlan-provisioning → Survey → Add

Question 1 — VLAN ID

Field                 Value
Question              VLAN ID
Answer Variable Name  survey_vlan_id
Answer Type           Integer
Minimum               2
Maximum               4094
Required              yes

Question 2 — VLAN Name

Field                 Value
Question              VLAN Name
Answer Variable Name  survey_vlan_name
Answer Type           Text
Required              yes

Question 3 — Which switches?

Field                 Value
Question              Which switches?
Answer Variable Name  limit
Answer Type           Multiple Choice (single select)
Choices               switches, ios-switch-1, ios-switch-2
Required              yes

Enable the survey toggle → Save.
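
The same template and survey could also be kept as code rather than clicked together. The sketch below assumes the awx.awx collection is installed and that controller connection details are configured separately; the file name is hypothetical, and the field values are simply the ones from the tables above. The survey_spec layout follows the format AWX uses for surveys, but check it against your AWX version before relying on it.

```yaml
# define-vlan-template.yaml (hypothetical file name)
# Assumes the awx.awx collection and a separately configured controller connection.
- name: Define the vlan-provisioning job template as code
  hosts: localhost
  connection: local
  gather_facts: false

  tasks:
    - name: Create or update the job template and its survey
      awx.awx.job_template:
        name: vlan-provisioning
        job_type: run
        inventory: master
        project: your project
        playbook: vlan-provision.yaml
        execution_environment: awx-ee-cisco
        credentials:
          - your switch Machine credential
        # Leave the template Limit empty and unprompted; the survey sets the target
        limit: ""
        ask_limit_on_launch: false
        survey_enabled: true
        survey_spec:
          name: ""
          description: ""
          spec:
            - question_name: "VLAN ID"
              variable: survey_vlan_id
              type: integer
              min: 2
              max: 4094
              required: true
            - question_name: "VLAN Name"
              variable: survey_vlan_name
              type: text
              required: true
            - question_name: "Which switches?"
              variable: limit
              type: multiplechoice
              choices:
                - switches
                - ios-switch-1
                - ios-switch-2
              required: true
        state: present
```

Having the survey in a file makes the variable names reviewable next to the playbook that consumes them, which is where most of the gotchas below came from.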


The Gotchas (there were many)

Gotcha 1 — hosts: can’t use arbitrary survey variables

My first instinct was:

hosts: "{{ target_switches }}"

With a survey variable target_switches. This fails immediately:

[ERROR]: Error processing keyword 'hosts': 'target_switches' is undefined

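At least, that is how it looked. The hosts: field is templated when the play starts, before any host has been selected, so host vars, group vars, and facts are not available there. Extra vars normally are, which means the error is really saying that target_switches never reached the job (a disabled survey or a mismatched Answer Variable Name would both cause that).

The fix that worked for me: name the survey variable limit. AWX passes survey answers to ansible-playbook as --extra-vars, and once limit showed up among the job's variables it resolved in hosts: like any other extra var. Note that limit is not a reserved word in Ansible here; it is an ordinary extra var that happens to share a name with the Job Template's Limit field, which matters in Gotcha 3.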

hosts: "{{ limit }}"
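
A quick way to debug an undefined variable in hosts: is to compare the name used in the playbook against the variables the job actually received. With the survey from this post, a launch would be expected to carry extra vars along these lines (a sketch using the example values from the launch form shown later, not captured output):

```yaml
# Extra vars expected for one launch of vlan-provisioning
# (example values; the real ones come from the survey answers)
survey_vlan_id: 105
survey_vlan_name: sales
limit: ios-switch-1
```

If the name referenced in hosts: is missing from that list, the play fails with the undefined error shown above.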

Gotcha 2 — Multi-select survey passes a list, not a string

I originally set up Question 3 as Multiple Choice (multiple select). The job ran but targeted all switches regardless of what was selected. The variables passed to the job showed why:

{
  "limit": ["ios-switch-1"]
}

A list ["ios-switch-1"] instead of a string "ios-switch-1". The play was written around a single host-pattern string, and with a list in its place the run did not narrow to the selected switch; it covered the full group.

The fix: change Answer Type to Multiple Choice (single select). If you need to target multiple specific switches, add combined options to the choices:

switches
ios-switch-1
ios-switch-2
ios-switch-1,ios-switch-2

A comma-separated string is a valid Ansible host pattern.
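
If multi-select is worth keeping, another option is to normalize the answer inside the playbook. The sketch below assumes the survey variable arrives either as a string or as a list, as in the JSON above, and joins a list into a comma-separated pattern. It has not been verified against this lab, so treat it as a starting point rather than a drop-in fix.

```yaml
- name: Provision VLAN on Cisco switches
  # String answers pass through unchanged; list answers become "a,b,c"
  hosts: "{{ limit if limit is string else limit | join(',') }}"
  gather_facts: false

  tasks:
    - name: Show which host this run reached
      ansible.builtin.debug:
        msg: "Running on {{ inventory_hostname }}"
```

This avoids having to list every combination of switches as a separate survey choice.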


Gotcha 3 — Hardcoded Limit in Job Template overrides the survey

After switching to single select, the job still hit all switches. The culprit: the Job Template had switches hardcoded in the Limit field. That field is passed to ansible-playbook as --limit, which is a separate mechanism from the limit extra var the survey sets. The survey answer never replaces it, even though the two share a name.

The fix: clear the Limit field in the Job Template completely, and make sure Prompt on launch is unchecked. The survey drives everything.
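
When it is unclear which of the two limits is in effect, a temporary debug task can print both. This is a sketch meant to sit at the top of the tasks: list in vlan-provision.yaml; ansible_limit and ansible_play_hosts are Ansible special variables, while limit is the survey variable from this post.

```yaml
    - name: Show what this run is actually targeting
      ansible.builtin.debug:
        msg:
          # Extra var set by the survey and used in hosts:
          - "Survey pattern (extra var limit): {{ limit | default('not set') }}"
          # Value of --limit, which AWX fills from the template's Limit field
          - "Template Limit (--limit): {{ ansible_limit | default('none') }}"
          # Hosts left after both have been applied
          - "Hosts in this play: {{ ansible_play_hosts | join(', ') }}"
      run_once: true
```

Remove the task once the targeting behaves as expected.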


Gotcha 4 — fail_msg with variable interpolation crashes on undefined vars

My original assert:

fail_msg: "VLAN ID {{ vlan_id }} is invalid or reserved"

This caused:

Error while resolving value for 'fail_msg': 'survey_vlan_id' is undefined

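Even though vlan_id is defined in vars: as "{{ survey_vlan_id | int }}", Ansible templates vars lazily: vlan_id is only rendered at the moment something references it, here when fail_msg is resolved. If survey_vlan_id is not among the job's extra vars at that point (for example because the survey is disabled or the playbook was launched some other way), the lookup fails and the task crashes.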

Two fixes together:

First, add | default() to the var definitions so they never resolve to undefined:

vars:
  vlan_id: "{{ survey_vlan_id | default(0) | int }}"
  vlan_name: "{{ survey_vlan_name | default('') }}"

Second, remove the variable interpolation from fail_msg to break the evaluation chain:

fail_msg: "VLAN ID is invalid or reserved (must be 2-4094, excluding 1002-1005)"

The | default(0) also gives you a clean assertion failure (0 >= 2 evaluates to false) rather than an undefined variable crash — so the error message is actually useful.
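
A complementary option is to check for the survey answers explicitly before anything else runs, so that a missing answer is reported as such instead of as an out-of-range VLAN ID. A minimal sketch, meant to go first in the tasks: list and using only the variable names from this post:

```yaml
    - name: Fail early if survey answers are missing
      ansible.builtin.assert:
        that:
          # Conditions are checked in order, so the length test only runs
          # once the variable is known to be defined
          - survey_vlan_id is defined
          - survey_vlan_name is defined
          - survey_vlan_name | string | trim | length > 0
        # No variables in the message, for the same reason as above
        fail_msg: "Survey answers are missing or empty; launch via the vlan-provisioning template"
        success_msg: "Survey answers received"
```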


Gotcha 5 — Job failing silently with rc=None

Early on, jobs were completing in ~4 seconds with no output at all. The awx-task logs showed:

job 28 (failed) encountered an error (rc=None)

rc=None means Ansible never ran — the Execution Environment container failed before the playbook started. The cause was an architecture mismatch: the EE image was built on Apple Silicon (linux/arm64) but the k3s nodes are amd64. The container silently failed to start.

The fix: always build EE images with --platform linux/amd64 on Apple Silicon:

docker buildx build \
  --platform linux/amd64 \
  -f context/Containerfile \
  -t forgejo.uclab.dev/affragak/awx-ee-cisco:latest \
  --push \
  context/

The Final Working Flow

After all of the above, the launch sequence is a clean single survey with no extra prompts:

┌───────────────────────────────────┐
│  Launch: vlan-provisioning        │
├───────────────────────────────────┤
│  VLAN ID?        [ 105          ] │
│  VLAN Name?      [ sales        ] │
│  Which switches? [ ios-switch-1 ] │
└───────────────────────────────────┘

And the output:

TASK [Validate VLAN ID range] ✓
TASK [Create VLAN] ✓
TASK [Verify VLAN was created] ✓
TASK [Assert VLAN exists] ✓
TASK [Show result] ✓
  "msg": "VLAN 105 (sales) successfully provisioned on ios-switch-1"

Gotcha Summary

#  Problem                                        Fix
1  Arbitrary variable in hosts: is undefined      Use limit as the survey variable name
2  Multi-select passes a list, not a string       Use single select; add combined choices for multi-target
3  Hardcoded Job Template Limit overrides survey  Clear the Limit field in the template
4  fail_msg with vars crashes on undefined        Add | default() to vars; remove vars from fail_msg
5  rc=None, job fails silently with no output     Build EE with --platform linux/amd64 on Apple Silicon

What’s Next

With the pattern working, the same survey-driven approach applies to:

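  • VLAN deletion: same structure, with a removal state in place of merged (cisco.ios.ios_vlans uses states such as deleted or purged rather than absent; check the module docs for your collection version)
  • Interface assignment: assign a port to a VLAN on a specific switch
  • Switch config backup: scheduled nightly, committed to Forgejo
  • Compliance audit: gather running config, diff against Git baseline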

The foundation is solid. Each new playbook is just another Job Template pointing at a different file in the same repo.

my DevOps Odyssey

AWX in Practice: Self-Service VLAN Provisioning with Surveys

2026-03-26

Series: lab

Categories: network-automation

Tags: #ansible, #awx, #network-automation, #lab

