AI Devcontainer — Architecture & Setup


Overview

This devcontainer provides a self-contained, GPU-accelerated AI development environment running inside a DevPod workspace. It hosts a full local LLM stack, exposed publicly through a Cloudflare Zero Trust tunnel at https://ai.uclab.dev.

Internet
   │
   ▼
Cloudflare (ai.uclab.dev)
   │  Zero Trust Tunnel
   ▼
cloudflared (container)
   │  http://localhost:8080
   ▼
Open WebUI  ──────────────────►  Ollama API (localhost:11434)
(port 8080)                           │
                                       ▼
                                 GPU: RTX 5060 (8 GB VRAM)
                                 Models: llama3.1:8b, qwen2.5:7b,
                                         deepseek-coder-v2:16b-lite-instruct-q4_K_M,
                                         mistral-nemo

Host Requirements

| Requirement | Details |
| --- | --- |
| Container runtime | Docker with nvidia-container-toolkit |
| GPU passthrough | CDI (nvidia.com/gpu=all) |
| DevPod workspace | Workspace ID: ai |
| Host directories | ~/.ollama (model storage), ~/.cloudflared (tunnel credentials) |
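These prerequisites can be sanity-checked on the host before the first build (a sketch; `need` is a hypothetical helper, and the CDI commands are shown as comments because they require the NVIDIA toolkit to be installed):

```shell
# Fail fast if a required host command is missing.
need() { command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; return 1; }; }

for cmd in docker nvidia-ctk; do
  need "$cmd" || status=1
done

# Confirm CDI devices are registered (nvidia.com/gpu=all relies on this):
#   nvidia-ctk cdi list | grep nvidia.com/gpu
# If the list is empty, generate the spec:
#   sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```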

File Structure

.devcontainer/
├── Dockerfile              # Image definition
├── devcontainer.json       # DevPod/VS Code container config
├── pull-models             # Script to pull Ollama models (idempotent)
├── supervisord.conf        # Reference config (not active — see note below)
├── supervisord/
│   ├── ollama.conf         # Supervisor program: Ollama
│   ├── open-webui.conf     # Supervisor program: Open WebUI
│   └── cloudflared.conf    # Supervisor program: Cloudflare tunnel
└── scripts/
    ├── setup               # Trusts and installs mise tools for workspace
    └── setup_project       # One-time project setup (commitizen, pre-commit)

Docker Image

Base: nvidia/cuda:12.8.0-runtime-ubuntu24.04

The image is built from .devcontainer/Dockerfile with the repo root as context.

Build stages (in order)

| Step | What happens |
| --- | --- |
| Copy mise, uv | Pulled from upstream images (jdxcode/mise, astral-sh/uv) |
| apt-get | Installs: sudo zstd libatomic1 supervisor nvtop git curl vim zsh |
| Ollama | Installed system-wide via https://ollama.com/install.sh |
| cloudflared | Installed via Cloudflare's official apt repo |
| open-webui | Installed system-wide via uv pip install --system --break-system-packages |
| Supervisor configs | supervisord/*.conf copied to /etc/supervisor/conf.d/ |
| pull-models script | Copied to /usr/local/bin/pull-models, made executable |
| ubuntu sudoers | ubuntu ALL=(ALL) NOPASSWD:ALL — passwordless sudo |
| Switch to ubuntu | All subsequent steps run as non-root |
| mise tools | mise.toml baked in; Node LTS, Python 3.13, uv installed |
| Claude Code | npm install -g @anthropic-ai/claude-code via mise-managed Node |
| Shell activation | mise activate added to .bashrc and .zshrc |
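For orientation, the steps above could correspond to a Dockerfile along these lines (an abridged, hypothetical sketch: upstream image names and binary paths are assumptions, not copied from the real .devcontainer/Dockerfile):

```dockerfile
# Hypothetical excerpt, not the actual .devcontainer/Dockerfile
FROM nvidia/cuda:12.8.0-runtime-ubuntu24.04

# Tool binaries pulled from upstream images (paths are assumptions)
COPY --from=jdxcode/mise /usr/local/bin/mise /usr/local/bin/mise
COPY --from=ghcr.io/astral-sh/uv /uv /usr/local/bin/uv

RUN apt-get update && apt-get install -y \
      sudo zstd libatomic1 supervisor nvtop git curl vim zsh \
    && rm -rf /var/lib/apt/lists/*

# System-wide Ollama and Open WebUI
RUN curl -fsSL https://ollama.com/install.sh | sh
RUN uv pip install --system --break-system-packages open-webui

COPY .devcontainer/supervisord/*.conf /etc/supervisor/conf.d/
COPY .devcontainer/pull-models /usr/local/bin/pull-models
RUN chmod +x /usr/local/bin/pull-models \
    && echo 'ubuntu ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/ubuntu

USER ubuntu
```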

Developer tools (via mise)

Defined in /workspaces/ai/mise.toml:

[tools]
node = "lts"
python = "3.13"
uv = "latest"

Tools are installed into /home/ubuntu/.local/share/mise/ and shimmed onto PATH via /etc/profile.d/mise.sh.


Container Configuration (devcontainer.json)

GPU passthrough

"runArgs": [
  "--device", "nvidia.com/gpu=all",
  "--security-opt", "label=disable"
]

Requires nvidia-container-toolkit on the host with CDI configured. The label=disable flag is needed for SELinux compatibility.

Environment variables

Set at the container level (available to all processes):

| Variable | Value | Purpose |
| --- | --- | --- |
| NVIDIA_VISIBLE_DEVICES | all | Expose all GPUs |
| NVIDIA_DRIVER_CAPABILITIES | all | Full GPU capability set |
| OLLAMA_KEEP_ALIVE | 10m | Keep a model loaded in VRAM for 10 min after last use |
| OLLAMA_MAX_LOADED_MODELS | 1 | Max 1 model in VRAM at a time (8 GB card) |
| OLLAMA_NUM_PARALLEL | 1 | One request processed at a time (no parallel inference) |
| OLLAMA_BASE_URL | http://127.0.0.1:11434 | Used by Open WebUI to reach Ollama |

API keys passed from the host:

| Variable | Source |
| --- | --- |
| ANTHROPIC_API_KEY | ${localEnv:ANTHROPIC_API_KEY} |
| MISE_GITHUB_TOKEN | ${localEnv:MISE_GITHUB_TOKEN} |

Persistent mounts

Both directories live on the host and are bind-mounted into the container, so they survive full container rebuilds:

| Host path | Container path | Contents |
| --- | --- | --- |
| ~/.ollama | /home/ubuntu/.ollama | Downloaded model weights (~27 GB) |
| ~/.cloudflared | /home/ubuntu/.cloudflared | Tunnel credentials and config |

Port forwarding

| Port | Service | Behaviour on attach |
| --- | --- | --- |
| 8080 | Open WebUI | Opens in browser automatically |
| 11434 | Ollama API | Forwarded silently |

Port 8080 is also bound on the host via -p 8080:8080 in runArgs.

Startup sequence (postStartCommand)

sudo supervisord -c /etc/supervisor/supervisord.conf && \
{ until curl -sf http://localhost:11434 >/dev/null 2>&1; do sleep 2; done && pull-models; } &
  1. supervisord starts as a daemon — launches Ollama, Open WebUI, and cloudflared
  2. A background job polls Ollama until it responds on port 11434
  3. Once Ollama is ready, pull-models runs and downloads any missing models
  4. postStartCommand returns immediately so the IDE is not blocked
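The polling in step 2 can be factored into a small reusable helper; `wait_for` below is a hypothetical name, not something shipped in the repo:

```shell
# wait_for: poll a command until it succeeds, or give up after N tries.
# Hypothetical helper illustrating the postStartCommand polling loop.
wait_for() {
  local tries=$1; shift
  until "$@"; do
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || return 1   # out of retries
    sleep 2                          # same 2s cadence as postStartCommand
  done
}

# Usage (mirrors the real startup line):
#   wait_for 60 curl -sf http://localhost:11434 >/dev/null && pull-models
```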

Process Management (Supervisor)

The top-level supervisor config (/etc/supervisor/supervisord.conf) is the stock default shipped by the supervisor package. It includes all files from /etc/supervisor/conf.d/*.conf, which are baked into the image from .devcontainer/supervisord/.

All three programs run as user ubuntu with HOME="/home/ubuntu" explicitly set (supervisor does not inherit HOME when switching users).

Priority order

priority=1  ollama       ← starts first
priority=2  open-webui   ← waits 10s (startsecs) for Ollama to be ready
priority=3  cloudflared  ← tunnel up last

ollama

command=/usr/local/bin/ollama serve
user=ubuntu
environment=
    HOME="/home/ubuntu",
    OLLAMA_HOST="0.0.0.0:11434",
    OLLAMA_KEEP_ALIVE="10m",
    OLLAMA_MAX_LOADED_MODELS="1",
    OLLAMA_NUM_PARALLEL="1",
    NVIDIA_VISIBLE_DEVICES="all",
    NVIDIA_DRIVER_CAPABILITIES="all"

OLLAMA_HOST=0.0.0.0:11434 makes Ollama listen on all interfaces (required for DevPod port forwarding to work).

open-webui

command=/usr/local/bin/open-webui serve --host 0.0.0.0 --port 8080
user=ubuntu
environment=
    HOME="/home/ubuntu",
    DATA_DIR="/home/ubuntu/.local/share/open-webui",
    OLLAMA_BASE_URL="http://127.0.0.1:11434",
    WEBUI_AUTH="true",
    ENABLE_MEMORY="true",
    MEMORY_RETRIEVAL_TOP_K="5",
    RAG_EMBEDDING_ENGINE="ollama",
    RAG_EMBEDDING_MODEL="nomic-embed-text",
    ENABLE_RAG_WEB_SEARCH="true",
    RAG_WEB_SEARCH_ENGINE="duckduckgo",
    DO_NOT_TRACK="true",
    ANONYMIZED_TELEMETRY="false"

DATA_DIR must point to a writable path — open-webui was installed as root so its package directory is not writable by ubuntu.

RAG uses nomic-embed-text via Ollama for embeddings. Web search uses DuckDuckGo (no API key required).

cloudflared

command=/usr/local/bin/cloudflared tunnel run 2571f532-7790-4275-8bb4-33e0c20c64d6
user=ubuntu
environment=HOME="/home/ubuntu"

The tunnel ID and credentials are static. Configuration is read from the bind-mounted /home/ubuntu/.cloudflared/config.yml:

tunnel: 2571f532-7790-4275-8bb4-33e0c20c64d6
credentials-file: /home/ubuntu/.cloudflared/2571f532-7790-4275-8bb4-33e0c20c64d6.json

ingress:
  - service: http://localhost:8080

All traffic on ai.uclab.dev is forwarded to Open WebUI on port 8080.


Model Management (pull-models)

Location in image: /usr/local/bin/pull-models

The script is idempotent — it checks ollama list before pulling and skips models already present. Since ~/.ollama is a persistent host mount, models are only downloaded once regardless of how many times the container is rebuilt.
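The real script lives at .devcontainer/pull-models; a minimal sketch of the same idempotent pattern, with an illustrative `model_present` helper (not the actual implementation), might look like:

```shell
# Hypothetical sketch of an idempotent model puller (not the repo's pull-models).
set -u

MODELS="llama3.1:8b qwen2.5:7b deepseek-coder-v2:16b-lite-instruct-q4_K_M mistral-nemo"

# True if model $1 appears in `ollama list` output read from stdin
# (first column is the model name; NR > 1 skips the header row).
model_present() {
  awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

pull_missing() {
  local installed
  installed=$(ollama list)
  for m in $MODELS; do
    if printf '%s\n' "$installed" | model_present "$m"; then
      echo "skip: $m (already present)"
    else
      ollama pull "$m"
    fi
  done
}

# pull_missing   # invoked by postStartCommand once Ollama is up
```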

Configured models

| Model | Size | Purpose |
| --- | --- | --- |
| llama3.1:8b | 4.9 GB | General-purpose chat |
| qwen2.5:7b | 4.7 GB | General-purpose, strong reasoning |
| deepseek-coder-v2:16b-lite-instruct-q4_K_M | 10 GB | Code generation |
| mistral-nemo | 7.1 GB | Fast general-purpose |

Total: ~27 GB. Stored on a 1.8 TB volume (currently 10% used).

Note: The RTX 5060 has 8 GB VRAM. Only one model loads at a time (OLLAMA_MAX_LOADED_MODELS=1). Models larger than VRAM will run partially on CPU.
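The note above can be turned into a quick arithmetic check (a rough heuristic, not an Ollama guarantee; the 1 GB overhead figure is an assumption for KV cache and runtime overhead):

```shell
# Rough heuristic: a model fits fully in VRAM if size + ~1 GB overhead < VRAM.
# The 1 GB figure is an assumption, not an exact number.
fits_vram() {  # usage: fits_vram <model_size_gb> <vram_gb>
  awk -v s="$1" -v v="$2" 'BEGIN { exit !(s + 1 < v) }'
}

fits_vram 4.9 8 && echo "llama3.1:8b: fits in VRAM"
fits_vram 10 8  || echo "deepseek-coder-v2: partial CPU offload expected"
```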


Logs

All service logs are written to /var/log/ inside the container:

| File | Service |
| --- | --- |
| /var/log/ollama.log / .err | Ollama stdout / stderr |
| /var/log/open-webui.log / .err | Open WebUI stdout / stderr |
| /var/log/cloudflared.log / .err | Cloudflared stdout / stderr |
| /var/log/supervisor/supervisord.log | Supervisor daemon log |

Quick check:

sudo supervisorctl status
sudo supervisorctl tail -f open-webui stderr

Rebuild Checklist

Everything needed to fully recreate the environment is either in the image or on the host:

| What | Where | Survives rebuild? |
| --- | --- | --- |
| Container config | .devcontainer/ in repo | Yes — source of truth |
| Model weights | ~/.ollama (host mount) | Yes |
| Cloudflare credentials | ~/.cloudflared (host mount) | Yes |
| Open WebUI data (users, chats) | ~/.local/share/open-webui (inside container) | No — lost on rebuild |
| mise tool cache | Inside image (baked at build time) | Rebuilt from mise.toml |

Warning: Open WebUI user accounts and chat history live inside the container at /home/ubuntu/.local/share/open-webui. To persist this across rebuilds, add a host mount for that path in devcontainer.json.
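A sketch of such a mount (the host path ~/.open-webui is an arbitrary choice here, and the entry should be merged into the existing devcontainer.json rather than replacing it):

```jsonc
// devcontainer.json (fragment): persist Open WebUI data across rebuilds.
// Host path ~/.open-webui is illustrative; pick any directory you like.
{
  "mounts": [
    "source=${localEnv:HOME}/.open-webui,target=/home/ubuntu/.local/share/open-webui,type=bind"
  ]
}
```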


2026-04-07 · Series: lab · Categories: devops · Tags: #ai, #ollama, #open-webui, #devcontainer, #nvidia

