1 Context
Knowmler Lab turns a promoted idea into a working app by calling Context
Foundry. Today the whole lifecycle (run the AI build, build the image, host the
preview) runs on the main VPS through Foundry's
local_docker backend. That has three problems:
Blast radius
AI-generated apps run as containers on the same host as Knowmler production and the trading bots. Untrusted code next to real money.
Contention & fragility
Builds spike CPU on the
production box (the exit-125 incident), and the preview path is brittle
(the foundry-caddy auto-HTTPS 502 and the edge-network gap).
No elasticity
Each preview is a long-lived container. Previews do not scale to zero and accumulate on the box.
The contract that stays fixed
Knowmler submits
builds via POST /v1/jobs and polls. That interface does not
change; only the execution target behind it does.
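The submit-then-poll contract can be sketched as below. This is a minimal illustration, not Knowmler's actual worker: the transport is injected as two callables so the loop stays independent of any HTTP library, and the status names other than "ready", the polling interval, and the timeout are assumptions that do not come from this document.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; only "ready" is named in this document.
TERMINAL_STATUSES = {"ready", "failed", "canceled"}


def submit_and_poll(
    submit: Callable[[Dict[str, Any]], str],
    get_status: Callable[[str], Dict[str, Any]],
    payload: Dict[str, Any],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit one build, then poll until it reaches a terminal status.

    `submit` is expected to wrap POST /v1/jobs and return a job id.
    `get_status` is a hypothetical wrapper around whatever status call
    the /v1 service exposes; its path is not specified here.
    """
    job_id = submit(payload)
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Because the caller sees only `submit` and `get_status`, swapping the execution target behind Foundry requires no change to this loop.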
2 Decision
D1 — Backends live inside Foundry, never inside Knowmler
Knowmler's interface stays as it is: POST /v1/jobs + poll. The
execution target (local Docker, Azure, a future GitHub or Cloudflare backend)
is a BuildBackend selected by Foundry, configured by
environment, swappable without touching Knowmler. This preserves the queue,
idempotency, per-job logs, artifacts, cancel/TTL, cleanup, and the scoped-token
auth model that Foundry's /v1 service already provides.
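The idea behind D1 can be illustrated with a small registry keyed on the backend name. Foundry itself is written in Rust, so this Python sketch only shows the shape of the decision: the `run_build` method, the class names, and the `local_docker` default are hypothetical, while the backend names and the `FOUNDRY_SERVICE_BUILD_BACKEND` variable come from this document.

```python
from typing import Callable, Dict, Mapping, Protocol


class BuildBackend(Protocol):
    """Hypothetical minimal interface; the real trait has more operations."""

    name: str

    def run_build(self, job_id: str) -> str: ...


class LocalDockerBackend:
    name = "local_docker"

    def run_build(self, job_id: str) -> str:
        return f"{self.name}:{job_id}"


class AzureContainerAppsBackend:
    name = "azure_container_apps"

    def run_build(self, job_id: str) -> str:
        return f"{self.name}:{job_id}"


BACKENDS: Dict[str, Callable[[], BuildBackend]] = {
    "local_docker": LocalDockerBackend,
    "azure_container_apps": AzureContainerAppsBackend,
}


def select_backend(env: Mapping[str, str]) -> BuildBackend:
    """Pick the backend from the environment, never from the caller."""
    # Defaulting to local_docker is an assumption made for this sketch.
    key = env.get("FOUNDRY_SERVICE_BUILD_BACKEND", "local_docker")
    try:
        factory = BACKENDS[key]
    except KeyError:
        raise ValueError(f"unknown build backend: {key!r}") from None
    return factory()
```

A future GitHub or Cloudflare backend would be one more registry entry; nothing on the Knowmler side of POST /v1/jobs would change.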
D2 — Adopt the in-tree azure_container_apps backend
Run builds as Azure Container Apps Jobs and previews as scale-to-zero Azure Container Apps, with images in Azure Container Registry and job objects in Azure Blob Storage. Chosen because it is the only complete, already-written off-VPS backend: it covers build and preview, scales to zero, isolates untrusted apps, and aligns with Foundry's existing abstraction. The cost is onboarding Azure as a vendor.
3 Alternatives considered
| Option | Verdict | Why |
|---|---|---|
| Stay on main VPS (local_docker) | Rejected | Untrusted apps beside trading money; contention; preview fragility. |
| Dedicated preview VPS | Fallback | Cheap isolation, zero new code, but self-managed and not elastic. |
| GitHub build + Cloudflare Containers | Deferred | Cleanest single-vendor stack, but no backend exists: new BuildBackend + routing Worker. Most engineering, no scale problem yet. |
| Knowmler -> GitHub workflow_dispatch | Rejected | Bypasses the entire Foundry control plane. |
4 C4 architecture
Level 1 — System Context
C4Context
title System Context - Knowmler Lab builds via Context Foundry on Azure
Person(admin, "Lab Admin", "Promotes an idea and clicks Build")
Person_Ext(viewer, "Preview Viewer", "Opens the built app")
System(knowmler, "Knowmler", "Lab platform. Submits builds and polls status.")
System(foundry, "Context Foundry Build Service", "/v1 control plane: queue, idempotency, logs, artifacts, TTL, auth proxy")
System_Ext(azure, "Microsoft Azure", "Build compute and preview hosting")
System_Ext(anthropic, "Anthropic Claude", "LLM, via Foundry scoped auth proxy")
Rel(admin, knowmler, "Clicks Build")
Rel(knowmler, foundry, "POST /v1/jobs; polls", "HTTPS + bearer")
Rel(foundry, azure, "Runs build job, builds image, deploys preview", "ARM API")
Rel(foundry, anthropic, "Per-job scoped proxy token", "HTTPS")
Rel(viewer, azure, "Opens app.org.knowmler.com", "HTTPS")
UpdateLayoutConfig($c4ShapeInRow="3", $c4BoundaryInRow="2")
Level 2 — Container Diagram
C4Container
title Container Diagram - azure_container_apps backend
Person(admin, "Lab Admin")
System_Boundary(kn, "Knowmler (VPS)") {
Container(fe, "Frontend", "Next.js", "Lab UI and Build button")
Container(be, "Backend + Worker", "FastAPI", "Calls /v1/jobs, polls")
}
System_Boundary(az, "Azure Resource Group: rg-foundry-prod") {
Container(daemon, "foundry serve", "Container App (Rust)", "The /v1 control plane")
Container(job, "Build Job", "ACA Job (foundry-builder)", "QRPBA, 2 vCPU / 4 GiB, scoped token")
Container(preview, "Preview App", "Container App, scale-to-zero", "The built application")
Container(acr, "Container Registry", "ACR", "foundry-builder + per-app images")
ContainerDb(blob, "Job Storage", "Blob foundry-jobs", "inputs, logs, artifacts")
ContainerDb(logs, "Log Analytics", "Workspace", "ACA environment logs")
}
System_Ext(anthropic, "Anthropic Claude")
Rel(admin, fe, "Build")
Rel(fe, be, "POST /api/lab/ideas/{id}/build")
Rel(be, daemon, "POST /v1/jobs; poll", "HTTPS")
Rel(daemon, job, "Create + start ACA Job", "ARM")
Rel(job, blob, "I/O via SAS")
Rel(job, anthropic, "scoped proxy token")
Rel(daemon, acr, "Build app image (ACR Tasks)")
Rel(daemon, preview, "Deploy from image", "ARM")
Rel(preview, acr, "Pull image")
Rel(daemon, logs, "logs")
UpdateLayoutConfig($c4ShapeInRow="2", $c4BoundaryInRow="1")
5 Azure deployment design
Deployment topology
flowchart TB
classDef az fill:#e6f2fb,stroke:#0078D4,stroke-width:1.5px,color:#243a5e;
classDef edge fill:#f3f9fd,stroke:#5b6b7f,stroke-width:1px,color:#243a5e;
VIEWER["Preview Viewer"]:::edge
CF["Cloudflare DNS + TLS<br/>*.org.knowmler.com"]:::edge
BE["Knowmler Backend + Worker<br/>POST /v1/jobs, poll"]:::edge
subgraph RG["Azure Resource Group: rg-foundry-prod"]
direction TB
SVC["Container App: ca-foundry-service<br/>foundry serve (/v1)"]:::az
ENV["Container Apps Env: cae-foundry"]:::az
JOB["ACA Job: caj-foundry-build<br/>2 vCPU / 4 GiB"]:::az
PREV["Container Apps: ca-preview-*<br/>scale-to-zero"]:::az
ACR["Container Registry: crfoundry"]:::az
ST["Storage + Blob<br/>stfoundryjobs / foundry-jobs"]:::az
LOG["Log Analytics: log-foundry"]:::az
ID["Managed Identity: id-foundry"]:::az
end
BE -->|/v1 HTTPS + bearer| SVC
SVC -->|create/start ARM| JOB
SVC -->|ACR Tasks build| ACR
SVC -->|deploy ARM| PREV
JOB -->|SAS I/O| ST
JOB -->|pull builder| ACR
PREV -->|pull app image| ACR
SVC --- ENV
JOB --- ENV
PREV --- ENV
ENV --> LOG
SVC -. uses .-> ID
VIEWER --> CF -->|CNAME| PREV
Resource inventory
foundry serve, the /v1 control plane.
Identity and least-privilege roles
| Principal | Role | Scope | Why |
|---|---|---|---|
| ca-foundry-service MI | Contributor (or custom: Microsoft.App/* + ACR scheduleRun) | RG | Create/start jobs, build images, deploy/delete previews. |
| ca-foundry-service MI | Storage Blob Data Contributor | Storage acct | Read/write job objects. |
| ca-foundry-service MI | AcrPush | ACR | Push built app images. |
| Job + preview MI | AcrPull | ACR | Pull builder + app images. |
6 Build & preview lifecycle
sequenceDiagram
autonumber
participant K as Knowmler worker
participant F as foundry serve
participant J as ACA Job (builder)
participant R as ACR
participant P as Preview App
K->>F: POST /v1/jobs (SPEC.md, TASKS.md, org_slug)
F->>F: enqueue, idempotency, SAS grant + scoped proxy token
F->>J: create + start ACA Job
J->>J: QRPBA build (LLM via proxy token)
J-->>F: stream.jsonl via append blob
F->>R: ACR Tasks build app image, push
F->>P: deploy scale-to-zero Container App
P-->>F: FQDN
F-->>K: status ready + preview_url
Note over F,P: TTL reaper + teardown DELETE Job, Container App, ACR repo
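The TTL reaper in the final note can be pictured as a pure planning step that turns expired previews into teardown actions. This is a sketch under assumptions: the record fields, the action names, and the expiry rule (deployment time plus TTL) are hypothetical; only the three resources to delete and their order follow the note above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PreviewRecord:
    """Hypothetical bookkeeping record for one deployed preview."""

    job_id: str
    deployed_at: datetime
    ttl: timedelta


def plan_teardown(
    records: Iterable[PreviewRecord], now: datetime
) -> List[Tuple[str, str]]:
    """Return (action, job_id) pairs for every preview past its TTL."""
    actions: List[Tuple[str, str]] = []
    for record in records:
        if now - record.deployed_at >= record.ttl:
            # Same order as the lifecycle note: Job, Container App, ACR repo.
            actions.append(("delete_job", record.job_id))
            actions.append(("delete_container_app", record.job_id))
            actions.append(("delete_acr_repo", record.job_id))
    return actions
```

Keeping the plan separate from the ARM calls that execute it makes the expiry rule easy to test without touching Azure.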
7 Service configuration
These variables are read only when Foundry is built with --features azure and
FOUNDRY_SERVICE_BUILD_BACKEND=azure_container_apps is set. The first seven
are required.
| Variable | Required | Example |
|---|---|---|
| FOUNDRY_SERVICE_AZURE_SUBSCRIPTION_ID | yes | <sub-guid> |
| FOUNDRY_SERVICE_AZURE_RESOURCE_GROUP | yes | rg-foundry-prod |
| FOUNDRY_SERVICE_AZURE_LOCATION | yes | eastus2 |
| FOUNDRY_SERVICE_AZURE_STORAGE_ACCOUNT | yes | stfoundryjobs |
| FOUNDRY_SERVICE_AZURE_STORAGE_KEY | yes | <key, prefer Key Vault> |
| FOUNDRY_SERVICE_AZURE_ACR_NAME | yes | crfoundry |
| FOUNDRY_SERVICE_AZURE_ACA_ENVIRONMENT | yes | cae-foundry |
| FOUNDRY_SERVICE_AZURE_STORAGE_CONTAINER | no (foundry-jobs) | foundry-jobs |
| FOUNDRY_SERVICE_AZURE_MI_CLIENT_ID | no (system-assigned) | <mi-guid> |
| FOUNDRY_SERVICE_AZURE_SIGNED_URL_TTL_SECS | no (900) | 900 |
| FOUNDRY_SERVICE_AZURE_SAS_GRANT_TTL_SECS | no (3600) | 3600 |
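A pre-deploy check of this table might look like the sketch below. It is a convenience script under assumptions, not Foundry's own config loader: the function name and the returned dictionary shape are hypothetical, while the variable names and defaults are copied from the table.

```python
from typing import Dict, Mapping

REQUIRED = (
    "FOUNDRY_SERVICE_AZURE_SUBSCRIPTION_ID",
    "FOUNDRY_SERVICE_AZURE_RESOURCE_GROUP",
    "FOUNDRY_SERVICE_AZURE_LOCATION",
    "FOUNDRY_SERVICE_AZURE_STORAGE_ACCOUNT",
    "FOUNDRY_SERVICE_AZURE_STORAGE_KEY",
    "FOUNDRY_SERVICE_AZURE_ACR_NAME",
    "FOUNDRY_SERVICE_AZURE_ACA_ENVIRONMENT",
)

# Defaults as listed in the "Required" column of the table.
DEFAULTS = {
    "FOUNDRY_SERVICE_AZURE_STORAGE_CONTAINER": "foundry-jobs",
    "FOUNDRY_SERVICE_AZURE_SIGNED_URL_TTL_SECS": "900",
    "FOUNDRY_SERVICE_AZURE_SAS_GRANT_TTL_SECS": "3600",
}

PREFIX = "FOUNDRY_SERVICE_AZURE_"


def load_azure_config(env: Mapping[str, str]) -> Dict[str, str]:
    """Fail fast on missing required variables, then apply defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    config = dict(DEFAULTS)
    config.update({k: v for k, v in env.items() if k.startswith(PREFIX)})
    # FOUNDRY_SERVICE_AZURE_MI_CLIENT_ID stays absent when unset, which the
    # table describes as falling back to the system-assigned identity.
    return config
```

Running it as `load_azure_config(os.environ)` before deploying ca-foundry-service would surface a missing variable before the daemon starts.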
Previews are served at <app>.<org-slug>.knowmler.com. Cloudflare holds DNS + edge
TLS; a CNAME points the per-org wildcard at the ACA environment, with the
preview bound as an ACA custom domain. The org-slug is the owning
org's slug (fixed in commit 185cee5), capped at 63 chars.
8 Cost model
All figures are list prices and vary by region.
- ACA Consumption has a monthly free grant per subscription (about 180,000 vCPU-seconds + 360,000 GiB-seconds + 2M requests). A 53-minute build at 2 vCPU / 4 GiB is roughly 6,360 vCPU-s + 12,720 GiB-s, so a handful of builds per month likely falls inside the free grant.
- Preview Container Apps scale to zero: idle is approximately nothing; active time is billed per vCPU-s / GiB-s.
- ACR Basic is about $5/month (Standard about $20).
- Storage + Log Analytics: a few dollars at this volume.
- Rough total at low volume: about $5-15/month, dominated by the fixed ACR SKU, with compute largely inside the free grant.
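The build arithmetic above can be checked with a few lines. The free-grant constants are the approximate figures quoted in the list; the function names are illustrative, and the estimate counts build jobs only, ignoring preview and control-plane usage.

```python
# Approximate monthly free grant quoted above (per subscription).
FREE_VCPU_SECONDS = 180_000
FREE_GIB_SECONDS = 360_000


def build_usage(minutes: int, vcpu: int, gib: int) -> tuple:
    """Return (vCPU-seconds, GiB-seconds) consumed by one build."""
    seconds = minutes * 60
    return vcpu * seconds, gib * seconds


def builds_within_free_grant(minutes: int, vcpu: int, gib: int) -> int:
    """Whole builds that fit in the grant, limited by the scarcer resource."""
    vcpu_s, gib_s = build_usage(minutes, vcpu, gib)
    return min(FREE_VCPU_SECONDS // vcpu_s, FREE_GIB_SECONDS // gib_s)


print(build_usage(53, 2, 4))               # (6360, 12720)
print(builds_within_free_grant(53, 2, 4))  # 28
```

At 2 vCPU / 4 GiB both limits bind at the same point, so roughly 28 builds of 53 minutes fit in a month before build compute becomes billable.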
9 Security posture
Scoped tokens, not the OAuth key
Builds get a per-job scoped proxy token, never the raw Claude OAuth credential. The auth proxy revokes it when the job ends or is canceled.
Short-TTL grants
Job I/O uses short-TTL SAS; artifact downloads use short-TTL signed URLs.
Isolation
Untrusted generated apps run in their own Container Apps, away from Knowmler production and the trading host.
Secret storage
Prefer Key Vault for the storage key
and FOUNDRY_SERVICE_API_KEYS rather than plaintext env.
10 Implementation steps
- Provision rg-foundry-prod: Log Analytics, ACA environment, Storage + foundry-jobs container, ACR, managed identity, role assignments.
- Build Foundry with --features azure; build and push foundry-builder to crfoundry (in a Rust container or CI; the host has no Rust toolchain).
- Deploy ca-foundry-service with the FOUNDRY_SERVICE_AZURE_* env and FOUNDRY_SERVICE_BUILD_BACKEND=azure_container_apps.
- Point Knowmler's FOUNDRY_API_URL at the Azure daemon; rotate FOUNDRY_SERVICE_API_KEYS per the runbook.
- Wire the per-org preview domain (Cloudflare CNAME + ACA custom domain). org_slug routing is already fixed.
- Canary: run one build end-to-end; confirm preview and teardown.
- Cutover: switch Knowmler builds to the Azure daemon; keep the VPS path as a documented rollback.
11 Risks & open questions
- Azure onboarding effort (subscription, IAM, ACR, DNS) is the main cost. The dedicated-preview-VPS path is the documented fallback.
- Custom-domain wildcard wiring between Cloudflare and ACA needs a spike (managed cert vs Cloudflare-origin cert).
- Anthropic Agent SDK credit metering begins 2026-06-15 for programmatic subscription usage; confirm the impact on the auth-proxy token model.
- The --features azure build and the image size for foundry-builder need a first real run to validate.