Case Study

Running Reflexion AI on FRAKMA:
Autonomous Incident Resolution with Swappable LLM Agents

Warble Cloud · 10 min read · WarbleApp CRD · Argo CD · OpenFaaS · KServe

Reflexion AI is Warble Cloud's autonomous incident resolution platform. It uses an Actor/Critic LLM loop to detect, diagnose, and remediate production incidents without waking anyone up. This post documents how we deploy it on FRAKMA using the WarbleApp CRD, and how the three-variant agent design lets us swap LLM providers with a single kubectl apply and no changes to application code.

The Actor/Critic architecture

Reflexion Engine runs a two-phase reasoning loop for every incident:

1. Actor: proposes a diagnosis and a remediation plan for the detected incident.
2. Critic: validates the Actor's proposal before anything is executed.

The key insight: only the Actor needs to change when you want to try a different model. The Critic is a fixed-point validator — it should never be cost-optimised.

Three agent variants, one image

All three engine variants use the same container image. The LLM provider is selected at runtime via the LLM_PROVIDER environment variable. This means one build, three deployment manifests; a sketch of the runtime selection follows the variant list below.

- engine-vertex (production default): Gemini 2.5 Flash as both Actor and Critic. Best reasoning quality. Requires GCP Workload Identity or a service account key.
- engine-groq (cost-optimised): Llama 3.1 8B via Groq Cloud as Actor; the Vertex Critic stays. ~10× cheaper per incident, <200 ms Actor latency.
- engine-ollama (air-gapped): local Ollama on GPU spot nodes as Actor. No external API calls for inference. The Vertex Critic still fires: one outbound call per validation.
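To make "one image, three variants" concrete, here is a minimal sketch of what the runtime selection could look like. The script name, module path, and flags are assumptions for illustration, not the actual Reflexion code; only the LLM_PROVIDER switch comes from the design above.

# entrypoint.sh: hypothetical sketch of runtime provider selection
case "${LLM_PROVIDER:?LLM_PROVIDER must be set}" in
  vertex) exec python -m reflexion.engine --actor vertex --critic vertex ;;
  groq)   exec python -m reflexion.engine --actor groq   --critic vertex ;;
  ollama) exec python -m reflexion.engine --actor ollama --critic vertex ;;
  *)      echo "unknown LLM_PROVIDER: ${LLM_PROVIDER}" >&2; exit 1 ;;
esac

Note that the Critic stays pinned to Vertex in every branch, matching the fixed-point validator rule.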

The WarbleApp CRD manifests

Each variant is a WarbleApp custom resource. The reconciler creates a Deployment, ClusterIP Service, and nginx Ingress automatically. You never write those resources by hand.

# k8s/reflexion/agents/engine-groq.yaml
apiVersion: warble.io/v1alpha1
kind: WarbleApp
metadata:
  name: reflexion-engine-groq
  namespace: warble-system
  labels:
    warble.io/agent: reflexion-engine
    warble.io/llm-provider: groq
spec:
  stack: reflexion
  image: warbleoss.azurecr.io/reflexion-engine:latest
  replicas: 2
  resources:
    requests: {cpu: 250m, memory: 256Mi}
    limits:   {cpu: 500m, memory: 512Mi}
  ingress:
    enabled: true
    host: reflexion.frakma.io
    tlsEnabled: true
  env:
    - name: LLM_PROVIDER
      value: groq
    - name: GROQ_API_KEY
      valueFrom:
        secretKeyRef: {name: reflexion-secrets, key: GROQ_API_KEY}
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef: {name: reflexion-secrets, key: DATABASE_URL}

The Vertex and Ollama variants are structurally identical — only the LLM_PROVIDER value and the corresponding API key secret differ. Resource requests are smaller for the Groq variant because the Actor call returns in milliseconds rather than seconds.
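As a sketch of how small that diff is, the env block for the air-gapped variant might read as follows. The OLLAMA_HOST variable name and the in-cluster Service address are assumptions; 11434 is Ollama's default port:

# k8s/reflexion/agents/engine-ollama.yaml (env excerpt, hypothetical)
env:
  - name: LLM_PROVIDER
    value: ollama
  - name: OLLAMA_HOST   # assumed name for the local inference endpoint
    value: http://ollama.warble-system.svc.cluster.local:11434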

Switching variants in production

All three variants share the same reflexion.frakma.io ingress host, so only one should be active at a time. Switching is a delete and an apply (plus an optional watch):

# Remove current engine
kubectl delete wapp reflexion-engine-vertex -n warble-system

# Apply the new variant
kubectl apply -f k8s/reflexion/agents/engine-groq.yaml

# Watch the rollout
kubectl get wapp -n warble-system --watch
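
To confirm the reconciler created the child resources, you can list them directly. This assumes the warble.io/agent label propagates from the WarbleApp to the objects it owns, which the post doesn't state explicitly:

# Confirm the generated Deployment, Service, and Ingress
# (assumes the warble.io/agent label propagates to child resources)
kubectl get deploy,svc,ingress -n warble-system -l warble.io/agent=reflexion-engine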

Or trigger the GitHub Actions deploy workflow with agent=engine-groq — it handles the image tag patch, kubectl diff, and the production approval gate for you.
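
If you prefer a terminal, the same workflow can be dispatched with the GitHub CLI. The workflow file name below is an assumption; the agent input comes from the workflow described above:

# Dispatch the deploy workflow from the CLI (workflow file name assumed)
gh workflow run deploy.yml -f agent=engine-groq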

Supporting components

| Component | Manifest | Why it exists |
| --- | --- | --- |
| reflexion-ai-server | agents/ai-server.yaml | FastAPI service: hypothesis engine, healing API, metrics. port: 8000 |
| reflexion-executor | agents/executor.yaml | Validates and executes remediation actions with SLO and blast-radius guardrails |
| reflexion-frontend | agents/frontend.yaml | Next.js dashboard. port: 3000, ingress: app.reflexion.frakma.io |
| qdrant | qdrant.yaml | Vector DB for semantic recipe search. Raw Kubernetes, not a WarbleApp. |

Why the port field matters

The WarbleApp reconciler hardcoded port 8080 in its first version. ai-server runs on 8000 and frontend on 3000. Rather than forcing a PORT=8080 env var hack on every non-standard service, we added a port: field to the CRD spec so the reconciler wires the Service and Ingress backend to the right port natively.
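
Here is a sketch of how the field reads for ai-server; everything except port: is abridged, and stack: follows the engine manifest above:

# agents/ai-server.yaml (excerpt)
spec:
  stack: reflexion
  port: 8000   # the reconciler wires the Service and Ingress backend to this port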

Observability

Each component is scraped by Prometheus. Grafana dashboards track per-incident LLM cost by provider, incident resolution rate, and Actor/Critic latency.

The cost dashboard is how we validated the Groq variant: same resolution rate, $0.003 per incident vs $0.031 on Vertex-only — a 10× cost reduction with no measurable quality difference on the incidents we tested.
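
The post doesn't show the scrape wiring. If it follows the common annotation-driven convention, each pod would carry something like this; treat it as a convention sketch, not a detail confirmed by the deployment:

# Pod annotations for annotation-driven Prometheus scraping (convention sketch)
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8000"   # ai-server's metrics port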

Lessons learned

Continue reading

Next: Multi-Provider LLM Routing with OpenFaaS →