Kubernetes is powerful but verbose. Deploying a single ML service typically requires writing a Deployment, a Service, an Ingress, a cert-manager Certificate or annotation, and optionally a KServe InferenceService — five resources, hundreds of lines of YAML, and a maintainability burden that compounds with every new component you add.
The WarbleApp CRD collapses all of that into a single resource. This post walks through how the operator works, what it creates, and how to write your first manifest.
## What a WarbleApp creates
| Resource | Created when | Key fields set by operator |
|---|---|---|
| Deployment | Always | Image, replicas, containerPort, nodeSelector `warble.io/pool=workload`, env vars |
| Service | Always | ClusterIP, port + targetPort from `spec.port` (default 8080) |
| Ingress | `ingress.enabled: true` | nginx class, cert-manager `letsencrypt-prod`, TLS secret `<name>-tls` |
| InferenceService | `mlServing.enabled: true` | Model URI, runtime, GPU request, namespace `warble-models` |
Everything the operator creates is owned by the WarbleApp. Delete the CR and the owned resources are garbage-collected with it. Update the image tag and the Deployment rolls out automatically.
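For example, bumping the image tag is a single-field patch against the CR; the app name and tag below are placeholders, not resources from this post:

```bash
# Patch the WarbleApp's image; the operator updates the owned Deployment,
# which rolls out like any normal Deployment update.
kubectl patch wapp my-api -n warble-system --type merge \
  -p '{"spec":{"image":"warbleoss.azurecr.io/my-api:v1.3.0"}}'
```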
## A minimal manifest
```yaml
apiVersion: warble.io/v1alpha1
kind: WarbleApp
metadata:
  name: sentiment-api
  namespace: warble-system
spec:
  stack: api
  image: warbleoss.azurecr.io/sentiment-api:v1.2.0
  replicas: 2
  port: 8000              # FastAPI, not the default 8080
  resources:
    requests: {cpu: 250m, memory: 256Mi}
    limits: {cpu: "1", memory: 512Mi}
  ingress:
    enabled: true
    host: sentiment.frakma.io
    tlsEnabled: true
  env:
    - name: MODEL_NAME
      value: sentiment-v2
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef: {name: app-secrets, key: DATABASE_URL}
```
Apply it with `kubectl apply -f sentiment-api.yaml`. Within seconds the operator has created the Deployment, Service, and Ingress (mlServing is not enabled here), and the status reflects the Deployment's rollout progress.
## Checking status
```bash
kubectl get wapp -n warble-system
NAME            STACK   REPLICAS   PHASE     AGE
sentiment-api   api     2          Running   2m

kubectl describe wapp sentiment-api -n warble-system
# Events show each reconcile step and any errors
```
## Adding KServe model serving
For components that both serve an API and expose a model endpoint, add the `mlServing` block. The operator creates a KServe InferenceService alongside the Deployment:
```yaml
spec:
  # ... existing fields ...
  mlServing:
    enabled: true
    modelUri: "azureblob://warbleosstate/mlflow-artifacts/sentiment/v2"
    runtime: mlserver   # triton | mlserver | torchserve | ollama
    gpuEnabled: false
```
The InferenceService is created in the warble-models namespace with the model URI from MLflow's artifact store. KServe handles autoscaling, canary traffic splitting, and health-checking the model runtime independently of the application Deployment.
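For orientation, the rendered InferenceService looks roughly like the sketch below, assuming the operator names it after the WarbleApp and maps `runtime: mlserver` to KServe's stock `kserve-mlserver` ServingRuntime; those details are assumptions, while the namespace and model URI come straight from the spec:

```yaml
# Illustrative sketch of the rendered InferenceService. The name, runtime mapping,
# and model format are assumptions; the namespace and storage URI come from the
# WarbleApp spec above.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sentiment-api
  namespace: warble-models
spec:
  predictor:
    model:
      runtime: kserve-mlserver   # assumed mapping for `runtime: mlserver`
      modelFormat:
        name: mlflow             # assumption: MLflow artifact served by MLServer
      storageUri: azureblob://warbleosstate/mlflow-artifacts/sentiment/v2
```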
## The `port` field: why it matters
The reconciler's first version hardcoded port 8080 everywhere: Service port, targetPort, and Ingress backend. This was fine for Go services but broke FastAPI (8000) and Next.js (3000). Rather than adding a `PORT=8080` env var hack to every non-standard service, we added a `port:` field to the spec:
```go
// In types.go
// +kubebuilder:default=8080
Port int32 `json:"port,omitempty"`

// In the reconciler: one helper, three callsites
func appPort(app *warblev1alpha1.WarbleApp) int32 {
    if app.Spec.Port > 0 {
        return app.Spec.Port
    }
    return 8080
}
```
The `+kubebuilder:default=8080` marker sets the OpenAPI default in the CRD schema, so existing manifests that don't specify `port:` continue to work without change; the addition is fully backwards-compatible.
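To make "one helper, three callsites" concrete, here is a sketch of the Service callsite. Only `appPort` comes from the snippet above; `desiredService`, the selector label, and the import path of the API package are illustrative assumptions.

```go
package controller

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"

    warblev1alpha1 "example.com/warble-operator/api/v1alpha1" // hypothetical import path
)

// desiredService is an illustrative callsite: both the Service port and the
// targetPort are derived from appPort, so FastAPI (8000) and Next.js (3000)
// apps route correctly without a PORT env var hack.
func desiredService(app *warblev1alpha1.WarbleApp) *corev1.Service {
    port := appPort(app)
    return &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{Name: app.Name, Namespace: app.Namespace},
        Spec: corev1.ServiceSpec{
            Type:     corev1.ServiceTypeClusterIP,
            Selector: map[string]string{"app": app.Name}, // assumption: pod selector label
            Ports: []corev1.ServicePort{{
                Port:       port,
                TargetPort: intstr.FromInt(int(port)),
            }},
        },
    }
}
```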
## Reconciler architecture
The controller follows the standard kubebuilder pattern: fetch the CR, reconcile each owned resource via create-or-update, then update the status subresource with the current Deployment replica count.
- Idempotent: calling `Reconcile` twice has the same effect as calling it once. Safe to re-queue on any error.
- Ownership: `ctrl.SetControllerReference` is called on every owned resource. Garbage collection is automatic.
- Status: phases are derived purely from `Deployment.Status.AvailableReplicas` vs `Spec.Replicas`, with no custom state machine.
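A minimal sketch of that loop, under the kubebuilder conventions described above; the `fillDeploymentSpec` helper, the status field name, and the API package import path are assumptions, not the operator's actual code.

```go
package controller

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    warblev1alpha1 "example.com/warble-operator/api/v1alpha1" // hypothetical import path
)

type WarbleAppReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// Reconcile follows the standard pattern: fetch the CR, create-or-update each
// owned resource, then mirror the Deployment's availability into the status.
func (r *WarbleAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var app warblev1alpha1.WarbleApp
    if err := r.Get(ctx, req.NamespacedName, &app); err != nil {
        // The CR is gone: owner references take care of deleting everything it created.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    deploy := &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{Name: app.Name, Namespace: app.Namespace}}
    if _, err := controllerutil.CreateOrUpdate(ctx, r.Client, deploy, func() error {
        fillDeploymentSpec(deploy, &app) // illustrative helper: image, replicas, port, nodeSelector, env
        return ctrl.SetControllerReference(&app, deploy, r.Scheme)
    }); err != nil {
        return ctrl.Result{}, err
    }

    // ... Service, Ingress, and InferenceService follow the same create-or-update pattern ...

    // Status field name is an assumption; the post only says phases derive from
    // AvailableReplicas vs Spec.Replicas.
    app.Status.AvailableReplicas = deploy.Status.AvailableReplicas
    if err := r.Status().Update(ctx, &app); err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{}, nil
}
```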
## Extending the operator
The most common extensions teams add after using WarbleApp for a few weeks:
- HPA: add a `scaling` block that creates a `HorizontalPodAutoscaler`, with min/max replicas and a CPU or custom-metric target.
- ConfigMap: add a `configmap` field that the reconciler projects as a volume mount.
- PodMonitor: add a `metrics` block that creates a Prometheus `PodMonitor` for the service.
Each is a small addition to `types.go` and a new `reconcile*` function in the controller. The kubebuilder scaffolding handles the rest.
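For instance, the HPA case might look like the sketch below: a hypothetical `ScalingSpec` for `types.go` plus a `desiredHPA` helper the new reconcile function would create-or-update. Every name here is an assumption, not the project's actual API.

```go
package controller

import (
    autoscalingv2 "k8s.io/api/autoscaling/v2"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    warblev1alpha1 "example.com/warble-operator/api/v1alpha1" // hypothetical import path
)

// ScalingSpec is a hypothetical shape for the `scaling` block in types.go.
type ScalingSpec struct {
    Enabled     bool  `json:"enabled,omitempty"`
    MinReplicas int32 `json:"minReplicas,omitempty"`
    MaxReplicas int32 `json:"maxReplicas,omitempty"`
    // Target average CPU utilization, as a percentage of the pod's request.
    TargetCPUPercent int32 `json:"targetCPUPercent,omitempty"`
}

// desiredHPA builds the HorizontalPodAutoscaler a reconcileScaling function
// would manage, targeting the Deployment the operator already owns.
func desiredHPA(app *warblev1alpha1.WarbleApp, scaling ScalingSpec) *autoscalingv2.HorizontalPodAutoscaler {
    cpu := scaling.TargetCPUPercent
    minReplicas := scaling.MinReplicas
    return &autoscalingv2.HorizontalPodAutoscaler{
        ObjectMeta: metav1.ObjectMeta{Name: app.Name, Namespace: app.Namespace},
        Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
            ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
                APIVersion: "apps/v1",
                Kind:       "Deployment",
                Name:       app.Name,
            },
            MinReplicas: &minReplicas,
            MaxReplicas: scaling.MaxReplicas,
            Metrics: []autoscalingv2.MetricSpec{{
                Type: autoscalingv2.ResourceMetricSourceType,
                Resource: &autoscalingv2.ResourceMetricSource{
                    Name: corev1.ResourceCPU,
                    Target: autoscalingv2.MetricTarget{
                        Type:               autoscalingv2.UtilizationMetricType,
                        AverageUtilization: &cpu,
                    },
                },
            }},
        },
    }
}
```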