Kubernetes is powerful but verbose. Deploying a single ML service typically requires writing a Deployment, a Service, an Ingress, a cert-manager Certificate or annotation, and optionally a KServe InferenceService — five resources, hundreds of lines of YAML, and a maintainability burden that compounds with every new component you add.
The WarbleApp CRD collapses all of that into a single resource. This post walks through how the operator works, what it creates, and how to write your first manifest.
## What a WarbleApp creates
| Resource | Created when | Key fields set by operator |
|---|---|---|
| Deployment | Always | Image, replicas, containerPort, nodeSelector `warble.io/pool=workload`, env vars |
| Service | Always | ClusterIP, port + targetPort from `spec.port` (default 8080) |
| Ingress | `ingress.enabled: true` | nginx class, cert-manager `letsencrypt-prod`, TLS secret `<name>-tls` |
| InferenceService | `mlServing.enabled: true` | Model URI, runtime, GPU request, namespace `warble-models` |
Everything the operator creates is owned by the WarbleApp. Delete the CR and the owned resources are garbage-collected with it. Update the image tag and the Deployment rolls out automatically.
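For example, bumping the image tag is a single-field patch against the CR; the app name and tag below are placeholders, not resources from this post:

```bash
# Patch the WarbleApp's image; the operator updates the owned Deployment,
# which rolls out like any normal Deployment update.
kubectl patch wapp my-api -n warble-system --type merge \
  -p '{"spec":{"image":"warbleoss.azurecr.io/my-api:v1.3.0"}}'
```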
## A minimal manifest
```yaml
apiVersion: warble.io/v1alpha1
kind: WarbleApp
metadata:
  name: sentiment-api
  namespace: warble-system
spec:
  stack: api
  image: warbleoss.azurecr.io/sentiment-api:v1.2.0
  replicas: 2
  port: 8000              # FastAPI, not the default 8080
  resources:
    requests: {cpu: 250m, memory: 256Mi}
    limits: {cpu: "1", memory: 512Mi}
  ingress:
    enabled: true
    host: sentiment.frakma.io
    tlsEnabled: true
  env:
    - name: MODEL_NAME
      value: sentiment-v2
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef: {name: app-secrets, key: DATABASE_URL}
```
Apply it with `kubectl apply -f sentiment-api.yaml`. Within seconds the operator has created the Deployment, Service, and Ingress (mlServing is not enabled here), and the status reflects the Deployment's rollout progress.
## Checking status
```bash
kubectl get wapp -n warble-system
NAME            STACK   REPLICAS   PHASE     AGE
sentiment-api   api     2          Running   2m

kubectl describe wapp sentiment-api -n warble-system
# Events show each reconcile step and any errors
```
## Adding KServe model serving
For components that both serve an API and expose a model endpoint, add the `mlServing` block. The operator creates a KServe InferenceService alongside the Deployment:
```yaml
spec:
  # ... existing fields ...
  mlServing:
    enabled: true
    modelUri: "azureblob://warbleosstate/mlflow-artifacts/sentiment/v2"
    runtime: mlserver   # triton | mlserver | torchserve | ollama
    gpuEnabled: false
```
The InferenceService is created in the warble-models namespace with the model URI from MLflow's artifact store. KServe handles autoscaling, canary traffic splitting, and health-checking the model runtime independently of the application Deployment.
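For orientation, the rendered InferenceService looks roughly like the sketch below, assuming the operator names it after the WarbleApp and maps `runtime: mlserver` to KServe's stock `kserve-mlserver` ServingRuntime; those details are assumptions, while the namespace and model URI come straight from the spec:

```yaml
# Illustrative sketch of the rendered InferenceService. The name, runtime mapping,
# and model format are assumptions; the namespace and storage URI come from the
# WarbleApp spec above.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sentiment-api
  namespace: warble-models
spec:
  predictor:
    model:
      runtime: kserve-mlserver   # assumed mapping for `runtime: mlserver`
      modelFormat:
        name: mlflow             # assumption: MLflow artifact served by MLServer
      storageUri: azureblob://warbleosstate/mlflow-artifacts/sentiment/v2
```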
## The `port` field: why it matters
The reconciler's first version hardcoded port 8080 everywhere: Service port, targetPort, and Ingress backend. This was fine for Go services but broke FastAPI (8000) and Next.js (3000). Rather than adding a `PORT=8080` env var hack to every non-standard service, we added a `port:` field to the spec:
```go
// In types.go
// +kubebuilder:default=8080
Port int32 `json:"port,omitempty"`

// In the reconciler: one helper, three callsites
func appPort(app *warblev1alpha1.WarbleApp) int32 {
    if app.Spec.Port > 0 {
        return app.Spec.Port
    }
    return 8080
}
```
The `+kubebuilder:default=8080` marker sets the OpenAPI default in the CRD schema, so existing manifests that don't specify `port:` continue to work without change; the addition is fully backwards-compatible.
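To make "one helper, three callsites" concrete, here is a sketch of the Service callsite. Only `appPort` comes from the snippet above; `desiredService`, the selector label, and the import path of the API package are illustrative assumptions.

```go
package controller

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"

    warblev1alpha1 "example.com/warble-operator/api/v1alpha1" // hypothetical import path
)

// desiredService is an illustrative callsite: both the Service port and the
// targetPort are derived from appPort, so FastAPI (8000) and Next.js (3000)
// apps route correctly without a PORT env var hack.
func desiredService(app *warblev1alpha1.WarbleApp) *corev1.Service {
    port := appPort(app)
    return &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{Name: app.Name, Namespace: app.Namespace},
        Spec: corev1.ServiceSpec{
            Type:     corev1.ServiceTypeClusterIP,
            Selector: map[string]string{"app": app.Name}, // assumption: pod selector label
            Ports: []corev1.ServicePort{{
                Port:       port,
                TargetPort: intstr.FromInt(int(port)),
            }},
        },
    }
}
```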
## Reconciler architecture
The controller follows the standard kubebuilder pattern: fetch the CR, reconcile each owned resource via create-or-update, then update the status subresource with the current Deployment replica count.
- Idempotent: calling `Reconcile` twice has the same effect as calling it once. Safe to re-queue on any error.
- Ownership: `ctrl.SetControllerReference` is called on every owned resource. Garbage collection is automatic.
- Status: phases are derived purely from `Deployment.Status.AvailableReplicas` vs `Spec.Replicas`, with no custom state machine.
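A minimal sketch of that loop, under the kubebuilder conventions described above; the `fillDeploymentSpec` helper, the status field name, and the API package import path are assumptions, not the operator's actual code.

```go
package controller

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    warblev1alpha1 "example.com/warble-operator/api/v1alpha1" // hypothetical import path
)

type WarbleAppReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// Reconcile follows the standard pattern: fetch the CR, create-or-update each
// owned resource, then mirror the Deployment's availability into the status.
func (r *WarbleAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var app warblev1alpha1.WarbleApp
    if err := r.Get(ctx, req.NamespacedName, &app); err != nil {
        // The CR is gone: owner references take care of deleting everything it created.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    deploy := &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{Name: app.Name, Namespace: app.Namespace}}
    if _, err := controllerutil.CreateOrUpdate(ctx, r.Client, deploy, func() error {
        fillDeploymentSpec(deploy, &app) // illustrative helper: image, replicas, port, nodeSelector, env
        return ctrl.SetControllerReference(&app, deploy, r.Scheme)
    }); err != nil {
        return ctrl.Result{}, err
    }

    // ... Service, Ingress, and InferenceService follow the same create-or-update pattern ...

    // Status field name is an assumption; the post only says phases derive from
    // AvailableReplicas vs Spec.Replicas.
    app.Status.AvailableReplicas = deploy.Status.AvailableReplicas
    if err := r.Status().Update(ctx, &app); err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{}, nil
}
```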
## Extending the operator
The most common extensions teams add after using WarbleApp for a few weeks:
- HPA: add a `scaling` block that creates a `HorizontalPodAutoscaler`, with min/max replicas and a CPU or custom-metric target.
- ConfigMap: add a `configmap` field that the reconciler projects as a volume mount.
- PodMonitor: add a `metrics` block that creates a Prometheus `PodMonitor` for the service.
Each is a small addition to `types.go` and a new `reconcile*` function in the controller. The kubebuilder scaffolding handles the rest.
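For instance, the HPA case might look like the sketch below: a hypothetical `ScalingSpec` for `types.go` plus a `desiredHPA` helper the new reconcile function would create-or-update. Every name here is an assumption, not the project's actual API.

```go
package controller

import (
    autoscalingv2 "k8s.io/api/autoscaling/v2"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    warblev1alpha1 "example.com/warble-operator/api/v1alpha1" // hypothetical import path
)

// ScalingSpec is a hypothetical shape for the `scaling` block in types.go.
type ScalingSpec struct {
    Enabled     bool  `json:"enabled,omitempty"`
    MinReplicas int32 `json:"minReplicas,omitempty"`
    MaxReplicas int32 `json:"maxReplicas,omitempty"`
    // Target average CPU utilization, as a percentage of the pod's request.
    TargetCPUPercent int32 `json:"targetCPUPercent,omitempty"`
}

// desiredHPA builds the HorizontalPodAutoscaler a reconcileScaling function
// would manage, targeting the Deployment the operator already owns.
func desiredHPA(app *warblev1alpha1.WarbleApp, scaling ScalingSpec) *autoscalingv2.HorizontalPodAutoscaler {
    cpu := scaling.TargetCPUPercent
    minReplicas := scaling.MinReplicas
    return &autoscalingv2.HorizontalPodAutoscaler{
        ObjectMeta: metav1.ObjectMeta{Name: app.Name, Namespace: app.Namespace},
        Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
            ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
                APIVersion: "apps/v1",
                Kind:       "Deployment",
                Name:       app.Name,
            },
            MinReplicas: &minReplicas,
            MaxReplicas: scaling.MaxReplicas,
            Metrics: []autoscalingv2.MetricSpec{{
                Type: autoscalingv2.ResourceMetricSourceType,
                Resource: &autoscalingv2.ResourceMetricSource{
                    Name: corev1.ResourceCPU,
                    Target: autoscalingv2.MetricTarget{
                        Type:               autoscalingv2.UtilizationMetricType,
                        AverageUtilization: &cpu,
                    },
                },
            }},
        },
    }
}
```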