Deep Dive

MLflow on Kubernetes:
Shared PostgreSQL, Azure Blob Artifacts, and Wildcard TLS

Warble Cloud · 5 min read

The default MLflow Helm chart ships with a bundled PostgreSQL sidecar and no artifact backend — fine for local development, unusable in production. This post documents the three changes we made to run MLflow reliably on FRAKMA: shared cluster PostgreSQL, Azure Blob artifact storage, and a wildcard TLS certificate via cert-manager.

Why not the bundled PostgreSQL?

The Helm chart's postgresql.enabled: true creates a single-replica Postgres pod in the mlflow namespace with no backup, no HA, and no shared access. Every time we upgraded the MLflow chart, we risked the bundled Postgres getting recreated and losing metadata. Moving to the shared warble-system PostgreSQL instance — backed by Azure Managed Disks with daily snapshots — solved all of that in one move.
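The one-time setup on the shared instance is a single database creation. A minimal sketch, assuming the shared Postgres runs as a `postgresql` StatefulSet in `warble-system` and the `warble` user from the values below exists (pod and role names are illustrative):

```shell
# Create the mlflow database on the shared cluster Postgres (idempotent-ish:
# CREATE DATABASE errors harmlessly if the database already exists)
kubectl exec -n warble-system statefulset/postgresql -- \
  psql -U warble -d postgres -c "CREATE DATABASE mlflow OWNER warble;"
```

With `databaseMigration: true` in the chart values, MLflow runs its own schema migrations on startup, so an empty database is all it needs.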

The complete values.yaml

image:
  repository: burakince/mlflow
  tag: "3.7.0"     # community image with Azure Blob support built-in

backendStore:
  databaseMigration: true
  databaseConnectionCheck: true
  postgres:
    enabled: true
    host: postgresql.warble-system.svc.cluster.local
    port: 5432
    database: mlflow
    user: warble
    password: warble          # override with secretKeyRef in prod

artifactRoot:
  proxiedArtifactStorage: true    # MLflow proxies artifact downloads
  azureBlob:
    enabled: true
    storageAccount: warbleosstate
    container: mlflow-artifacts

# Azure storage key injected from secret (env var name = AZURE_STORAGE_ACCESS_KEY)
extraSecretNamesForEnvFrom:
  - mlflow-azure-secret

postgresql:
  enabled: false    # disable bundled Postgres

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: mlflow-basic-auth
  hosts:
    - host: mlflow.frakma.io
      paths: [{path: /, pathType: Prefix}]
  tls:
    - secretName: frakma-io-wildcard-tls
      hosts: [mlflow.frakma.io]
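The chart reads the Azure storage key from the secret named in `extraSecretNamesForEnvFrom`, so that secret must exist with `AZURE_STORAGE_ACCESS_KEY` as the key (the env var name MLflow's Azure Blob client expects). One way to create it, pulling the key from the storage account directly:

```shell
# Create the secret MLflow mounts via extraSecretNamesForEnvFrom
kubectl create secret generic mlflow-azure-secret -n mlflow \
  --from-literal=AZURE_STORAGE_ACCESS_KEY="$(az storage account keys list \
    --account-name warbleosstate --query '[0].value' -o tsv)"
```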

The wildcard TLS certificate

Rather than provisioning a separate cert for every subdomain, we issue one wildcard cert (*.frakma.io) via cert-manager's DNS-01 challenge against Cloudflare (Let's Encrypt only issues wildcards through DNS-01, not HTTP-01). Every platform component references the same frakma-io-wildcard-tls secret.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: frakma-io-wildcard
  namespace: warble-system
spec:
  secretName: frakma-io-wildcard-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "*.frakma.io"
    - frakma.io
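DNS-01 issuance can take a minute or two while cert-manager creates and verifies the Cloudflare TXT record. A quick way to confirm the Certificate reached the Ready condition:

```shell
# Block until cert-manager marks the wildcard Certificate as Ready (or time out)
kubectl wait --for=condition=Ready certificate/frakma-io-wildcard \
  -n warble-system --timeout=180s
```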

Once issued, the secret lives in warble-system. Because an Ingress can only reference TLS secrets in its own namespace, we copy it into each consuming namespace with a simple secret reflector (alternatively, set it as the NGINX controller's default SSL certificate via --default-ssl-certificate). No per-service cert provisioning, no cert renewal per subdomain.
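As one example of the reflector approach — assuming emberstack's kubernetes-reflector is installed (any replicator works; these annotation names are reflector-specific):

```shell
# Allow the wildcard TLS secret to be mirrored into the mlflow namespace
kubectl annotate secret frakma-io-wildcard-tls -n warble-system \
  reflector.v1.k8s.emberstack.com/reflection-allowed="true" \
  reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces="mlflow" \
  reflector.v1.k8s.emberstack.com/reflection-auto-enabled="true"
```

Since cert-manager manages this secret and may rewrite it on renewal, it is cleaner long-term to put these annotations in the Certificate's `spec.secretTemplate` so they survive re-issuance.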

Basic auth for the tracker UI

# Create htpasswd secret
htpasswd -nb mlflow-admin your-password | kubectl create secret generic mlflow-basic-auth \
  --from-file=auth=/dev/stdin -n mlflow
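A quick smoke test that the auth annotations took effect (credentials are the illustrative ones from the command above):

```shell
# Without credentials: expect 401 from the NGINX auth layer
curl -s -o /dev/null -w '%{http_code}\n' https://mlflow.frakma.io/

# With credentials: expect 200 from the MLflow UI
curl -s -o /dev/null -w '%{http_code}\n' \
  -u mlflow-admin:your-password https://mlflow.frakma.io/
```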

Note on proxied artifact storage

Setting proxiedArtifactStorage: true means the MLflow server proxies artifact downloads through itself rather than giving clients a direct Azure SAS URL. This keeps the storage access key server-side and avoids exposing Azure credentials to every MLflow client.

Verifying the setup

# Check MLflow can connect to Postgres
kubectl logs -n mlflow -l app=mlflow | grep "database migration"

# Test artifact upload from a training job. Behind basic auth, export
# MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD first — the MLflow
# client reads them automatically.
import mlflow
mlflow.set_tracking_uri("https://mlflow.frakma.io")
with mlflow.start_run():
    mlflow.log_param("test", "value")
    mlflow.log_artifact("model.pkl")   # → warbleosstate/mlflow-artifacts/

Continue reading

Next: Airflow 3 on AKS →