Introduction

I recently set up a three-node Vault Enterprise HA cluster on OpenShift, using HCP Vault as the auto-unseal provider via the transit secrets engine. On paper this is a straightforward combination of well-documented features. In practice, it was a series of traps — some subtle, some spectacular — that took multiple sessions to fully work through.

This post covers the four main challenge areas I hit: getting IPC_LOCK right on OpenShift, wiring up the auto-unseal token flow securely, managing Raft quorum safely during rolling updates, and working around two reconciliation bugs in Vault Secrets Operator. I’ll focus on what caught me off guard and what the correct solution looks like.

The deployment is GitOps-managed via ArgoCD using a three-source Helm pattern: the upstream Vault chart, a values file from the repo, and raw manifests for cluster-level resources (SCCs, Routes, ConfigMaps) that Helm can’t cleanly own.


Challenge 1: IPC_LOCK on OpenShift

The Problem

Vault requires the IPC_LOCK Linux capability so it can call mlockall() to prevent secrets from being swapped to disk. Unless disable_mlock is set in the server config, this is non-negotiable: if the capability is missing, Vault exits at startup. On OpenShift this runs into the default restricted-v2 Security Context Constraint (SCC), which doesn’t include IPC_LOCK in its allowed capabilities.

I discovered this the hard way when all three pods went into CrashLoopBackOff immediately after deploying. What made it more confusing was that the error message differed depending on the image:

  • Upstream image (docker.io/hashicorp/vault-enterprise): Vault starts, calls mlockall(), gets EPERM, and exits with:
    Vault requires the IPC_LOCK capability
    
  • Red Hat partner image (registry.connect.redhat.com/hashicorp/vault-enterprise): The binary ships with cap_ipc_lock+ep set via setcap. Because the SCC doesn’t allow IPC_LOCK, the capability is missing from the container’s bounding set, and the kernel refuses to exec a binary whose file capabilities it cannot grant. The exec() itself fails before Vault even starts:
    exec: /usr/bin/vault: operation not permitted
    

The root cause is the same in both cases (no IPC_LOCK in the SCC), but the symptoms look completely different, which costs time when debugging.

The Solution: Custom SCC

The fix is a custom SCC that adds IPC_LOCK as both an allowed and default capability:

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: vault-ipc-lock
  annotations:
    kubernetes.io/description: >
      Extends restricted-v2 to allow IPC_LOCK for Vault mlock support.      
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
allowedCapabilities:
  - IPC_LOCK
defaultAddCapabilities:
  - IPC_LOCK
fsGroup:
  type: MustRunAs
readOnlyRootFilesystem: false
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
seccompProfiles:
  - runtime/default
supplementalGroups:
  type: RunAsAny
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - ephemeral
  - persistentVolumeClaim
  - projected
  - secret

Custom SCCs don’t automatically get a ClusterRole the way built-in ones do, so you also need to create one manually and bind it to the Vault service account:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:openshift:scc:vault-ipc-lock
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["vault-ipc-lock"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vault-ipc-lock
  namespace: vault
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:vault-ipc-lock
subjects:
  - kind: ServiceAccount
    name: vault
    namespace: vault

The SCC Priority Gotcha

Here’s the non-obvious part. OpenShift’s SCC resolver sorts the candidate SCCs by priority (highest first), then by restrictiveness (most restrictive first), then by name, and admits the pod under the first one that validates it. In my cluster, a pre-existing odf-blackbox-scc had the same default priority of 10. Without an explicit hint in the pod spec, the init container landed on odf-blackbox-scc and failed, despite the vault service account being bound to my new SCC.

The fix is to explicitly declare IPC_LOCK in the container’s securityContext:

# In the Vault Helm values.yaml
server:
  extraContainers: []
  statefulSet:
    securityContext:
      container:
        capabilities:
          add:
            - IPC_LOCK

This hint steers the SCC resolver to select vault-ipc-lock over any same-priority SCC that doesn’t advertise IPC_LOCK support. The hint is required even when defaultAddCapabilities is set — without it, the resolver may never reach your SCC.
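To confirm which SCC actually admitted a pod, you can read the openshift.io/scc annotation that OpenShift stamps on it at admission. A minimal sketch, assuming the vault namespace and pod names from this post; the helper names pod_scc and assert_pod_scc are mine, not part of oc:

```shell
# Print the SCC that admitted a pod, read from the openshift.io/scc annotation.
# Usage: pod_scc <namespace> <pod>
pod_scc() {
  oc -n "$1" get pod "$2" \
    -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
}

# Succeed only if the pod was admitted under the expected SCC.
# Usage: assert_pod_scc <namespace> <pod> <expected-scc>
assert_pod_scc() {
  actual=$(pod_scc "$1" "$2")
  if [ "$actual" != "$3" ]; then
    echo "pod $2 admitted under '$actual', expected '$3'" >&2
    return 1
  fi
}
```

Running assert_pod_scc vault vault-0 vault-ipc-lock after a deploy would have pointed me at odf-blackbox-scc straight away instead of leaving me to infer it from the crash.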

Image Choice

I also switched from the Red Hat partner image to the upstream docker.io/hashicorp/vault-enterprise:2.0.1-ent. Without the docker.io/ prefix, RHCOS short-name resolution rewrites unqualified image references to registry.connect.redhat.com/... at pull time. That’s fine in theory, but in practice the partner mirror had divergent cached bytes across nodes (different sha256 digests for the same tag) and was using a floating 2.0-ent tag rather than a pinned version. Once the SCC was in place, the upstream image with an explicit registry and pinned tag worked cleanly.


Challenge 2: Auto-Unseal Token Flow

Why HCP Vault Dedicated?

Before getting into the mechanics, it’s worth explaining the choice of HCP Vault as the transit unseal provider rather than a self-hosted instance.

The transit auto-unseal mechanism doesn’t require HCP Vault — any Vault cluster (including Community Edition) can host a transit secrets engine and serve as the unseal provider. The problem is the bootstrap paradox: if your transit unseal cluster is also self-hosted, you need a way to unseal it before it can unseal anything else. You’ve just moved the problem one level up.

HCP Vault Dedicated sidesteps this entirely. HashiCorp operates the cluster, handles HA failover, applies upgrades, and manages the underlying infrastructure. Critically, it uses cloud KMS for its own unsealing, so there’s no chicken-and-egg problem — it’s always available when your OpenShift pods start up.

For a use case like this — a single transit key used only at pod startup — the Development tier is sufficient. It’s a single-node cluster (no SLA, fine for a non-critical dependency like this one), capped at 25 clients, and costs $0.03/hour — roughly $21.60/month. That’s a low price for eliminating the unseal bootstrapping problem entirely and offloading upgrades and operations to HashiCorp.

The Architecture

Vault’s transit auto-unseal requires a token with encrypt/decrypt access to an HCP Vault transit key. On every pod startup, Vault needs that token to be available before it can unseal. The token can’t live in a static Kubernetes Secret (that would be a long-lived credential), so it needs to be minted fresh at startup using the pod’s service account JWT.

The flow is:

  1. OpenShift projects each pod’s service account token to /var/run/secrets/kubernetes.io/serviceaccount/token
  2. An init container uses that JWT to authenticate to HCP Vault’s JWT auth method (POST /v1/auth/jwt/login)
  3. On successful auth, it mints a short-lived child token (24h TTL, no_parent so it’s an orphan, autounseal policy)
  4. It renders the transit seal stanza with that token to a shared emptyDir
  5. The main Vault container reads that stanza via vault server -config=...

HCP Vault’s JWT auth method is configured with:

  • bound_audiences: "https://kubernetes.default.svc" — matches the audience in the SA token
  • bound_subject: "system:serviceaccount:vault:vault" — locks it to exactly this service account
  • jwks_url pointing to OpenShift’s OIDC JWKS endpoint
  • The cluster CA cert for TLS verification

This gives each pod a fresh, per-pod-bound token that’s useless to any other workload. It’s about as tight as you can make a startup credential.
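Before setting bound_audiences and bound_subject, it helps to look at what the projected token actually carries, since a mismatch there shows up only as a generic login failure. A rough sketch in POSIX sh, assuming base64 -d is available; jwt_payload is a hypothetical helper name, and it only decodes the claims without verifying the signature:

```shell
# Decode the claims (second segment) of a JWT passed as $1.
# This does NOT verify the signature; it is for inspection only.
jwt_payload() {
  # Take the payload segment and map base64url characters to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

Inside a pod, jwt_payload "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" prints the JSON claims, where the aud and sub values should match what the JWT auth method is bound to.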

The Evolution: curl to Vault Agent

My first implementation used a curl+bash init container — about 50 lines of shell inline in values.yaml. It worked but had several problems:

  • JSON parsing with jq and grep with fragile exit code handling
  • The rendered seal.hcl had the token in plaintext on disk (acceptable but not great)
  • Silent failure modes: if the token-create call failed, the shell would continue and render an empty seal.hcl
  • Hard to maintain and test

I replaced it with a declarative Vault Agent running in one-shot mode (exit_after_auth = true). The agent config lives in a ConfigMap:

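# vault-agent-config.yaml (ConfigMap data; the agent config itself is HCL)

# One-shot mode: exit once auth has succeeded and the template has rendered
exit_after_auth = true

vault {
  address = "https://vault.hashicorp.cloud:8200"
  # The agent's vault stanza takes ca_cert directly rather than in a tls block
  ca_cert = "/vault/userconfig/hcp-ca/ca.crt"
}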

auto_auth {
  method "jwt" {
    namespace = "admin/my-org"
    mount_path = "auth/jwt"
    config {
      path       = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      role       = "autounseal-token"
    }
  }
  sink "file" {
    config {
      path = "/tmp/vault-agent-token"
    }
  }
}

template {
  contents = <<EOT
{{- with secret "auth/token/create" "policies=autounseal" "period=24h" "no_parent=true" -}}
seal "transit" {
  address     = "https://vault.hashicorp.cloud:8200"
  namespace   = "admin/my-org"
  token       = "{{ .Auth.ClientToken }}"
  key_name    = "autounseal"
  mount_path  = "transit"
  tls_ca_cert = "/vault/userconfig/hcp-ca/ca.crt"
}
{{- end -}}
EOT
  destination = "/vault/userconfig/vault-config/seal.hcl"
}

The agent logs clearly report whether JWT auth succeeded and whether the template rendered — a significant improvement over silent shell failures. The exit_after_auth = true flag makes it act as a one-shot runner rather than a persistent sidecar.

The command: Override Gotcha

One subtle issue: if you override command: in the init container spec, you bypass the image’s docker-entrypoint.sh. That entrypoint runs a cap-clearing step (setcap -r /usr/bin/vault) before exec’ing the binary, which matters because the partner image ships with cap_ipc_lock+ep baked into the binary’s extended attributes.

The fix is to only set args: (e.g., ["agent", "-config=/path/to/agent.hcl"]) and leave command: unset. The entrypoint handles the rest.
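A quick way to catch a stray command: override is to list the init containers on the StatefulSet that set one. This is a sketch under my own assumptions: the helper name list_command_overrides is hypothetical, and it relies on oc printing an empty string for an unset command field in jsonpath output.

```shell
# Print the names of init containers that override command:.
# Usage: list_command_overrides <namespace> <statefulset>
list_command_overrides() {
  oc -n "$1" get statefulset "$2" \
    -o jsonpath='{range .spec.template.spec.initContainers[*]}{.name}{"\t"}{.command}{"\n"}{end}' |
    awk -F '\t' '$2 != "" { print $1 }'
}
```

Empty output from list_command_overrides vault vault means every init container still goes through the image entrypoint.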


Challenge 3: Raft Quorum During Rolling Updates

The OnDelete Trap

The Vault Helm chart sets updateStrategy: OnDelete on the StatefulSet by default. This means ArgoCD applying a new chart revision stages the change in a new controllerRevision but does not restart any pods. You have to manually delete each pod to trigger the rollout.

The danger is that multiple staged-but-unrolled revisions can accumulate. If you pushed three separate changes without rolling, deleting a pod applies all three at once. Always check oc get statefulset vault -o json | jq '.status | {currentRevision, updateRevision}' before starting a rollout.
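To see which pods are still running an older revision, you can compare each pod’s controller-revision-hash label (set by the StatefulSet controller) against the StatefulSet’s updateRevision. A minimal sketch; stale_pods is a hypothetical helper, and the label selector is passed in because the right one depends on how your chart labels its pods:

```shell
# Print pods whose revision label differs from the StatefulSet's updateRevision.
# Usage: stale_pods <namespace> <statefulset> <label-selector>
stale_pods() {
  target=$(oc -n "$1" get statefulset "$2" \
    -o jsonpath='{.status.updateRevision}')
  oc -n "$1" get pods -l "$3" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.controller-revision-hash}{"\n"}{end}' |
    awk -F '\t' -v target="$target" '$1 != "" && $2 != target { print $1 }'
}
```

For example, stale_pods vault vault app.kubernetes.io/name=vault lists the pods that still need deleting; it does not tell you how many chart revisions have piled up behind them, only that they are behind.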

The Voter vs. Non-Voter Distinction

When a Vault pod restarts and rejoins the Raft cluster, it goes through a stabilization window (default 10 seconds) before Raft’s autopilot promotes it to a full voter. During this window, the pod is a non-voter — it’s replicating and appears healthy, but it doesn’t count toward quorum.

I learned this the hard way. A 3-node cluster needs 2 voters for quorum. I bounced pod 0, saw it come up 1/1 Ready with replication.standby: stream started, assumed it was healthy, and immediately deleted pod 1. Pod 1 went down before pod 0 achieved voter status, dropping the active voter count to 1 — below quorum — causing a brief write outage until things stabilized.

1/1 Ready does not mean voter. Always verify:

# Check who the current leader is (unauthenticated)
oc -n vault exec vault-0 -- \
  wget -qO- http://127.0.0.1:8200/v1/sys/leader | jq .

# Verify the just-rolled pod is a voter before touching the next
VAULT_TOKEN=<root-token> vault operator raft list-peers

Look for Voter: true on the pod you just rolled before deleting the next one.

The Correct Rollout Order

  1. Identify the current leader at rollout time (don’t assume it’s pod 0 — it could have changed since the last restart)
  2. Delete a follower and wait for 1/1 Ready and Voter: true in raft list-peers
  3. Repeat for the remaining follower
  4. Delete the leader last — it triggers re-election to one of the now-healthy standbys before the leader pod restarts

This sequence ensures quorum is never threatened.
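The "wait for Voter: true" step is easy to script so it can't be skipped under time pressure. The sketch below parses the default table output of vault operator raft list-peers (columns Node, Address, State, Voter) with awk; it assumes the Raft node IDs match the pod names, that VAULT_ADDR and VAULT_TOKEN are already exported, and the helper names is_voter and wait_for_voter are mine. Parsing the -format=json output would be sturdier if you have jq at hand.

```shell
# Read list-peers table output on stdin; succeed if node $1 has Voter == true.
is_voter() {
  awk -v node="$1" '$1 == node && $4 == "true" { found = 1 } END { exit !found }'
}

# Poll until the node is a voter or the attempts run out.
# Usage: wait_for_voter <node-id> [attempts]
wait_for_voter() {
  node=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if vault operator raft list-peers | is_voter "$node"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "${POLL_INTERVAL:-5}"
  done
  echo "timed out waiting for $node to become a voter" >&2
  return 1
}
```

In a rollout this sits between deletions: delete a follower, wait for the pod to report Ready, run wait_for_voter vault-0, and only then move on to the next pod.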


Challenge 4: VSO Reconciliation Bugs

I’m using Vault Secrets Operator (VSO) 1.4.0 to sync Vault secrets into Kubernetes Secrets. I hit two distinct bugs with it during this deployment.

Bug 1: Static Database Credentials Not Refreshing

Vault was rotating the Terraform Enterprise database password weekly via the database secrets engine’s static role. VSO was supposed to detect the rotation and update the Kubernetes Secret — but it wasn’t. TFE pods started failing with SQLSTATE 28P01 (authentication failure) after each rotation.

VSO’s logs showed the reason:

Vault secret does not support periodic renewal/refresh via reconciliation
horizon: 0

VSO 1.4.0 doesn’t treat static-creds responses as renewable by default. The fix is one field on the VaultDynamicSecret, allowStaticCreds, with refreshAfter added as a fallback:

spec:
  allowStaticCreds: true
  refreshAfter: 24h

allowStaticCreds: true tells VSO to use the static-creds rotation tracker and honor the rotation_period configured in Vault. refreshAfter is a safety net in case the rotation tracker misses an event.

Bug 2: VSO Stops Reconciling After a Transient Vault HA Event

After Vault HA events (leader re-election, pod restarts), VSO occasionally emits 500 errors like:

local node not active but active cluster node not found

This is transient and expected — it resolves once the new leader is fully established. The problem is that VSO 1.4.0 records the failure and then stops requeuing the affected VaultStaticSecret. Days later, the secret is stale and VSO’s controller shows zero log entries for it since the failure event.

Restarting the VSO controller pod alone is not sufficient — it re-initializes but doesn’t automatically re-pick stuck resources.

The fix is to touch the VSS spec to force a requeue:

# Temporarily change refreshAfter to trigger the watcher
oc -n <namespace> patch vaultstaticsecret <name> \
  --type=merge -p '{"spec":{"refreshAfter":"6m"}}'

# Verify reconciliation fires, then revert to original value
oc -n <namespace> patch vaultstaticsecret <name> \
  --type=merge -p '{"spec":{"refreshAfter":"5m"}}'

Note: the vso.secrets.hashicorp.com/refresh annotation documented for triggering immediate reconciliation works for VaultDynamicSecret but not VaultStaticSecret. Only the spec-patch method reliably works for the latter.
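Since this ends up in the runbook, it's worth wrapping the patch-and-revert in a helper that reads the current refreshAfter first, so the revert doesn't depend on remembering the original value. A sketch under my own assumptions: force_vss_requeue and the REQUEUE_WAIT variable are hypothetical names, the 30-second default wait is a guess rather than a measured value, and the function gives up if refreshAfter isn't set on the resource.

```shell
# Nudge a stuck VaultStaticSecret by changing refreshAfter and restoring it.
# Usage: force_vss_requeue <namespace> <name>
force_vss_requeue() {
  ns=$1
  name=$2
  original=$(oc -n "$ns" get vaultstaticsecret "$name" \
    -o jsonpath='{.spec.refreshAfter}')
  if [ -z "$original" ]; then
    echo "refreshAfter is not set on $name; nothing to toggle" >&2
    return 1
  fi
  # Pick a temporary value that is guaranteed to differ from the original.
  temp=6m
  [ "$original" = "$temp" ] && temp=7m
  oc -n "$ns" patch vaultstaticsecret "$name" --type=merge \
    -p "{\"spec\":{\"refreshAfter\":\"$temp\"}}" || return 1
  # Give the controller time to notice the spec change before reverting.
  sleep "${REQUEUE_WAIT:-30}"
  oc -n "$ns" patch vaultstaticsecret "$name" --type=merge \
    -p "{\"spec\":{\"refreshAfter\":\"$original\"}}"
}
```

It still makes sense to check the VSO controller logs for the resource afterwards, because the helper only issues the patches and does not confirm that reconciliation resumed.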


Conclusion

Running Vault on OpenShift is entirely doable and production-viable, but the combination of OpenShift’s SCC model, Vault’s Raft consensus requirements, and VSO’s edge cases means there are more failure modes than the documentation covers. The main takeaways:

  • IPC_LOCK needs an explicit SCC — and the securityContext.capabilities.add hint is required to steer SCC selection when multiple SCCs have equal priority
  • Qualify your image references: docker.io/hashicorp/vault-enterprise:2.0.1-ent is unambiguous; bare hashicorp/vault-enterprise is not
  • Vault Agent beats curl+bash — one-shot mode gives you declarative auth, structured failure logging, and no shell fragility
  • 1/1 Ready is not Voter: true — wait for raft list-peers before rolling the next pod
  • VSO 1.4.0 has a stuck-VSS bug — know the spec-patch workaround before you need it in production

Up Next: Terraform Enterprise on OpenShift

The same lab cluster runs Terraform Enterprise in active-active mode alongside Vault, and it has its own set of OpenShift-specific challenges worth a dedicated post. The short preview:

  • PostgreSQL via CloudNativePG — TFE needs a PostgreSQL cluster with a specific schema owner and connection string format. Getting the VSO-managed credentials to rotate cleanly with TFE’s connection pool is where the allowStaticCreds fix from Challenge 4 above actually first surfaced.
  • NooBaa S3 for object storage — TFE uses S3-compatible storage for run logs and state. OpenShift ships with NooBaa (via ODF) as an in-cluster S3 provider, which avoids a cloud dependency but requires matching TFE’s expected bucket configuration and path-style addressing.
  • Redis for active-active coordination — TFE’s active-active mode uses Redis as its coordination layer. Running Redis on OpenShift under restricted-v2 has its own SCC story.
  • The same VSO stuck-reconciliation bug — hits TFE’s Redis and S3 secrets independently after any Vault HA event, so the spec-patch workaround becomes a standard part of the Vault rollout runbook.

If you’re planning a full HashiCorp stack on OpenShift, the Vault work above is the prerequisite — get Vault stable and auto-unsealing before bringing TFE up, since TFE uses Vault as its secrets backend from day one.