Kubernetes’ Third Era: Production Reality

We’ve lived through two Kubernetes ages already. First came survival: get containers onto nodes and keep them alive. Then came scale: stretch clusters to thousands of nodes and millions of pods.
Scale, however, exposed the seams. Real infrastructure is messy. In 2025—with AI jobs everywhere—those seams rip open at 3 a.m. when a training run dies because the scheduler ignores GPU topology. Finance flags a bill spike from cross-zone egress nobody meant to pay for. Autoscalers yo-yo on noisy signals.
Kubernetes 1.34 (“Of Wind & Will”) is the turn toward production reality. Instead of new layers, it sharpens the layers we already have—making them hardware-, network-, and security-aware—so the platform’s assumptions finally match the operator’s world.
Where Project DevOps fits: Kubernetes 1.34 hands you smarter primitives; Project DevOps closes the loop—turning kernel and device signals (PSI, DRA, etc.) into continuous rightsizing, topology-aware placement, and calm, cost-sane autoscaling.
Advancements in Resource Intelligence
Until now, Kubernetes often treated diverse hardware like a single commodity pool. 1.34 changes that with features that understand the shape of resources, not just their counts.
1) Dynamic Resource Allocation (DRA) — GA
Problem: nvidia.com/gpu: 1 can't express which GPU it means (A100 vs. L4, with or without NVLink), so pods get misplaced or fail to bind.
What’s new: With DRA, you declare ResourceClaims against DeviceClasses. The scheduler becomes topology-aware and matches claims to concrete devices, not opaque integers. Allocations are AllocateOnce and immutable, so ML/HPC jobs get predictable access.
Prereqs:
- Enable the DynamicResourceAllocation feature gate on the apiserver, controller-manager, scheduler, and kubelet.
- Deploy a DRA-capable driver (e.g., the NVIDIA GPU Operator with DRA support). Without one, claims won't bind and pods won't schedule.
Why it matters: Tuned GPU placement can take utilization from ~30% to ~60%—more throughput per dollar.
Example ResourceClaim for A100s with NVLink
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: ml-training-claim
spec:
  devices:
    requests:
    - name: gpu-req
      exactly:
        deviceClassName: nvidia-gpu
        allocationMode: ExactCount
        count: 2
        selectors:
        - cel:
            expression: |
              device.capacity["nvidia.com"].memory.compareTo(quantity("40Gi")) >= 0 &&
              device.attributes["nvidia.com"].nvlink == "true" &&
              device.attributes["nvidia.com"].product == "A100"
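A Pod then consumes the claim by referencing it from its spec. A minimal sketch (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-trainer
spec:
  resourceClaims:
  - name: gpus
    resourceClaimName: ml-training-claim   # the claim defined above
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest   # placeholder image
    resources:
      claims:
      - name: gpus   # binds this container to the claim entry above
```

The scheduler will only place the pod on a node whose devices satisfy the claim's CEL selectors.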
2) Swap Memory Support — Beta
Sudden memory spikes cause OOMKills. Swap support arrives with guardrails:
- Guaranteed pods: never swapped (latency-critical).
- BestEffort pods: still no swap (no requests to base a fair share on).
- Burstable pods: can use swap, proportional to their memory requests.
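The "proportional" part means a container's swap limit is its share of node memory applied to the node's swap space. A quick sketch of that arithmetic (the numbers are hypothetical):

```python
# Sketch of LimitedSwap accounting: each Burstable container's swap limit
# is its fraction of node memory, applied to the node's swap capacity.
def limited_swap_bytes(container_memory_request: int,
                       node_memory_capacity: int,
                       node_swap_capacity: int) -> int:
    """Approximate per-container swap limit under LimitedSwap."""
    return int(container_memory_request / node_memory_capacity * node_swap_capacity)

GiB = 1024 ** 3
# A container requesting 4Gi on a 32Gi node with 8Gi of swap gets 1Gi of swap.
print(limited_swap_bytes(4 * GiB, 32 * GiB, 8 * GiB) // GiB)  # → 1
```

So a container requesting half the node's memory can use half the node's swap; one requesting nothing (BestEffort) gets none.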
Enablement: configure OS swap first, then kubelet:
# /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
memorySwap:
  swapBehavior: LimitedSwap
Ops tips: start with a small, tainted node pool; watch node_swap_usage_bytes and pod_swap_usage_bytes; keep system daemons off swap (memory.swap.max=0 on system.slice, e.g. via systemd's MemorySwapMax= setting); and keep the control plane swap-free. Think of swap as an airbag, not extra RAM.
3) Pressure Stall Information (PSI) — Beta
CPU/memory “utilization OK” while users feel lag? The app isn’t busy—it’s stalled. PSI (from Linux 4.20) measures time spent waiting on CPU, memory reclaim, or I/O, exposed per cgroup with cgroups v2.
In 1.34: enable the KubeletPSI=true feature gate. PSI becomes a first-class signal you can export to Prometheus and surface via the Prometheus Adapter.
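These are the kernel's /proc/pressure files surfaced through Kubernetes. As a quick illustration of the underlying format, here is a minimal Python parser (the sample values are invented):

```python
# Parse the /proc/pressure/{cpu,memory,io} format that PSI exposes.
def parse_psi(text: str) -> dict:
    """Return {"some": {...}, "full": {...}} from a pressure file's contents."""
    out = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # e.g. "some", then "avg10=0.12" ...
        out[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return out

# Sample contents of /proc/pressure/memory (values made up):
sample = ("some avg10=0.12 avg60=0.08 avg300=0.02 total=123456\n"
          "full avg10=0.00 avg60=0.01 avg300=0.00 total=9876")
print(parse_psi(sample)["some"]["avg10"])  # → 0.12
```

"some" means at least one task stalled in that window; "full" means all tasks did, which is the stronger starvation signal.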
Example: HPA on memory stall time (external metric)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
# ...
spec:
  metrics:
  - type: External
    external:
      metric:
        name: psi_memory_stall_seconds_total
      target:
        type: AverageValue
        averageValue: "0.1" # 100ms stalled per second
Caution: the work is in interpretation. Short I/O spikes shouldn’t trigger hour-long scale-ups. Project DevOps correlates PSI with historical patterns to distinguish blips from real capacity needs.
Networking That’s Cheaper and Smarter
trafficDistribution for Services — Beta, on by default
Multi-AZ clusters often hairpin traffic across zones, inflating tail latency and egress costs. The Service trafficDistribution setting (which debuted as "PreferClose") is now ready for prod.
- PreferSameZone: prefer zone-local endpoints.
- PreferSameNode: prefer node-local, then zone-local, then cluster-wide; great for per-node agents (DNS, log shippers, metrics).
If your dataplane doesn’t understand hints, it falls back to standard behavior—no breakage, just no locality preference.
Example: keep DNS node-local
apiVersion: v1
kind: Service
metadata:
  name: local-dns
spec:
  selector:
    app: dns
  ports:
  - port: 53
    targetPort: 53
    protocol: UDP
  trafficDistribution: PreferSameNode
Result: steadier P99s and lower inter-AZ bills—finally aligned with your wallet.
Security Built-In (Not Bolted-On)
4) Mutating Admission Policies — Beta
Webhooks worked, but added latency and a point of failure. Now you can mutate in-process using CEL—no sidecar service, no TLS to rotate.
Example: default Pods to non-root
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingAdmissionPolicy
metadata:
  name: enforce-non-root
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["pods"]
  matchConditions:
  - name: must-have-non-root
    expression: "object.spec.securityContext == null || object.spec.securityContext.runAsNonRoot != true"
  failurePolicy: Fail
  reinvocationPolicy: IfNeeded
  mutations:
  - patchType: ApplyConfiguration
    applyConfiguration:
      expression: >
        Object{
          spec: Object.spec{
            securityContext: Object.spec.securityContext{
              runAsNonRoot: true
            }
          }
        }
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingAdmissionPolicyBinding
metadata:
  name: enforce-non-root-binding
spec:
  policyName: enforce-non-root
Enable: set --feature-gates=MutatingAdmissionPolicy=true and --runtime-config=admissionregistration.k8s.io/v1beta1=true on the apiserver.
Use this for defaults (labels, topology keys, resource floors, sidecar stubs). Keep policy engines (Kyverno/Gatekeeper) for higher-order validation and compliance.
5) Selector-Based Authorization — GA
Kubelets used to need broad list permissions. Now the authorizer understands fieldSelectors and labelSelectors, so a kubelet can only list pods for its node.
Try it:
# Denied: unscoped list
kubectl auth can-i list pods --as=system:node:<node> --as-group=system:nodes
# Allowed: scoped to its node
kubectl auth can-i list pods --as=system:node:<node> --as-group=system:nodes \
--field-selector spec.nodeName=<node>
Webhooks and CEL authorizer can also use selectors, enabling tighter least-privilege boundaries.
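For instance, an authorization webhook now sees the selector inside the SubjectAccessReview it evaluates, so it can distinguish a scoped list from an unscoped one. A sketch of the scoped request (node name is hypothetical):

```yaml
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:node:worker-1       # hypothetical node identity
  groups: ["system:nodes"]
  resourceAttributes:
    verb: list
    resource: pods
    fieldSelector:                 # the new selector-aware attributes
      requirements:
      - key: spec.nodeName
        operator: In
        values: ["worker-1"]
```

A webhook can allow this request while denying the same verb with no fieldSelector, enforcing per-node least privilege.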
6) ServiceAccount Token Image Pulls — Beta
Static imagePullSecrets linger forever. With kubelet credential providers and ServiceAccount-scoped, short-lived tokens, image pulls use ephemeral, auto-rotated credentials.
Kubelet snippet:
# /etc/systemd/system/kubelet.service.d/10-credential-provider.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=\
--image-credential-provider-config=/etc/kubernetes/credential-provider-config.yaml \
--image-credential-provider-bin-dir=/etc/kubernetes/credential-provider \
--feature-gates=KubeletServiceAccountTokenForCredentialProviders=true"
Minimal provider config (AWS ECR):
# /etc/kubernetes/credential-provider-config.yaml
apiVersion: credentialprovider.kubelet.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: ecr-credential-provider
  matchImages:
  - "*.dkr.ecr.*.amazonaws.com"
  defaultCacheDuration: 12h
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  args: ["get-credentials"]
  env:
  - name: AWS_REGION
    value: us-west-2
Drop the provider binary, restart kubelet, and you’ve traded static secrets for per-workload, expiring credentials.
7) Anonymous Auth, Narrowed — GA
Keep /healthz, /readyz, /livez, and /version open to unauthenticated probes; lock everything else down with RBAC.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: unauthenticated-healthz
rules:
- nonResourceURLs: ["/healthz", "/readyz", "/livez", "/version"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: unauthenticated-healthz-binding
subjects:
- kind: Group
  name: system:unauthenticated
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: unauthenticated-healthz
  apiGroup: rbac.authorization.k8s.io
Operator & Developer Experience
8) VolumeAttributesClass — GA (live tuning, no remount)
Change PVC performance attributes in place (driver-dependent, via the CSI ModifyVolume call). No detach, no eviction, no downtime.
apiVersion: storage.k8s.io/v1
kind: VolumeAttributesClass
metadata:
  name: gold-performance
driverName: csi.vendor.com
parameters:
  provisioned-iops: "5000"
  throughput: "200MiB/s"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-pvc
spec:
  storageClassName: fast-ssd
  volumeAttributesClassName: gold-performance
  resources:
    requests:
      storage: 100Gi
Start frugal, bump to "gold" during an end-of-quarter crunch, and drop back later, if your CSI driver supports it.
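Dropping back is just another spec update: define a cheaper class and point the PVC's volumeAttributesClassName at it. A sketch (class name and parameter values are illustrative, not vendor defaults):

```yaml
apiVersion: storage.k8s.io/v1
kind: VolumeAttributesClass
metadata:
  name: standard-performance   # illustrative baseline tier
driverName: csi.vendor.com
parameters:
  provisioned-iops: "1000"
  throughput: "50MiB/s"
```

Patching the PVC between classes triggers the driver's ModifyVolume path, assuming the driver supports online modification.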
9) Job podReplacementPolicy — GA
Batch jobs used to double-allocate during retries (one pod terminating, another starting). With podReplacementPolicy, the controller waits for the failed pod to actually disappear before creating a replacement: smoother resource curves, fewer OOM alarms.
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-job
spec:
  template:
    spec:
      containers:
      - name: cleanup
        image: busybox
        command: ["sh", "-c", "run-something"]
      restartPolicy: OnFailure
  podReplacementPolicy: Failed
Kubernetes Has Grown Up
Kubernetes 1.34 isn’t about “more magic.” It’s about closing gaps: acknowledging resource contention, topology quirks, cost traps, and security blast radii—and giving you direct, typed controls to manage them. DRA, PSI-driven decisions, node-local routing, in-process admission, and live storage tuning all aim at one thing: make the platform behave like production actually behaves.
That’s the third era: not survival, not just scale—production reality.
And getting value from these primitives is an ongoing practice. Signals like PSI and DRA matter only if they drive stable placement, right-sized requests, and non-thrashy autoscaling. Project DevOps does exactly that—turning these smarter building blocks into steady clusters, predictable P99s, and cloud bills that reflect efficiency rather than guesswork.