Resource optimization for AKS node pools and autoscalers.

Agent-driven optimization of pods, storage, autoscalers, and node fleets for AKS clusters backed by VMSS-based node pools or Karpenter, delivered as Karpenter NodePool specs, Bicep / Terraform changes, or GitOps diffs.

Vertical Pod Scaling

Request and limit specifications, kept in sync with workload reality.

On AKS, Azure Advisor produces sizing guidance and the open-source VPA can surface request recommendations — but neither closes the loop, so recommendations accumulate and rarely land. Kubex applies them continuously via mutating webhooks and in-place resize, leaving AKS-managed namespaces untouched.

Continuous request right-sizing

Tuned from learned utilization, freeing capacity the cluster autoscaler holds in reserve.

Limits prevent OOM and throttling

Shaped to actual peak behaviour, not template defaults.

Predictive scaling and new-workload sizing

Predictive Pod Scaler resizes ahead of learned patterns; Container Deployment Sizer drafts new specs via MCP.
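
As a rough illustration of request and limit sizing from observed utilization, the sketch below derives a CPU request from a high percentile of usage and a limit from the observed peak. The percentile, headroom, and margin values are assumptions chosen for the example, not Kubex defaults, and the function names are hypothetical.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list, q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def recommend_cpu(samples_m, request_q=90, headroom=1.15, limit_margin=1.25):
    """Return a request/limit pair in millicores from observed usage samples."""
    # Request: a high percentile of usage plus headroom (assumed policy).
    request = math.ceil(percentile(samples_m, request_q) * headroom)
    # Limit: observed peak plus a margin, never below the request.
    limit = max(request, math.ceil(max(samples_m) * limit_margin))
    return {"request_m": request, "limit_m": limit}

# Ten usage samples in millicores, including one spike to 300m.
usage = [100, 120, 110, 130, 300, 125, 115, 105, 118, 122]
print(recommend_cpu(usage))
```

Sizing the request from a percentile keeps one spike from inflating reserved capacity, while the limit still covers that spike.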

Ephemeral Storage

Local storage requests aligned to actual disk pressure.

Ephemeral storage is the resource teams usually discover during incidents. On AKS, under-specified ephemeral-storage requests trigger disk-pressure evictions; over-specified requests cap pod density. Kubex tracks usage and adjusts requests via the same in-place path as CPU and memory.

Pressure-driven scheduling stays accurate

Requests reflect real disk consumption, not worst-case guesses.

Capacity restoration

Right-sized requests release headroom held against worst-case usage.

Disk-pressure evictions eliminated upstream

Requests track growth, so disk-pressure conditions never form.
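
The trade-off between eviction risk and pod density can be sketched with simple arithmetic. The example below is a minimal illustration, assuming a linear growth projection and a fixed headroom factor; the function names and all numbers are hypothetical.

```python
def size_ephemeral_request(peak_usage_gib, growth_gib_per_day,
                           horizon_days=7, headroom=1.2):
    """Project usage over a horizon and add headroom (assumed policy)."""
    projected = peak_usage_gib + growth_gib_per_day * horizon_days
    return round(projected * headroom, 1)

def pods_per_node(allocatable_gib, request_gib):
    """How many pods fit on a node by ephemeral-storage request alone."""
    return int(allocatable_gib // request_gib)

# A worst-case guess of 10 GiB versus a request sized from observed usage.
sized = size_ephemeral_request(peak_usage_gib=2.0, growth_gib_per_day=0.1)
print(sized)
print(pods_per_node(100, 10), pods_per_node(100, sized))
```

With these illustrative inputs, the sized request lets roughly three times as many pods schedule on the same node as the worst-case guess, while still leaving room for a week of projected growth.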

HPA Optimization

Horizontal autoscaling, configured from how the workload behaves.

On AKS, HPA (often paired with KEDA for event-sourced scaling) carries elasticity for most workloads, but keeping it correct is hard — thresholds inherit from templates, policies stay default, HPAs outlive their pod sizing. The HPA Optimizer recomputes thresholds, scale policies, and replica bounds against today’s workload.

Thresholds re-anchored after pod sizing

Recomputed when right-sizing shifts the request denominator.

Scale policies tuned to reaction time

Against observed behaviour, not Helm-chart defaults.

OOM and throttling shielded

Flags HPA settings that let pods hit throttling or OOM before scale-out.
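
HPA utilization targets are expressed as a percentage of the pod's request, so shrinking the request silently moves the absolute trigger point. The sketch below re-anchors a target so the per-pod trigger stays at the same absolute usage. The function names and the 90% cap are assumptions for illustration. The replica formula is the standard HPA ratio, with the tolerance band omitted for brevity.

```python
import math

def reanchor_target(old_target_pct, old_request_m, new_request_m, max_pct=90):
    """Keep the absolute per-pod trigger constant after a request change."""
    absolute_trigger_m = old_target_pct / 100 * old_request_m
    new_pct = round(absolute_trigger_m / new_request_m * 100)
    # Cap the target (assumed safety bound) so pods scale out before saturating.
    return min(new_pct, max_pct)

def desired_replicas(current_replicas, current_util_pct, target_util_pct):
    """Core HPA ratio: ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_util_pct / target_util_pct)

# A request right-sized from 1000m to 600m: 50% of 1000m was a 500m trigger.
print(reanchor_target(50, 1000, 600))
# Cutting the request to 400m would imply 125%, so the cap applies.
print(reanchor_target(50, 1000, 400))
print(desired_replicas(4, 100, 83))
```

Leaving the target at 50% after the resize would trigger scale-out at 300m per pod instead of 500m, adding replicas the workload does not need.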

Node Optimization

Node fleets that match the workload they actually run.

AKS node-pool specs drift from workload reality once pod sizing changes land. Kubex addresses this in two modes: simulation-based VM-SKU and scale-set parameter recommendations for VMSS-backed pools (via Terraform, az CLI, or GitOps), and NodePool specs via the Karpenter Optimizer on AKS (NAP) — with spot, ARM, and GPU candidates included.

Output is the artifact, not a recommendation

VMSS-backed node-pool specs, NodePools, or Terraform / Bicep diffs through existing change-management.

Aware of both AKS autoscaler modes

VMSS-backed node pools and Karpenter (NAP) each get the right primitive — VMSS scale-set parameters or NodePool requirements.

Continuous re-evaluation as pod sizing evolves

Recomputed as pod requests change.
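
A simulation-based SKU recommendation can be approximated by packing current pod requests onto each candidate node shape and comparing fleet cost. The sketch below uses first-fit decreasing as a stand-in packing heuristic. The SKU names, capacities, and hourly prices are hypothetical placeholders, and the sketch ignores system-reserved capacity, max-pods limits, and daemonsets.

```python
def nodes_needed(pods, cpu_cap, mem_cap):
    """First-fit decreasing over (cpu_m, mem_mib) pod requests."""
    nodes = []  # each entry is [cpu_free, mem_free]
    for cpu, mem in sorted(pods, reverse=True):
        if cpu > cpu_cap or mem > mem_cap:
            return None  # this pod can never fit on this node shape
        for node in nodes:
            if node[0] >= cpu and node[1] >= mem:
                node[0] -= cpu
                node[1] -= mem
                break
        else:
            nodes.append([cpu_cap - cpu, mem_cap - mem])
    return len(nodes)

def cheapest_sku(pods, skus):
    """Return (hourly_cost, sku_name, node_count) for the cheapest feasible fleet."""
    costed = []
    for name, (cpu_cap, mem_cap, hourly) in skus.items():
        count = nodes_needed(pods, cpu_cap, mem_cap)
        if count is not None:
            costed.append((count * hourly, name, count))
    return min(costed) if costed else None

# Hypothetical candidates: name -> (cpu millicores, memory MiB, price per hour).
skus = {"small": (2000, 4096, 0.10), "large": (4000, 8192, 0.18)}
pods = [(500, 1024)] * 8
print(cheapest_sku(pods, skus))
```

Re-running the same simulation whenever pod requests change is what keeps the recommended node shape aligned with the workload.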

Node Bin Packing

Higher pod density, with safety bounds that keep it usable.

On AKS, the cluster autoscaler and the scheduler under-pack by default: the default scoring strategy (LeastAllocated) spreads pods, and the bin-packing strategies (MostAllocated, RequestedToCapacityRatio) are opt-in. Tuning them before right-sizing pods is the failure mode: over-packing, throttling, OOM. The Bin Packer ties density to pod-sizing maturity, raising it as sizing stabilises.

Max-pods and strategy per node type

Aligned to actual pod profile — MostAllocated / RequestedToCapacityRatio.

Consolidation thresholds move with pod-sizing maturity

AKS cluster-autoscaler scale-down delay and Karpenter consolidationPolicy auto-tuned from observed pod-sizing accuracy.

Per-pool consolidation profiles

System, GPU, and general pools each get their own aggressiveness — density gains don't churn pools that need to stay stable.
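
To show what a bin-packing strategy does at scheduling time, the sketch below implements a simplified version of MostAllocated scoring: each feasible node is scored by its average utilization after placing the pod, and the fullest node wins. This is an approximation of the upstream scheduler plugin for illustration only. The function names and node data are hypothetical.

```python
def most_allocated_score(requested, capacity, weights=None):
    """Weighted average of per-resource utilization, on a 0-100 scale."""
    weights = weights or {res: 1 for res in capacity}
    total = 0.0
    for res, weight in weights.items():
        used = min(requested.get(res, 0), capacity[res])
        total += weight * (used * 100 / capacity[res])
    return total / sum(weights.values())

def pick_node(pod, nodes):
    """nodes maps name -> (already_requested, capacity); returns the best fit."""
    best = None
    for name, (requested, capacity) in nodes.items():
        after = {res: requested.get(res, 0) + pod.get(res, 0) for res in capacity}
        if any(after[res] > capacity[res] for res in capacity):
            continue  # pod does not fit on this node
        score = most_allocated_score(after, capacity)
        if best is None or score > best[0]:
            best = (score, name)
    return best[1] if best else None

capacity = {"cpu": 4000, "mem": 8000}
nodes = {
    "node-a": ({"cpu": 3000, "mem": 6000}, capacity),
    "node-b": ({"cpu": 500, "mem": 1000}, capacity),
}
print(pick_node({"cpu": 500, "mem": 1000}, nodes))
```

Because the score depends entirely on requests, packing this way is only as safe as the requests are accurate, which is why density should follow pod-sizing maturity.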

Node Pre-Warming

Capacity initialized before the load curve hits.

Reactive autoscaling adds nodes only after pressure arrives, a latency cost that daily-cyclical workloads pay every day. The Node Prewarmer provisions ahead of forecast demand using Kubex’s pattern models. The payoff is largest for GPU inference, where CUDA image pulls and model loading dominate cold starts.

Predictive scheduling against learned patterns

Runs ahead of the daily load cycle, not after pressure.

GPU-aware pre-warming

CUDA pulls and model load accounted for, so inference SLOs aren't paid in warm-up.

Coordinated with bin packing

Pre-warm respects consolidation thresholds, so headroom doesn't fight stable-load density.
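
Provisioning ahead of a learned daily pattern can be sketched as a lookahead over a per-slot demand profile. The example below is a minimal illustration, assuming the profile is the per-slot maximum across recent days and that the lead time is a whole number of slots. Both function names are hypothetical, and a real forecast model would be more elaborate.

```python
def learn_profile(days):
    """Per-slot peak node demand across several days of equal length."""
    return [max(slot) for slot in zip(*days)]

def prewarm_schedule(profile, lead_slots=1):
    """Target node count per slot: cover demand up to lead_slots ahead."""
    n = len(profile)
    return [
        max(profile[(slot + k) % n] for k in range(lead_slots + 1))
        for slot in range(n)
    ]

# Two days of node demand, sampled in eight slots per day for brevity.
days = [
    [2, 2, 2, 5, 8, 8, 4, 2],
    [2, 2, 3, 5, 7, 8, 4, 2],
]
profile = learn_profile(days)
print(profile)
print(prewarm_schedule(profile, lead_slots=1))
```

The schedule ramps one slot before the profile does and scales down only once the peak has passed. For GPU nodes, a longer lead would account for image pulls and model loading.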

GPU Optimization

Inference and training, sized to the right GPU.

GPU workloads bring decisions that CPU tooling doesn’t make: sharing strategy, partitioning, and SKU. On AKS, Kubex covers all of them: time-slicing via NVIDIA KAI, MIG on Ampere/Hopper/Blackwell, SKU selection, and cross-provider analysis across Azure GPU SKUs (NC, ND, NV families), neoclouds, and adjacent CSPs.

Per-workload sharing strategy

MIG, time-slicing, or MPS — by isolation, flexibility, or memory profile.

SKU selection includes provider economics

Factors in benchmarks, availability, and pricing, not just the local default.

Cross-provider price/performance

Workloads evaluated across CSPs, neoclouds, and on-prem — comparison, not auto-move.
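
The two GPU decisions above, sharing strategy and price/performance, can be sketched as a rule and a ranking. The decision rule below is a deliberately simplified assumption, not Kubex's logic, and the offer names, throughput figures, and prices are hypothetical placeholders rather than real benchmark or pricing data.

```python
def sharing_strategy(needs_hard_isolation, mig_capable, bursty):
    """Illustrative rule of thumb for picking a GPU sharing mode."""
    if needs_hard_isolation and mig_capable:
        return "mig"           # hardware-partitioned memory and compute
    if bursty:
        return "time-slicing"  # flexible sharing for intermittent use
    return "mps"               # concurrent execution for steady small jobs

def rank_offers(offers):
    """Rank available offers by cost per unit of throughput, cheapest first.

    offers: list of (name, throughput_per_sec, hourly_price, available).
    """
    usable = [o for o in offers if o[3] and o[1] > 0]
    return [name for name, *_ in sorted(usable, key=lambda o: o[2] / o[1])]

offers = [
    ("offer-a", 1000.0, 4.0, True),
    ("offer-b", 1500.0, 5.0, True),
    ("offer-c", 3000.0, 6.0, False),  # best ratio, but no capacity available
]
print(sharing_strategy(needs_hard_isolation=True, mig_capable=True, bursty=False))
print(rank_offers(offers))
```

The ranking yields a comparison only. Whether to move a workload remains a separate decision.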

See how Kubex looks against your AKS cluster.

A walk-through of the agent surface and change-management flow — on a cluster you actually run.