Jan 16, 2025 - 05:30
Kubernetes Cost-Saving Secrets: A 50% Workload Cost Reduction Story

Kubernetes scaling and cost optimization

In my current role, we encountered significant latency issues in our API responses during peak traffic. Upon investigation, we identified that the bottleneck was the system's inability to handle the increased load efficiently.
To address this, I implemented Karpenter, an open-source Kubernetes cluster autoscaler, to dynamically scale nodes based on workload demands. This solution not only resolved the latency issue by ensuring sufficient resources during high-traffic periods but also optimized resource usage, leading to significant cost savings during low-traffic times.

What is Karpenter?
Karpenter is a CNCF (Cloud Native Computing Foundation) project designed to dynamically provision and scale Kubernetes nodes based on workload demands.

How Karpenter Works 

Karpenter uses Custom Resource Definitions (CRDs) and cloud provider APIs to dynamically provision and scale nodes in Kubernetes clusters. Here's a step-by-step explanation of how Karpenter operates, illustrated with configuration examples:
Custom Resource Definitions (CRDs) are a powerful Kubernetes feature that lets you extend the Kubernetes API by defining your own resource types. This flexibility enables complex automation and customized workloads.
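For illustration, here is a minimal, hypothetical CRD that registers a Backup resource type with the API server (the group, names, and schema are invented for this example). Karpenter's Provisioner, shown later, is a custom resource of exactly this kind:

```yaml
# Hypothetical example: registering a custom "Backup" resource type.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com
spec:
  group: example.com
  names:
    kind: Backup
    plural: backups
    singular: backup
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
```

Once applied, the API server accepts Backup objects just like built-in resources, and controllers can watch and act on them.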

1. Installation

Install Karpenter using Helm or YAML manifests. For example (replace the <...> placeholders with your cluster's values):

helm repo add karpenter https://charts.karpenter.sh
helm repo update
helm install karpenter karpenter/karpenter --namespace karpenter --create-namespace \
  --set controller.clusterName=<CLUSTER_NAME> \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=<KARPENTER_IAM_ROLE_ARN> \
  --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile \
  --set settings.aws.clusterEndpoint=<CLUSTER_ENDPOINT>

2. Provisioner Configuration

The core of Karpenter's functionality lies in the Provisioner. This CRD defines scaling policies, instance types, zones, and other preferences. Here's an example configuration:
Provisioner YAML

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Limit the maximum number of nodes Karpenter can provision
  limits:
    resources:
      cpu: 1000
  # Constrain the instance types and zones Karpenter may launch
  requirements:
    - key: "karpenter.k8s.aws/instance-type"
      operator: In
      values: ["m5.large", "m5.xlarge"]
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ["us-east-1a", "us-east-1b"]
  provider:
    instanceProfile: "KarpenterNodeInstanceProfile"
    subnetSelector:
      kubernetes.io/cluster/<CLUSTER_NAME>: "owned"
    securityGroupSelector:
      karpenter.sh/discovery: "<CLUSTER_NAME>"
  ttlSecondsAfterEmpty: 30

Key Configurations:

  1. limits.resources: Sets resource limits for scaling (e.g., maximum CPUs).
  2. requirements: Specifies node preferences, such as instance types or zones.
  3. provider: Configures AWS-specific settings like subnets and security groups.
  4. ttlSecondsAfterEmpty: Automatically terminates nodes after they have been empty for 30 seconds.

3. Triggering Node Provisioning

Karpenter observes unschedulable pods in the cluster. For example:
Pod Spec

apiVersion: v1
kind: Pod
metadata:
  name: compute-intensive
spec:
  containers:
    - name: busybox
      image: busybox
      resources:
        requests:
          memory: "512Mi"
          cpu: "1"

When this pod cannot be scheduled due to insufficient resources, Karpenter:

  1. Detects the event.
  2. Matches the pod requirements with the Provisioner configuration.
  3. Launches a new node that meets the criteria (e.g., m5.large in us-east-1a).

4. Scaling Down Idle Nodes

Karpenter continuously monitors cluster utilization. When nodes are no longer required, it:

  1. Consolidates workloads onto fewer nodes.
  2. Terminates underutilized nodes based on ttlSecondsAfterEmpty or custom policies.

5. Observing Metrics and Logs

Monitor Karpenter using tools like Prometheus or CloudWatch. Example commands:
Check node provisioning:
kubectl get nodes

View Karpenter logs:
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter

Conclusion

Karpenter simplifies dynamic scaling in Kubernetes clusters. By provisioning nodes in real time from declarative configuration, it can:

  1. Match workload demands.
  2. Optimize resource usage.
  3. Reduce costs.
  4. Minimize operational overhead.

Its flexibility allows you to adapt quickly to changing application requirements, ensuring high availability and performance.