Kubernetes in Production: Security, Monitoring, and Cost Optimization

March 20, 2025
14 min read
By DevOps Team

Production-hardened Kubernetes deployment strategies covering service mesh, observability, auto-scaling, and infrastructure-as-code best practices.

Production Kubernetes: Beyond the Tutorial

Running Kubernetes in production is vastly different from local development. This guide covers the security, monitoring, cost optimization, and operational practices that separate toy clusters from enterprise-grade infrastructure.

  • Security hardening (RBAC, policies, secrets)
  • Monitoring and observability (metrics, logs, traces)
  • Cost optimization (right-sizing, autoscaling, spot instances)
  • Disaster recovery and high availability
  • GitOps and deployment strategies

Security Hardening

YAML
# Production Security Configuration

# 1. Network Policies - Zero Trust Networking
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress
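  ingress:
  - from:
    # Both selectors sit in ONE list entry, so they are ANDed: only frontend
    # pods in the production namespace are admitted. As two separate entries
    # they would be ORed, which admits every pod in the namespace.
    # kubernetes.io/metadata.name is set automatically on every namespace.
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: production
      podSelector:
        matchLabels:
          app: frontend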
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
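  # Sketch of an additional egress rule for DNS. With Egress listed in
  # policyTypes, anything not explicitly allowed is dropped, including the
  # lookups the api pods need to resolve the database Service name.
  # Assumption: cluster DNS runs in kube-system with the conventional
  # k8s-app: kube-dns label; adjust the selectors if your cluster differs.
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53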

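---
# 2. Pod Security Standards - Restrict privileges
# PodSecurityPolicy (policy/v1beta1) was removed in Kubernetes v1.25.
# Its replacement, Pod Security Admission, is configured with namespace labels.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Reject pods that violate the "restricted" profile: no privileged
    # containers, no privilege escalation, all capabilities dropped,
    # no hostNetwork/hostIPC/hostPID, non-root user, limited volume types.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also record violations in audit logs and show them as client warnings.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
# Note: the restricted profile does not require a read-only root filesystem.
# Set readOnlyRootFilesystem: true in each container's securityContext.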

---
# 3. RBAC - Least Privilege Access
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: developer-role
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "jobs"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
# Note: No delete, no secrets access
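---
# Sketch: a Role grants nothing until it is bound to a subject.
# Assumption: "developers" is a hypothetical group name supplied by your
# identity provider; it does not come from the original configuration.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-role-binding
  namespace: production
subjects:
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-role
  apiGroup: rbac.authorization.k8s.io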

---
# 4. Secrets Management with External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: api-secrets
    creationPolicy: Owner
  data:
  - secretKey: database-url
    remoteRef:
      key: prod/api/database-url
  - secretKey: api-key
    remoteRef:
      key: prod/api/key
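---
# Sketch of the SecretStore that the ExternalSecret refers to. A SecretStore
# is namespaced and must live in the same namespace as the ExternalSecret
# (production is assumed here).
# Assumptions: the region and the service account name are placeholders,
# and the service account is expected to carry an IAM role that can read
# the prod/api/* secrets.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa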

---
# 5. Resource Limits - Prevent noisy neighbors
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
  namespace: production
spec:
  limits:
  - max:
      cpu: "2"
      memory: 2Gi
    min:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 200m
      memory: 256Mi
    type: Container
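
Pod security rules and a LimitRange only matter through the workloads they govern. As a minimal sketch, the Deployment below declares a security context in line with the restrictions listed above and resource values inside the LimitRange bounds. The image name, replica count, and /tmp scratch volume are hypothetical and chosen only for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  replicas: 2                      # assumed value
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api                   # selected by api-network-policy
    spec:
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: api
        image: registry.example.com/api:1.0.0   # hypothetical image
        ports:
        - containerPort: 8080
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        resources:
          requests:                # within the LimitRange min/max
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        volumeMounts:
        - name: tmp
          mountPath: /tmp          # writable scratch space for a read-only root
      volumes:
      - name: tmp
        emptyDir: {}
```

Stating requests and limits explicitly, rather than relying on the LimitRange defaults, keeps the values visible in code review.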

Monitoring and Observability

The Three Pillars of Observability:

  1. Metrics (Prometheus + Grafana)
  2. Logs (Loki + Promtail)
  3. Traces (Tempo + OpenTelemetry)

What to Monitor:

  • Cluster Level: Node CPU/memory, pod count, PVC usage
  • Application Level: Request rate, error rate, latency (p50, p95, p99)
  • Business Level: Sign-ups, transactions, revenue
  • Cost: Resource utilization, waste, spot instance savings

Alerting Best Practices:

  • Alert only on actionable issues
  • Link a runbook from every alert
  • Define escalation policies (e.g., PagerDuty, Opsgenie)
  • Group and deduplicate related alerts
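
The application-level metrics and alerting practices above can be sketched as a Prometheus rule file. This is an illustration under stated assumptions, not a drop-in configuration: the metric names `http_requests_total` and `http_request_duration_seconds_bucket`, the `job="api"` and `status` labels, the 5% threshold, and the runbook URL are all hypothetical. Substitute whatever your services actually export.

```yaml
groups:
- name: api-recording
  rules:
  # Request rate: requests per second, averaged over 5 minutes
  - record: job:http_requests:rate5m
    expr: sum by (job) (rate(http_requests_total{job="api"}[5m]))
  # Error rate: share of requests answered with a 5xx status
  - record: job:http_request_errors:ratio_rate5m
    expr: |
      sum by (job) (rate(http_requests_total{job="api", status=~"5.."}[5m]))
      /
      sum by (job) (rate(http_requests_total{job="api"}[5m]))
  # Latency: p95 computed from histogram buckets
  - record: job:http_request_duration_seconds:p95_5m
    expr: |
      histogram_quantile(0.95,
        sum by (job, le) (rate(http_request_duration_seconds_bucket{job="api"}[5m])))
- name: api-alerts
  rules:
  - alert: ApiHighErrorRate
    expr: job:http_request_errors:ratio_rate5m > 0.05
    for: 10m                       # must hold for 10 minutes before firing
    labels:
      severity: page               # assumed label used for routing and escalation
    annotations:
      summary: "API 5xx error ratio above 5% for 10 minutes"
      runbook_url: https://runbooks.example.com/api-high-error-rate
```

The `for` clause keeps short spikes from paging anyone, and the `runbook_url` annotation ties the alert to its runbook. A file like this can be validated with `promtool check rules` before it is loaded.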
