Kubernetes (often abbreviated as K8s) is an open-source container orchestration platform originally developed by Google, open-sourced in 2014, and donated to the Cloud Native Computing Foundation (CNCF) in 2015. It automates the deployment, scaling, and management of containerized applications across clusters of machines.
Running containers at scale across many machines introduces complex challenges: How do you ensure containers restart if they crash? How do you distribute traffic, roll out updates without downtime, or scale under load? Kubernetes acts as a control plane that continuously reconciles the actual state of your infrastructure with a desired state you declare. This declarative model eliminates large amounts of manual operational work.
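To make the declarative model concrete, here is a minimal sketch of a reconciliation pass. It is a toy simulation, not Kubernetes code: the `Cluster` class, the `reconcile` function, and the `web-N` replica names are all hypothetical and exist only to show the idea of comparing desired state with actual state and acting on the difference.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for actual cluster state: replica name -> healthy flag."""
    replicas: dict = field(default_factory=dict)
    next_id: int = 0

    def start_replica(self) -> str:
        name = f"web-{self.next_id}"
        self.next_id += 1
        self.replicas[name] = True
        return name


def reconcile(desired_replicas: int, cluster: Cluster) -> list:
    """Run one reconciliation pass and return the actions taken."""
    actions = []
    # Remove replicas that have crashed so they no longer count as running.
    for name in [n for n, healthy in cluster.replicas.items() if not healthy]:
        del cluster.replicas[name]
        actions.append(f"removed {name}")
    # Start replicas until the actual count reaches the desired count.
    while len(cluster.replicas) < desired_replicas:
        actions.append(f"started {cluster.start_replica()}")
    # Stop surplus replicas if the desired count was lowered.
    while len(cluster.replicas) > desired_replicas:
        name = sorted(cluster.replicas)[-1]
        del cluster.replicas[name]
        actions.append(f"stopped {name}")
    return actions


cluster = Cluster()
first_pass = reconcile(3, cluster)    # nothing runs yet, so three replicas start
cluster.replicas["web-1"] = False     # simulate a crash
second_pass = reconcile(3, cluster)   # the loop replaces the crashed replica
third_pass = reconcile(3, cluster)    # state already matches, so no actions
print(first_pass, second_pass, third_pass, sep="\n")
```

The caller never says "start a replica" or "replace web-1". It only states the desired count, and each pass works out what needs to change.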
A Kubernetes cluster consists of a Control Plane and one or more Worker Nodes. The Control Plane includes the API Server (the central communication hub), etcd (a distributed key-value store holding all cluster state), the Scheduler (assigns workloads to nodes), and the Controller Manager (runs reconciliation loops). Worker Nodes run the kubelet agent, a container runtime like containerd, and kube-proxy for networking.
Pods are the smallest deployable unit in Kubernetes, wrapping one or more containers that share a network namespace and storage. Deployments manage a desired number of identical Pod replicas and handle rolling updates. Services provide a stable DNS name and IP address to route traffic to a dynamic set of Pods, while Namespaces logically isolate resources within the same cluster.
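The relationship between these objects can be sketched as plain Python dictionaries that mirror the manifest structure. The names used here (`web`, the `demo` namespace, the `app: web` label, and the `nginx:1.27` image) are illustrative assumptions, and the `selects` helper is a simplified model of equality-based label selection rather than the real implementation.

```python
import json

APP_LABELS = {"app": "web"}  # hypothetical label shared by the Pods and the Service

# A Deployment that keeps three identical Pod replicas running.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "demo"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": APP_LABELS},
        "template": {
            "metadata": {"labels": APP_LABELS},
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "nginx:1.27",  # placeholder image
                        "ports": [{"containerPort": 80}],
                    }
                ]
            },
        },
    },
}

# A Service that gives those Pods a stable name and IP address.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web", "namespace": "demo"},
    "spec": {
        "selector": APP_LABELS,
        "ports": [{"port": 80, "targetPort": 80}],
    },
}


def selects(selector: dict, pod_labels: dict) -> bool:
    """Equality-based selection: every selector pair must appear on the Pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


pod_labels = deployment["spec"]["template"]["metadata"]["labels"]
service_targets_pods = selects(service["spec"]["selector"], pod_labels)

# Bundle both objects into one List document, serialized as JSON.
manifest_json = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": [deployment, service]},
    indent=2,
)
print(manifest_json)
```

Saving the printed JSON to a file and running `kubectl apply -f` on it should create both objects, assuming the `demo` namespace already exists. The Service finds its Pods purely through labels, which is why it keeps working as individual Pods come and go.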
When you apply a manifest, the API Server validates it and persists it in etcd. The Scheduler then selects an appropriate Node for each unscheduled Pod based on resource requests, taints, tolerations, and affinity rules. The kubelet on that Node pulls the container image and starts the Pod. If a container crashes, the kubelet restarts it in place according to the Pod's restart policy. If a Node goes offline or a Pod is lost, controllers in the Controller Manager detect the drift from desired state and create replacement Pods, which the Scheduler places on healthy Nodes.
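The placement step can be sketched as a filter followed by a score. This is a deliberately simplified model, not the real Scheduler: the `Node` and `PodSpec` classes are hypothetical, taints are reduced to bare keys that all block scheduling, affinity rules are left out, and the scoring rule (most CPU left after placement) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int                            # unreserved CPU in millicores
    free_mem_mi: int                           # unreserved memory in MiB
    taints: set = field(default_factory=set)   # simplified: keys that repel Pods


@dataclass
class PodSpec:
    name: str
    cpu_request_m: int
    mem_request_mi: int
    tolerations: set = field(default_factory=set)


def feasible(node: Node, pod: PodSpec) -> bool:
    """Filter step: requests must fit and every taint must be tolerated."""
    fits = (node.free_cpu_m >= pod.cpu_request_m
            and node.free_mem_mi >= pod.mem_request_mi)
    tolerated = node.taints <= pod.tolerations
    return fits and tolerated


def schedule(pod: PodSpec, nodes: list):
    """Return the chosen node name, or None if the Pod would stay Pending."""
    candidates = [node for node in nodes if feasible(node, pod)]
    if not candidates:
        return None
    # Score step (toy rule): prefer the node with the most CPU left over.
    best = max(candidates, key=lambda node: node.free_cpu_m - pod.cpu_request_m)
    # Reserve the requested resources on the chosen node.
    best.free_cpu_m -= pod.cpu_request_m
    best.free_mem_mi -= pod.mem_request_mi
    return best.name


nodes = [
    Node("node-a", free_cpu_m=500, free_mem_mi=1024),
    Node("node-b", free_cpu_m=2000, free_mem_mi=4096, taints={"gpu"}),
    Node("node-c", free_cpu_m=1000, free_mem_mi=2048),
]

web = schedule(PodSpec("web", 750, 512), nodes)
trainer = schedule(PodSpec("trainer", 1500, 2048, tolerations={"gpu"}), nodes)
huge = schedule(PodSpec("huge", 4000, 512), nodes)
print(web, trainer, huge)
```

In this run, `web` cannot use `node-a` (too little CPU) or `node-b` (untolerated taint), so it lands on `node-c`. The `trainer` Pod tolerates the taint and fits on `node-b`, while `huge` fits nowhere and would remain unscheduled.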
Every container should declare CPU and memory requests (the amount the Scheduler reserves for the container on a node) and limits (the maximum it is allowed to use). Without requests, the Scheduler cannot make informed placement decisions, leading to over-scheduled nodes, OOMKilled containers, and unpredictable performance. A CPU limit set too low causes unnecessary throttling, and a memory limit set too low gets the container OOMKilled, while omitting limits entirely risks one noisy container starving others on the same node.
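A simple pre-deployment check can catch containers that skip these declarations. The sketch below models a Pod spec as a Python dictionary with the same `resources.requests` and `resources.limits` layout a manifest uses. The `missing_resources` helper, the container names, the images, and the quantities are all hypothetical examples.

```python
REQUIRED = ("cpu", "memory")


def missing_resources(pod_spec: dict) -> list:
    """List every request or limit that a container fails to declare."""
    problems = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            declared = resources.get(section, {})
            for resource in REQUIRED:
                if resource not in declared:
                    problems.append(
                        f"{container['name']}: missing {section}.{resource}"
                    )
    return problems


pod_spec = {
    "containers": [
        {
            "name": "web",
            "image": "nginx:1.27",  # placeholder image
            "resources": {
                # Reserved for the container when the Pod is placed.
                "requests": {"cpu": "250m", "memory": "128Mi"},
                # Hard ceiling enforced at runtime.
                "limits": {"cpu": "500m", "memory": "256Mi"},
            },
        },
        {
            "name": "sidecar",
            "image": "busybox:1.36",  # placeholder image
            # Incomplete on purpose: only a CPU request is declared.
            "resources": {"requests": {"cpu": "50m"}},
        },
    ]
}

problems = missing_resources(pod_spec)
for problem in problems:
    print(problem)
```

Here the `web` container passes and the `sidecar` container is flagged three times. A check like this could run in CI before manifests reach the cluster.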
Liveness probes tell Kubernetes when to restart a container that has entered a broken state, while Readiness probes signal when a container is ready to receive traffic. Configuring both covers two distinct failures: the readiness probe keeps requests away from a Pod that has started but whose application is still initializing, and the liveness probe restarts a container that has deadlocked. Always tune probe thresholds carefully: overly aggressive probes cause unnecessary restarts and instability.
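One way to reason about probe tuning is to estimate how long a failure must persist before Kubernetes acts on it. The sketch below uses the real probe field names (`periodSeconds`, `failureThreshold`, `initialDelaySeconds`) and their documented defaults, but the `/ready` and `/healthz` paths, port 8080, and the `failure_window_seconds` helper are assumptions. The estimate is rough, since it ignores probe timeouts and where in the period the failure begins.

```python
# Defaults Kubernetes applies when a probe omits these fields.
PROBE_DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "failureThreshold": 3,
}


def failure_window_seconds(probe: dict) -> int:
    """Rough time a failure must persist before the probe trips."""
    merged = {**PROBE_DEFAULTS, **probe}
    return merged["periodSeconds"] * merged["failureThreshold"]


container = {
    "name": "web",
    "image": "nginx:1.27",  # placeholder image
    # Readiness: remove the Pod from Service endpoints quickly.
    "readinessProbe": {
        "httpGet": {"path": "/ready", "port": 8080},   # hypothetical endpoint
        "periodSeconds": 5,
        "failureThreshold": 2,
    },
    # Liveness: restart only after a longer, sustained failure.
    "livenessProbe": {
        "httpGet": {"path": "/healthz", "port": 8080},  # hypothetical endpoint
        "initialDelaySeconds": 15,
        "periodSeconds": 10,
        "failureThreshold": 3,
    },
}

readiness_window = failure_window_seconds(container["readinessProbe"])
liveness_window = failure_window_seconds(container["livenessProbe"])
print(f"traffic stops after about {readiness_window}s of failures")
print(f"restart happens after about {liveness_window}s of failures")
```

Keeping the readiness window shorter than the liveness window is one reasonable design choice: a struggling Pod stops receiving traffic first and is restarted only if it fails to recover.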
© RM Full Stack & AI Engineer