Kubernetes Monitoring


Kubernetes Monitoring Definition

Kubernetes monitoring is a type of reporting that helps identify problems in a Kubernetes cluster and implement proactive cluster management strategies. Kubernetes cluster monitoring tracks cluster resource utilization, including storage, CPU and memory. This eases containerized infrastructure management. Many organizations go beyond this inherent monitoring functionality to gain full visibility over cluster activity with full suites of cloud-native monitoring tools.

This image depicts Kubernetes monitoring software tracking cluster resource utilization, including storage, CPU and memory.

FAQs

What is Kubernetes Monitoring?

Kubernetes requires a different approach to monitoring than traditional, long-lived hosts such as physical machines and VMs. The abstractions built into a Kubernetes-based architecture offer a framework for comprehensive application monitoring in a dynamic container environment. By tailoring a monitoring approach to complement those built-in abstractions, it is possible to gain comprehensive insight into application performance and health, despite the constant movement of the containers running the applications.

Kubernetes container monitoring differs from traditional monitoring of more static resources in several ways.

Kubernetes uses labels to identify which pods and services belong together. A container environment tracks far more objects than a host-based one, each with a much shorter lifespan, and Kubernetes scales and reschedules them automatically. Labels are therefore the only reliable way to identify and track applications and the pods they run in.

Aggregate data with labels from containers and pods to get continuous visibility into Kubernetes objects such as services. Label pods to connect events and metrics to the various layers of Kubernetes architecture and keep your observability data more actionable.
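As a rough illustration of label-based aggregation, the sketch below sums a per-pod metric by the value of one label. The sample records, the `cpu_cores` field, and the function name are hypothetical; a real monitoring pipeline would supply the samples and copy the labels from each pod’s metadata.

```python
from collections import defaultdict

# Hypothetical per-pod samples; labels mirror what each pod's metadata would carry.
pod_samples = [
    {"pod": "web-7d4f9-abcde", "labels": {"app": "web", "tier": "frontend"}, "cpu_cores": 0.20},
    {"pod": "web-7d4f9-fghij", "labels": {"app": "web", "tier": "frontend"}, "cpu_cores": 0.35},
    {"pod": "api-5c8b7-klmno", "labels": {"app": "api", "tier": "backend"}, "cpu_cores": 0.50},
]


def aggregate_by_label(samples, label_key, metric):
    """Sum one metric across pods, grouped by the value of one label."""
    totals = defaultdict(float)
    for sample in samples:
        # Pods missing the label are grouped under "<unlabeled>" so they stay visible.
        group = sample["labels"].get(label_key, "<unlabeled>")
        totals[group] += sample[metric]
    return dict(totals)


cpu_by_app = aggregate_by_label(pod_samples, "app", "cpu_cores")
cpu_by_tier = aggregate_by_label(pod_samples, "tier", "cpu_cores")
```

Because the grouping key is a label rather than a pod name, the totals stay meaningful as individual pods are replaced.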

Another difference between traditional, host-centric infrastructure and Kubernetes architecture is the additional layers of abstraction that are part of K8s systems. More abstraction means additional components to monitor.

Older systems presented two main layers to monitor: hosts and applications. Containers added a layer of abstraction between applications and hosts. Kubernetes, yet another layer of comprehensive infrastructure, orchestrates containers and also requires monitoring.

Thus, four distinct components, each with its own challenges, are part of Kubernetes application monitoring:

  • Hosts, regardless of which applications/containers they are running
  • Containers, wherever they are running
  • Containerized applications
  • The entire Kubernetes cluster

Additionally, because applications are highly distributed and always in motion, monitoring the health of your Kubernetes infrastructure requires collecting metrics and events from all your containers and pods, including the applications actually running in them.

Kubernetes schedules workloads automatically and things move rapidly, so users typically have very little control over where applications are running. (Users can assign node affinity or anti-affinity to particular Kubernetes pods, but to benefit most from its automatic resource management and scheduling, most users delegate that control to Kubernetes.)

It’s not possible to manually configure checks to collect monitoring data from applications upon each start or restart given the rate of change in a typical Kubernetes cluster. Kubernetes monitoring tools with service discovery enable users to maximize the inherent automation and scalability of Kubernetes without sacrificing visibility.

Even as containerized workloads contract, expand, or shift across hosts, service discovery enables continuous monitoring. Service discovery in Kubernetes automatically re-configures the data collection and enables the Kubernetes monitoring system to detect any change in the inventory of pods running.
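The reconfiguration step can be pictured as a diff between the targets a monitoring system currently collects from and the pods that are running now. The following is only a sketch under assumed data shapes: the pod dictionaries, field names, and both function names are hypothetical, and a real tool would obtain the inventory from the Kubernetes API.

```python
def discover_targets(pods):
    """Build a map of pod name -> collection address from a pod inventory."""
    return {
        pod["name"]: f'{pod["ip"]}:{pod["port"]}'
        for pod in pods
        if pod["phase"] == "Running"  # pending or failed pods are not collected from
    }


def diff_targets(old, new):
    """Return (added, removed, changed) pod names between two target maps."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
    return added, removed, changed


# Hypothetical inventories taken before and after a rollout.
before = [
    {"name": "web-1", "ip": "10.0.0.5", "port": 9100, "phase": "Running"},
    {"name": "web-2", "ip": "10.0.0.6", "port": 9100, "phase": "Running"},
]
after = [
    {"name": "web-2", "ip": "10.0.0.9", "port": 9100, "phase": "Running"},
    {"name": "web-3", "ip": "10.0.1.4", "port": 9100, "phase": "Running"},
    {"name": "web-4", "ip": None, "port": 9100, "phase": "Pending"},
]

added, removed, changed = diff_targets(discover_targets(before), discover_targets(after))
```

Running this comparison on every inventory change is what lets data collection follow the workload without manual configuration.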

Kubernetes Metrics Monitoring

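Find important Kubernetes monitoring metrics using the Kubernetes Metrics Server, which collects and aggregates resource usage data (CPU and memory) from the kubelet on each node. Other metrics, such as API request latency and cluster state, come from other sources such as the API server itself. Consider some of these key Kubernetes metrics: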

  • API request latency, the lower the better, measured in milliseconds
  • Cluster state metrics, including the availability and health of pods
  • CPU utilization in relation to per pod CPU resource allocation
  • Disk utilization including lack of space for file system and index nodes
  • Memory utilization at the node and pod levels
  • Node status, including disk or processor overload, memory, network availability, and readiness
  • Pod availability (unavailable pods may indicate poorly designed readiness probes or configuration issues)
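As an example of the CPU metric in the list above, the sketch below expresses a pod’s CPU usage as a fraction of its CPU request. It assumes usage and request arrive as Kubernetes quantity strings such as `250m` (millicores) or `150000000n` (nanocores); the function names are hypothetical and only the suffixes shown are handled.

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity string to a number of cores."""
    suffixes = {"n": 1e-9, "u": 1e-6, "m": 1e-3}  # nano, micro, and millicores
    if quantity and quantity[-1] in suffixes:
        return float(quantity[:-1]) * suffixes[quantity[-1]]
    return float(quantity)  # plain numbers such as "2" or "0.5" are whole cores


def cpu_utilization(usage, request):
    """Return usage as a fraction of the request, or None when no request is set."""
    if request is None:
        return None
    requested = parse_cpu(request)
    if requested == 0:
        return None
    return parse_cpu(usage) / requested


ratio = cpu_utilization("125m", "500m")    # about 0.25 of the requested CPU
unbounded = cpu_utilization("125m", None)  # None: nothing to compare against
```

A ratio that stays near or above 1.0 suggests the request is set too low for the workload; a ratio near zero suggests the pod reserves more CPU than it uses.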

The Need: How to Monitor Kubernetes

At the enterprise level, containers have experienced explosive growth. Kubernetes in business offers many benefits to DevSecOps, developers, and IT teams. However, deploying containerized applications with Kubernetes delivers scalability and flexibility that are themselves a challenge.

Servers and applications are no longer correlated at a 1-to-1 ratio. Applications are abstracted more than once, by containers and by Kubernetes, so tracking application health without the proper tools is impossible. Here are some Kubernetes monitoring best practices to keep in mind.

Monitoring Kubernetes cluster nodes. Acquire a broad view of overall platform capacity and health by monitoring the Kubernetes cluster. Monitor cluster resource and infrastructure usage to determine whether the cluster is underutilized or over capacity. Monitor node health and availability to reveal whether enough nodes and resources are available to run application replicas. Finally, monitor chargeback or resource usage for each project and/or team.

Monitoring Kubernetes deployments and pods. Monitoring Kubernetes constructs such as deployments, namespaces, DaemonSets, and ReplicaSets helps ensure proper application deployment. Monitor failed and missing pods to determine how many pods fail and whether the expected pods are running for each application. Watch pod resource usage against limits and requests to confirm that memory and CPU limits and requests are set, and compare them with actual usage. Also monitor running versus desired instances: how many instances of each service do you expect to be ready, and how many are actually ready?
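The running-versus-desired comparison can be sketched as follows. The dictionaries imitate the shape of Deployment objects returned by the Kubernetes API (`spec.replicas` and `status.readyReplicas`), but the sample data and the function name are hypothetical.

```python
def replica_shortfalls(deployments):
    """Return {name: (ready, desired)} for deployments with too few ready replicas."""
    shortfalls = {}
    for deployment in deployments:
        # Kubernetes defaults spec.replicas to 1 when it is not set.
        desired = deployment["spec"].get("replicas", 1)
        # status.readyReplicas is omitted from the object when it is zero.
        ready = deployment.get("status", {}).get("readyReplicas", 0)
        if ready < desired:
            shortfalls[deployment["metadata"]["name"]] = (ready, desired)
    return shortfalls


# Hypothetical Deployment objects, trimmed to the fields used here.
deployments = [
    {"metadata": {"name": "web"}, "spec": {"replicas": 3}, "status": {"readyReplicas": 3}},
    {"metadata": {"name": "api"}, "spec": {"replicas": 4}, "status": {"readyReplicas": 2}},
    {"metadata": {"name": "worker"}, "spec": {"replicas": 2}, "status": {}},
]

behind = replica_shortfalls(deployments)
```

A brief shortfall during a rollout is normal, so an alert built on this check would usually fire only when the gap persists.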

Monitoring Kubernetes applications. This is more familiar monitoring for application availability and confirming that the application is responding. It also includes monitoring application health and performance. How many requests are there? Are there errors? Measure latency and responsiveness as well.

Monitoring Kubernetes containers. Monitoring tools rely on services as their endpoint because pods and their containers are dynamically scheduled and in constant motion. Even as individual pods and containers are created and deleted, communication continues uninterrupted because each service exposes a stable IP address that outlives the pods behind it.

Monitoring Kubernetes pod health. Three probes bear on Kubernetes pod health: the liveness, readiness, and startup probes. They are defined in the pod specification and run by the kubelet.

The liveness probe helps identify when pods have become unresponsive and determine if a container within a pod needs to restart.

The readiness probe tells the cluster when a container is ready to start processing traffic. The pod itself is ready only when all of its containers are ready. Pods that are not ready do not receive incoming traffic and are removed from service load balancers.

The startup probe indicates whether the application in the container has started successfully. When a startup probe is configured, the liveness and readiness probes are disabled until it succeeds, so that they do not interfere with application startup.
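Two of the rules above lend themselves to a short sketch: a pod is ready only when every container is ready, and a startup probe gives the application roughly `failureThreshold` × `periodSeconds` to start before the container is restarted. The defaults of 10 seconds and 3 failures match the Kubernetes defaults; the function names and sample data are hypothetical, and `initialDelaySeconds` is ignored for simplicity.

```python
def pod_is_ready(container_statuses):
    """A pod is ready only if it has containers and every one reports ready."""
    return bool(container_statuses) and all(c["ready"] for c in container_statuses)


def startup_budget_seconds(startup_probe):
    """Approximate time the startup probe allows before the container is restarted."""
    period = startup_probe.get("periodSeconds", 10)       # Kubernetes default: 10
    failures = startup_probe.get("failureThreshold", 3)   # Kubernetes default: 3
    return period * failures


# Hypothetical container statuses for one pod: the sidecar is not ready yet.
statuses = [
    {"name": "app", "ready": True},
    {"name": "sidecar", "ready": False},
]

ready = pod_is_ready(statuses)
budget = startup_budget_seconds({"failureThreshold": 30, "periodSeconds": 10})
```

Here the pod is not ready because one container is not, and the startup probe allows the application about 300 seconds to come up.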

Kubernetes Monitoring Tools

What is Kubernetes health, and how is it monitored?

How to monitor Kubernetes nodes. The health of Kubernetes nodes directly affects their ability to run their assigned pods. The node-problem-detector DaemonSet gathers information about node problems from various daemons on each node and reports it to the API server as node events and conditions.
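A minimal sketch of reading node conditions follows. It assumes node objects shaped like those returned by the Kubernetes API, with a `status.conditions` list, and checks only the standard condition types; custom conditions added by a problem detector are ignored here. The function name and sample nodes are hypothetical.

```python
# Expected status of each standard condition on a healthy node.
HEALTHY_CONDITIONS = {
    "Ready": "True",
    "MemoryPressure": "False",
    "DiskPressure": "False",
    "PIDPressure": "False",
    "NetworkUnavailable": "False",
}


def unhealthy_conditions(node):
    """Return the standard condition types whose status differs from the healthy value."""
    problems = []
    for condition in node["status"]["conditions"]:
        expected = HEALTHY_CONDITIONS.get(condition["type"])
        # A status of "Unknown" also counts as a problem, since it is not the healthy value.
        if expected is not None and condition["status"] != expected:
            problems.append(condition["type"])
    return problems


# Hypothetical nodes, trimmed to the fields used here.
node_a = {"status": {"conditions": [
    {"type": "Ready", "status": "True"},
    {"type": "MemoryPressure", "status": "False"},
    {"type": "DiskPressure", "status": "True"},
]}}
node_b = {"status": {"conditions": [
    {"type": "Ready", "status": "Unknown"},
]}}

problems_a = unhealthy_conditions(node_a)
problems_b = unhealthy_conditions(node_b)
```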

Kubernetes users often use open source tools that are deployed inside Kubernetes as monitoring solutions. These include Heapster/InfluxDB/Grafana and Prometheus/Grafana. (Heapster has since been retired, and Metrics Server now provides basic resource metrics.) It’s also possible to conduct Kubernetes monitoring with ELK Stack or a hosted solution (Heapster/ELK). Finally, proprietary APM solutions that offer Kubernetes monitoring are also on the market. Depending on your organization’s needs, an open source Kubernetes monitoring solution might be best, or a proprietary or hosted solution might have its benefits.

Here are some of the more common tools for Kubernetes monitoring.

Prometheus metrics. Prometheus is an open source monitoring system originally built at SoundCloud and now a graduated Cloud Native Computing Foundation (CNCF) project. The Prometheus server collects metrics about nodes, pods, jobs, and other aspects of Kubernetes health from exporters, some of which run as pods on each node in the cluster. It saves the collected time-series data in a database and generates alerts automatically based on preset conditions.

The built-in Prometheus dashboard is limited, but users enhance it with external visualization tools such as Grafana, which enables customized and sophisticated debugging, queries, and reporting using the Prometheus database. Exporters also let Prometheus collect metrics from many third-party systems, including databases.
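To make the collection model more concrete, the sketch below renders a single sample in the Prometheus text exposition format, which is the plain-text format Prometheus reads from a metrics endpoint. The metric name `pod_ready` and the function name are hypothetical; only the line layout and label escaping follow the format.

```python
def format_sample(name, labels, value):
    """Render one sample as a line of Prometheus text exposition format."""

    def escape(text):
        # Label values escape backslashes, double quotes, and newlines.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if labels:
        rendered = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
        return f"{name}{{{rendered}}} {value}"
    return f"{name} {value}"


# Hypothetical metric describing whether a pod is ready (1) or not (0).
line = format_sample("pod_ready", {"namespace": "default", "pod": "web-1"}, 1)
bare = format_sample("up", {}, 1)
```

Each distinct combination of metric name and label values becomes its own time series in the Prometheus database, which is why labels matter so much for both querying and scale.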

Kubernetes dashboard. The Kubernetes dashboard is a simple web interface for debugging containerized applications and managing cluster resources. The Kubernetes dashboard provides a rundown of all defined storage classes and all cluster namespaces, and a simple overview of resources, both on individual nodes and cluster-wide.

The Kubernetes dashboard Admin view lists all the nodes, with aggregated metrics for each, along with persistent storage volumes. The Config and storage view identifies persistent volume claims for all the Kubernetes resources running in the cluster and for each clustered application. The Workload view lists every running application by namespace, including the number of pods currently ready in a Deployment and current pod memory usage. The Discover view lists exposed services that have enabled discovery inside the cluster.

cAdvisor. cAdvisor collects resource usage, historical usage data, and resource isolation parameters for the containers running on each node.

Kubernetes applications always run in pods, so their health can be measured by the readiness and liveness probes in the pods. If applications are running on nodes that are not reporting any errors, and the applications themselves report they are ready to process new requests, the applications are probably healthy.

Why are Kubernetes Monitoring Tools Important?

Legacy monitoring tools fail at monitoring Kubernetes for several reasons. They are designed for monitoring known servers that don’t change rapidly, and they focus on collecting metrics from static targets.

Kubernetes inherently increases the complexity of infrastructure. Any platform, such as Kubernetes, that sits between application services and the rest of the infrastructure demands monitoring.

Along with increasingly complex infrastructure, modern microservices applications massively increase the number of components communicating with each other. Containers migrate across infrastructure as needed, and each service can be distributed across multiple instances. Thus, to understand whether Kubernetes is working, it is essential to monitor the Kubernetes orchestration state and verify that all instances of the service are running.

The explosion in cloud-native architectures means a corresponding explosion in scale requirements. Kubernetes monitoring tooling and methodology must retain enough granularity to inspect individual components while still alerting users on high-level service objectives.

Traditional monitoring cannot manage the number of metrics generated by cloud-native architectures. In the past, operators knew how many instances of each service component existed and where they ran, but Kubernetes adds multidimensionality, so the various perspectives or aggregations that must be managed can quickly spiral out of control.

Containers are transient; in fact over half last just a few minutes. This high level of churn means thousands of data points, and hundreds of thousands of time series, even in a small Kubernetes cluster. The best Kubernetes monitoring solutions must be capable of scaling to hundreds of thousands of metrics.
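A back-of-the-envelope estimate shows how churn inflates the number of time series. The model below is a deliberate simplification and every number in it is hypothetical: it assumes each replacement container gets a new identity, and therefore a new set of series, once per average container lifetime.

```python
import math


def distinct_series(concurrent_containers, metrics_per_container,
                    avg_lifetime_minutes, window_minutes):
    """Rough count of distinct time series created over a time window."""
    # Number of container "generations" that live and die within the window.
    generations = max(1, math.ceil(window_minutes / avg_lifetime_minutes))
    return concurrent_containers * generations * metrics_per_container


# Hypothetical small cluster: 200 containers at a time, 50 metrics each,
# containers living 10 minutes on average.
per_hour = distinct_series(200, 50, 10, 60)
per_day = distinct_series(200, 50, 10, 24 * 60)
```

Under these assumptions the cluster produces 60,000 distinct series in an hour and 1,440,000 in a day, even though only 10,000 are active at any moment.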

Finally, it’s difficult to see inside containers, and they are ephemeral. This makes them naturally tough to troubleshoot; in some sense they are black boxes by design. Monitoring tools for Kubernetes that offer granular visibility allow for more rapid troubleshooting.

Does VMware NSX Advanced Load Balancer Offer Kubernetes Monitoring Services?

Yes. VMware NSX Advanced Load Balancer provides a centrally orchestrated, elastic proxy services fabric with dynamic load balancing, ingress controller, service discovery, application security, and analytics for containerized applications running in Kubernetes environments.

Kubernetes monitoring demands a cloud-native approach, which VMware NSX Advanced Load Balancer provides. VMware NSX Advanced Load Balancer delivers scalable, enterprise-class container ingress to deploy and manage container-based applications in production environments accessing Kubernetes clusters. It provides a container services fabric with a centralized control plane and distributed proxies:

  • Controller: A central control, management and analytics plane that communicates with the Kubernetes primary node. The Controller includes two sub-components called Kubernetes Operator (AKO) and Multi-Cloud Kubernetes Operator (AMKO), which orchestrate all interactions with the kube-controller-manager. AKO is used for ingress services in each Kubernetes cluster and AMKO is used in the context of multiple clusters, sites, or across clouds. The Controller deploys and manages the lifecycle of data plane proxies, configures services and aggregates telemetry analytics from the Service Engines.
  • Service Engine: A service proxy providing ingress services such as load balancing, WAF, GSLB, IPAM/DNS in the dataplane and reporting real-time telemetry analytics to the Controller.

For more on the actual implementation of load balancing, security applications, and web application firewalls, check out our Application Delivery How-To Videos.

Find out more about how VMware NSX Advanced Load Balancer’s cloud-native approach for traffic management and application networking services can assist your organization’s Kubernetes monitoring here.