Kubernetes CPU Metrics in the kubeletstats Receiver: Transition from .cpu.utilization to .cpu.usage
The OpenTelemetry Collector’s
kubeletstats
receiver is a crucial component for collecting Kubernetes node, pod and
container metrics. To improve metric accuracy and adhere to
OpenTelemetry semantic conventions,
we are updating how CPU metrics are named and emitted.
This blog post explains the motivation behind this change, its impact on users, the role of the feature gate introduced to manage the transition, and how to migrate.
Why This Change?
Historically, the
kubeletstats
receiver emitted CPU metrics named with the .cpu.utilization suffix, such as:
- k8s.node.cpu.utilization
- k8s.pod.cpu.utilization
- container.cpu.utilization
These metrics actually represent
raw CPU usage in cores,
derived from the Kubernetes Kubelet’s
UsageNanoCores
field, which is an absolute measure of CPU usage (in units of nanocores).
The term utilization generally refers to a relative metric, typically
expressed as a ratio or percentage of used CPU against total CPU capacity or
limits. Using .cpu.utilization
for absolute usage values violates
Semantic Conventions,
potentially confusing users and tooling expecting utilization metrics to be
relative.
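To make the distinction concrete, here is a minimal sketch of the arithmetic. The function names and the 2-core capacity are hypothetical and chosen only for illustration; the only input taken from the text above is that the Kubelet reports usage in nanocores.

```python
NANOCORES_PER_CORE = 1_000_000_000


def usage_cores(usage_nano_cores: int) -> float:
    """Absolute CPU usage in cores, converted from a nanocore reading."""
    return usage_nano_cores / NANOCORES_PER_CORE


def utilization(usage_in_cores: float, capacity_cores: float) -> float:
    """Relative CPU utilization: usage divided by a capacity or limit."""
    if capacity_cores <= 0:
        raise ValueError("capacity_cores must be positive")
    return usage_in_cores / capacity_cores


# Hypothetical reading: 250,000,000 nanocores on a 2-core node.
usage = usage_cores(250_000_000)   # 0.25 cores (an absolute value)
ratio = utilization(usage, 2.0)    # 0.125, i.e. 12.5% (a relative value)
print(usage, ratio)
```

The first value is what the receiver has been emitting all along; the second is what the word "utilization" would normally lead a reader to expect.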
What Is Changing?
To address this semantic mismatch, we introduced new .cpu.usage
metrics that
correctly represent raw CPU usage values:
- k8s.node.cpu.usage
- k8s.pod.cpu.usage
- container.cpu.usage
At the same time, the legacy .cpu.utilization
metrics have been
marked for deprecation.
Feature Gate: receiver.kubeletstats.enableCPUUsageMetrics
Until now, the .cpu.utilization
metrics have been enabled by default. To
manage the transition smoothly, a feature gate named
receiver.kubeletstats.enableCPUUsageMetrics
was introduced:
- Alpha (v0.111.0): The feature gate was introduced but disabled by default. Users had to explicitly enable it to have .cpu.usage metrics emitted by default in place of .cpu.utilization.
- Beta (v0.125.0): The feature gate was promoted to beta and enabled by default. In this state:
  - The .cpu.usage metrics are emitted by default.
  - Attempts to enable .cpu.utilization metrics will fail.
  - Users can explicitly disable the feature gate to temporarily restore the deprecated .cpu.utilization metrics if needed.
- Stable (Upcoming): The feature gate will remain in beta for several releases to allow the community ample time to adapt. The plan and discussion for moving it to stable is tracked in Issue #39650.
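The stages above can be summarized in a short sketch. It assumes no per-metric overrides in the receiver configuration, and the helper names are hypothetical. The `--feature-gates` flag is the Collector's general mechanism for toggling gates, where a leading `-` disables a gate; check the exact invocation against your distribution.

```python
from typing import Optional

GATE = "receiver.kubeletstats.enableCPUUsageMetrics"


def parse_version(version: str) -> tuple:
    """Turn 'v0.125.0' or '0.125.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def default_cpu_metric(version: str, gate_enabled: Optional[bool] = None) -> str:
    """Metric suffix emitted by default for a Collector version.

    gate_enabled=None means "use the gate's default for that version".
    """
    ver = parse_version(version)
    if ver < (0, 111, 0):
        return "cpu.utilization"  # the gate did not exist yet
    default_enabled = ver >= (0, 125, 0)  # beta: enabled by default
    enabled = default_enabled if gate_enabled is None else gate_enabled
    return "cpu.usage" if enabled else "cpu.utilization"


def feature_gates_arg(enabled: bool) -> str:
    """Build the command-line argument that sets the gate explicitly."""
    prefix = "" if enabled else "-"
    return f"--feature-gates={prefix}{GATE}"


print(default_cpu_metric("v0.124.0"))                      # alpha default
print(default_cpu_metric("v0.125.0"))                      # beta default
print(default_cpu_metric("v0.125.0", gate_enabled=False))  # temporary rollback
print(feature_gates_arg(enabled=False))
```

Disabling the gate is meant as a temporary escape hatch while dashboards and alerts are migrated, not as a long-term setting.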
What Does This Mean for You?
Impact on Existing Users
- If you upgrade to v0.125.0 or later, the Collector will emit .cpu.usage metrics by default.
- Any monitoring dashboards, alerting rules, or queries relying on .cpu.utilization metrics will break or not function as expected.
- The deprecated .cpu.utilization metrics are planned for eventual removal, so updating is necessary for long-term compatibility.
Recommended Actions
- Audit your observability pipelines for references to .cpu.utilization metrics.
- Update dashboards, alerts, and queries to use the new .cpu.usage metrics.
- Test the new metrics by enabling the feature gate in staging (or rely on the default enabled state in v0.125.0+).
- Plan your migration timeline considering that .cpu.utilization will be removed in future releases.
- Stay engaged with the OpenTelemetry community via GitHub issues and PR discussions.
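The audit step could start with something as simple as the sketch below, which scans text (an exported dashboard, an alert rule file, a saved query) for the legacy metric names and suggests the replacement. The sample rule text and the function name are hypothetical. The pattern matches only the dotted metric names, so backends that rewrite names (for example by replacing dots with underscores) would need an adjusted pattern.

```python
import re

# Legacy kubeletstats CPU metric names and their replacements.
OLD_TO_NEW = {
    "k8s.node.cpu.utilization": "k8s.node.cpu.usage",
    "k8s.pod.cpu.utilization": "k8s.pod.cpu.usage",
    "container.cpu.utilization": "container.cpu.usage",
}

# Match a legacy name only when it is not part of a longer identifier.
PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(name) for name in OLD_TO_NEW) + r")(?!\w)"
)


def find_legacy_metrics(text: str) -> list:
    """Return (line number, old name, suggested new name) for each hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, OLD_TO_NEW[old]))
    return hits


# Hypothetical alert rule text, for illustration only.
sample = """\
alert: HighPodCPU
expr: k8s.pod.cpu.utilization > 0.8
expr: k8s.node.cpu.usage > 2
"""

for lineno, old, new in find_legacy_metrics(sample):
    print(f"line {lineno}: replace {old} with {new}")
```

Running a scan like this over exported dashboards and rule files before upgrading gives a concrete list of places to update.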
Why Keep the Feature Gate in Beta?
The decision to keep the feature gate in beta for multiple releases is driven by:
- The critical nature of kubeletstats CPU metrics for many production observability pipelines.
- The need to allow users and vendors ample time to adapt and update their tooling.
- The opportunity to gather feedback and address any unexpected issues before the change becomes permanent.
This approach minimizes disruption and helps ensure a smooth transition for everyone.
Useful Links and References
- Issue #27885 - Semantic update for kubeletstats CPU metrics
- PR #35139 - Introduce .cpu.usage metrics and feature gate (alpha)
- PR #39488 - Promote feature gate to beta and enable by default
- Issue #39650 - Plan to move feature gate to stable
- OpenTelemetry Metrics Semantic Conventions
- Kubernetes Kubelet Stats API
Final Thoughts
The transition from .cpu.utilization
to .cpu.usage
metrics in the
kubeletstats
receiver is an important step to ensure that Kubernetes metrics
conform to semantic best practices. We appreciate the community’s patience and
collaboration as we make these improvements.
If you have questions, want to share feedback, or need help migrating, please join us on the CNCF Slack.
Thank you for helping us build clearer, more reliable Kubernetes observability!