🤖 DISCLAIMER: Found by my Claude bot, I hope it may serve as useful. Uncertainty made me create an issue instead of a PR.
Description
The orleans-runtime-total-physical-memory and orleans-runtime-available-memory metrics in EnvironmentStatisticsProvider are created using CreateObservableCounter, but they represent point-in-time gauge values, not monotonically increasing cumulative totals.
This causes the metrics to report incorrect values when consumed by exporters that use delta temporality (e.g., Azure Monitor / Application Insights, Datadog). Specifically:
- orleans-runtime-total-physical-memory effectively always reports 0: the underlying value (GC.GetGCMemoryInfo().TotalAvailableMemoryBytes) is near-constant for the lifetime of a container, so the delta between successive observations is 0.
- orleans-runtime-available-memory reports small positive/negative deltas instead of the actual available memory value.
With cumulative temporality exporters (e.g., Prometheus), the raw absolute values are visible at scrape time, masking the bug — but applying rate() or increase() to these metrics would still produce nonsensical results, since they are not cumulative quantities.
Evidence
Production telemetry from Orleans 10.0.1 on AKS with Azure Monitor (delta temporality):
- orleans-runtime-total-physical-memory: every sample shows value: 0
- orleans-runtime-available-memory: samples show values like +19, -10, 0, +1, +11, -2
The negative values (-10, -2) are the definitive evidence: available memory cannot be negative as an absolute measurement. These are deltas produced by the exporter subtracting successive observations of a fluctuating value.
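The arithmetic behind these samples can be reproduced with a small sketch. It is a simplified model of delta conversion, not the Azure Monitor exporter's actual code, and it ignores whatever the exporter reports for the very first observation. The absolute readings below are invented for illustration; only the resulting deltas correspond to the samples reported above.

```csharp
using System;
using System.Linq;

// Hypothetical absolute readings in MB (assumed values, not taken from telemetry).
long[] availableMb = { 1000, 1019, 1009, 1009, 1010, 1021, 1019 };
long[] totalMb = { 2048, 2048, 2048, 2048, 2048, 2048, 2048 };

// Simplified delta conversion for a sum-type instrument:
// each exported point is the current observation minus the previous one.
static long[] ToDeltas(long[] readings) =>
    readings.Zip(readings.Skip(1), (previous, current) => current - previous).ToArray();

long[] availableDeltas = ToDeltas(availableMb);
long[] totalDeltas = ToDeltas(totalMb);

Console.WriteLine("available deltas: " + string.Join(", ", availableDeltas));
Console.WriteLine("total deltas:     " + string.Join(", ", totalDeltas));
```

Under these assumptions the fluctuating series yields 19, -10, 0, 1, 11, -2 and the constant series yields only zeros, which matches the shape of the production samples.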
Source
In EnvironmentStatisticsProvider.cs:
_availableMemoryCounter = Instruments.Meter.CreateObservableCounter(
    InstrumentNames.RUNTIME_MEMORY_AVAILABLE_MEMORY_MB,
    () => (long)(_availableMemoryBytes / OneKiloByte / OneKiloByte),
    unit: "MB");
_maximumAvailableMemoryCounter = Instruments.Meter.CreateObservableCounter(
    InstrumentNames.RUNTIME_MEMORY_TOTAL_PHYSICAL_MEMORY_MB,
    () => (long)(_maximumAvailableMemoryBytes / OneKiloByte / OneKiloByte),
    unit: "MB");
Expected Behavior
These should be CreateObservableGauge since they represent the current state of memory, not a running total:
- Total physical memory is a near-static value (container memory limit / GC available memory)
- Available memory is a fluctuating point-in-time measurement
Per the OpenTelemetry specification, an Asynchronous Gauge is the instrument for non-additive values, that is, values that make no sense to sum across multiple sources. Summing available memory across silos is meaningless, because each silo's available memory is an independent measurement.
This is also consistent with other gauge-like Orleans metrics (e.g., orleans-catalog-activations, orleans-directory-partition-size) that correctly use ObservableGauge.
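A minimal, standalone sketch of the gauge form follows. It is not a patch against EnvironmentStatisticsProvider: the meter name, instrument name, and the availableMemoryBytes variable are hypothetical stand-ins. It uses a MeterListener to confirm that an observable gauge hands each current reading to the consumer as-is.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter and instrument names, for illustration only.
using var meter = new Meter("Example.Memory");

// Stand-in for the provider's cached reading (assumed value).
long availableMemoryBytes = 512L * 1024 * 1024;

// Gauge instead of counter: the callback reports the current state in MB.
ObservableGauge<long> gauge = meter.CreateObservableGauge<long>(
    "example-available-memory",
    () => availableMemoryBytes / 1024 / 1024,
    unit: "MB");

// Collect every observation the gauge produces.
var observed = new List<long>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (ReferenceEquals(instrument.Meter, meter))
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<long>(
    (instrument, value, tags, state) => observed.Add(value));
listener.Start();

listener.RecordObservableInstruments();     // observes 512
availableMemoryBytes = 384L * 1024 * 1024;  // memory drops
listener.RecordObservableInstruments();     // observes 384

Console.WriteLine(string.Join(", ", observed));
```

In the OpenTelemetry data model, gauge points carry the last observed value and have no aggregation temporality, so a delta-configured exporter would be expected to pass readings like these through unchanged. Confirming that against Azure Monitor is still worth doing before relying on it.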
Impact
- Operators cannot monitor container memory limits or actual available memory via Orleans metrics with delta-temporality backends (Azure Monitor, Datadog)
- Dashboards and alerts based on these metrics are broken
- Orleans' internal behavior (grain collection, activation shedding via ActivationCollector) is NOT affected: it reads from IEnvironmentStatisticsProvider.GetEnvironmentStatistics() directly, not from the exported metrics
Secondary: metric name is misleading in containers
The metric orleans-runtime-total-physical-memory sources its value from GC.GetGCMemoryInfo().TotalAvailableMemoryBytes, which is documented as "memory available to the GC" — in containers, this is the cgroup memory limit, not host physical RAM. The name total-physical-memory is a legacy artifact from bare-metal deployments that can be misleading in containerized environments.
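To see which number this metric is built from on a given host or container, the value can be printed directly. This is a small diagnostic sketch, not Orleans code; the MB conversion mirrors the division by 1024 twice used in the provider.

```csharp
using System;

// Read the GC's view of total available memory, as of the last garbage collection.
GCMemoryInfo info = GC.GetGCMemoryInfo();
long totalAvailableMb = info.TotalAvailableMemoryBytes / 1024 / 1024;

Console.WriteLine($"TotalAvailableMemoryBytes: {totalAvailableMb} MB");
```

Running this inside a memory-limited container and comparing the result with the configured limit and the host's RAM should show which of the two the metric name actually describes.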
Environment
- Orleans 10.0.1
- .NET 10.0
- Azure Monitor OpenTelemetry Exporter (delta temporality)
- Running in Kubernetes (AKS) with Linux Alpine containers