# Queries and dashboards
How metrics flow from a pod's `/metrics` endpoint to a Grafana panel, and the PromQL patterns you'll use 90% of the time when authoring dashboards.

This page complements Dashboards (which covers the ConfigMap-based provisioning mechanism) and Scrape targets (which covers how metrics get into Prometheus in the first place).
## The metric pipeline
```mermaid
flowchart LR
  A[App / exporter<br/>exposes /metrics] -->|HTTP scrape| B[Prometheus<br/>TSDB]
  B -->|PromQL via<br/>HTTP API| C[Grafana<br/>datasource]
  C -->|panel query<br/>+ variables| D[Panel rendering]
```
There are four hops between a metric line in a pod and a line on a Grafana panel:
- **Source.** An app or exporter exposes Prometheus-format metrics on an HTTP endpoint, conventionally `/metrics`. Each line is `metric_name{label="value",...} sample_value [timestamp]` (see the sketch after this list).
- **Scrape.** Prometheus scrapes the endpoint on the interval defined in the `ServiceMonitor` / `PodMonitor` (`interval: 30s` by default). Each scrape produces one new sample per series, written to the TSDB.
- **Query.** Grafana's Prometheus datasource sends PromQL over HTTP (`/api/v1/query`, `/api/v1/query_range`) and gets JSON back.
- **Render.** A Grafana panel is one (or several) PromQL queries plus a visualization (time series, gauge, table, ...).
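For concreteness, a hand-written sketch of the exposition format from the Source hop. The metric names and values here are made up for illustration:

```text
# HELP http_requests_total Total HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027
http_requests_total{method="GET",code="500"} 3

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.3e+07
```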
In this stack, hop 1 is configured by `ServiceMonitor` / `PodMonitor` resources, hop 2 is automatic, hop 3 is wired by the chart (Prometheus is the default Grafana datasource, with in-cluster URL `http://prometheus-stack-kube-prom-prometheus.monitoring.svc:9090`), and hop 4 is what you author.
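For hop 1, a minimal `ServiceMonitor` sketch looks roughly like this. The names, selectors, and the `release` label are placeholders that depend on your chart values; Scrape targets has the real recipe:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                     # placeholder
  labels:
    release: prometheus-stack      # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app                  # matches the Service in front of your pods
  endpoints:
    - port: http-metrics           # named port on that Service
      interval: 30s
      path: /metrics
```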
## How Prometheus thinks: time series with labels
Every metric is a family of time series identified by its name plus its label set. For example:
```promql
container_cpu_usage_seconds_total{namespace="default", pod="api-7c-abc", container="api", cpu="total"}
```
is one series. Change any label value and it's a different series.
PromQL is built around two ideas:
- **Instant vector**: one sample per series at one point in time (e.g. `up`).
- **Range vector**: many samples per series over a time window (e.g. `up[5m]`). Range vectors are the input to functions like `rate()`, `increase()`, `histogram_quantile()`.
You almost never plot raw counters; you `rate()` them first.
## The four PromQL patterns you will use 90% of the time

### 1. Rate of a counter
Use for: requests per second, errors per second, bytes per second, anything ending in `_total`.
```promql
sum by (status_code) (
  rate(http_requests_total{job="my-app"}[5m])
)
```
Reads as: "for my-app, sum the per-second rate of `http_requests_total` over a 5-minute window, grouped by `status_code`."

In Grafana, replace `5m` with the built-in `$__rate_interval` so the window scales with the panel's time range and the underlying scrape interval:
```promql
sum by (status_code) (rate(http_requests_total{job="my-app"}[$__rate_interval]))
```
> **Counters always need `rate()`.** A raw counter just rises monotonically, so the line on the chart tells you nothing. `rate(counter[window])` gives you "samples per second over the window".
### 2. Ratio (error rate, success rate, saturation)
Use for: dimensionless 0–1 values you want to display as percent.
```promql
sum(rate(http_requests_total{job="my-app",code=~"5.."}[$__rate_interval]))
/
sum(rate(http_requests_total{job="my-app"}[$__rate_interval]))
```
Pattern: `numerator / denominator`, both expressed as rates over the same window. Set the panel unit to "Percent (0.0–1.0)" in Grafana so the Y axis reads as 5% rather than 0.05.
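The same shape flips into a success rate; a sketch against the same hypothetical `http_requests_total`:

```promql
1 - (
  sum(rate(http_requests_total{job="my-app",code=~"5.."}[$__rate_interval]))
/
  sum(rate(http_requests_total{job="my-app"}[$__rate_interval]))
)
```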
### 3. Latency percentiles from a histogram

Use for: p50/p95/p99 latency, where the source metric is a Prometheus histogram (`*_bucket{le="..."}`).
```promql
histogram_quantile(
  0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket{job="my-app"}[$__rate_interval]))
)
```
The `sum by (le)` is what makes it work: `histogram_quantile` requires the `le` label to be the only dimension that varies. Add other groupings inside the `by (...)` if you want one line per route or per pod:
```promql
histogram_quantile(
  0.99,
  sum by (le, route) (rate(http_request_duration_seconds_bucket{job="my-app"}[$__rate_interval]))
)
```
### 4. Resource usage from gauges

Use for: anything ending in `_bytes`, `_seconds`, or any metric that goes both up and down (`kube_pod_container_resource_limits`, `node_load1`, `container_memory_working_set_bytes`).
```promql
sum by (namespace) (
  container_memory_working_set_bytes{namespace!="", container!=""}
)
```
For "% of limit" patterns:
```promql
sum by (pod) (container_memory_working_set_bytes{namespace="my-ns"})
/
sum by (pod) (kube_pod_container_resource_limits{namespace="my-ns",resource="memory"})
```
## How the Grafana panel pieces fit
When you author a panel, four things matter:
| Piece | What it controls | Where to set it |
|---|---|---|
| Datasource | Which Prometheus to query | Top of the panel; usually leave at "default" because the chart wires this |
| Query (`expr`) | The PromQL | Query editor; one panel can have multiple queries (A, B, C, ...) |
| Legend (`legendFormat`) | How each line is labelled | E.g. `{{namespace}} / {{pod}}` |
| Variables | Drop-downs at the top of the dashboard | Dashboard settings » Variables |
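In the exported dashboard JSON those pieces land in a panel's `targets` array. A minimal hand-written sketch; the datasource `uid` and all field values are illustrative:

```json
{
  "type": "timeseries",
  "title": "Memory by pod",
  "datasource": { "type": "prometheus", "uid": "prometheus" },
  "targets": [
    {
      "refId": "A",
      "expr": "sum by (pod) (container_memory_working_set_bytes{namespace=\"$namespace\"})",
      "legendFormat": "{{pod}}"
    }
  ]
}
```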
## Variables / templating
A variable is just a query whose result becomes a drop-down. The pattern is:
```
label_values(kube_namespace_labels, namespace)
```
That returns every value of the `namespace` label across the `kube_namespace_labels` series. In your panel queries you then reference `$namespace`:
```promql
sum by (pod) (container_memory_working_set_bytes{namespace="$namespace"})
```
Common variable patterns the bundled dashboards use:
```
# Cluster (when you have multiple Prometheus instances with externalLabels.cluster)
label_values(up, cluster)

# Namespace, scoped to the selected cluster
label_values(kube_namespace_labels{cluster="$cluster"}, namespace)

# Workload, scoped to the namespace
label_values(
  kube_pod_owner{cluster="$cluster", namespace="$namespace"},
  owner_name
)
```
Use `$__rate_interval` (built-in) instead of hardcoding `[5m]`. Grafana picks a window that matches the panel resolution and the scrape interval.
## Where the metrics come from in this stack
| Metric prefix | Source | What it tells you |
|---|---|---|
| `up` | Prometheus itself | Is each scrape target healthy (1) or down (0) |
| `container_*`, `machine_*` | Kubelet / cAdvisor (via the kubelet `ServiceMonitor`) | Per-container CPU, memory, FS, network |
| `kube_*` | kube-state-metrics | Object state: `kube_pod_status_phase`, `kube_deployment_spec_replicas`, ... |
| `node_*` | node-exporter | OS-level: `node_cpu_seconds_total`, `node_filesystem_avail_bytes`, `node_load1` |
| `apiserver_*` | kube-apiserver `/metrics` | Control-plane health and SLOs |
| `prometheus_*`, `alertmanager_*` | Self-monitoring | The stack monitoring itself |
| `your_app_*` | Whatever your `ServiceMonitor` points at | Your application metrics |
If a series doesn't exist, the dashboard panel will simply be empty. Sanity-check by typing the metric name in Prometheus UI » Graph and seeing if anything comes back.
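The same sanity check works from a terminal; a sketch, assuming the port-forward from the authoring loop below is running and `your_app_requests_total` stands in for your metric name:

```bash
# A result length of 0 means no series with this name exists (yet).
curl -s 'http://localhost:9090/api/v1/query?query=your_app_requests_total' \
  | jq '.data.result | length'
```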
## Read the bundled dashboards as your tutorial

The chart ships ~25 pre-built dashboards as ConfigMap templates under `deployment-charts/prometheus-stack/templates/grafana/dashboards-1.14/`.
They are the best PromQL reference you'll find because they cover every
pattern above on real Kubernetes metrics:
| Dashboard | Good for learning |
|---|---|
| `apiserver.yaml` | SLO / availability / latency histograms |
| `k8s-resources-cluster.yaml` | Cluster-wide CPU/memory rollups |
| `k8s-resources-namespace.yaml` | Per-namespace breakdowns |
| `k8s-resources-pod.yaml` | Per-pod CPU/memory + limit ratios |
| `kubelet.yaml` | cAdvisor / kubelet metrics |
| `nodes.yaml` | node-exporter dashboards |
| `prometheus.yaml` | Self-monitoring (TSDB, scrape, WAL) |
Pull the queries from any installed dashboard:
```bash
kubectl -n monitoring get cm prometheus-stack-grafana-k8s-resources-pod \
  -o jsonpath='{.data.k8s-resources-pod\.json}' \
  | jq '.. | .expr? // empty' \
  | head -40
```
That returns the ~30 PromQL expressions used in that dashboard, all known-good against the chart's metrics.
## Authoring loop in practice
- **Find the metric.** Open Prometheus UI » Graph: start typing a name and let autocomplete help. Or list the full set (run the port-forward in one terminal and the `curl` in another):

  ```bash
  kubectl -n monitoring port-forward svc/prometheus-stack-kube-prom-prometheus 9090
  curl -s localhost:9090/api/v1/label/__name__/values | jq -r '.data[]' | grep -i myapp
  ```

- **Write the PromQL in Prometheus first.** Iterate in the Graph tab until the query returns what you want. This is easier than debugging in Grafana because Prometheus error messages are clearer.
- **Paste into a Grafana panel.** Replace hardcoded label values with `$variables` and any hardcoded `[5m]` window with `[$__rate_interval]`.
- **Set the panel unit.** Bytes, seconds, percent: Grafana renders nothing useful if the unit is wrong.
- **Save the dashboard, export the JSON.** Dashboard settings » JSON Model » copy.
- **Wrap the JSON in a ConfigMap labelled `grafana_dashboard: "1"`.** The sidecar auto-loads it. Full recipe in Dashboards; a minimal sketch follows this list.
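As orientation for that last step, a minimal sketch of the ConfigMap. The name is a placeholder and the JSON payload is truncated to show the shape; Dashboards has the full recipe:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboard               # placeholder
  namespace: monitoring
  labels:
    grafana_dashboard: "1"         # the Grafana sidecar watches for this label
data:
  my-dashboard.json: |
    { "title": "My dashboard", "panels": [] }
```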
## Two pitfalls worth flagging
**Counter resets.** When a pod restarts, its counters reset to zero. Both `rate()` and `increase()` are defined to detect resets, but any increments between the last scrape before the restart and the reset itself are lost, so totals over a window containing a restart can undercount. Also note that `increase()` is exactly `rate()` multiplied by the window in seconds, and both extrapolate to the window edges, so don't expect exact integer totals from either.
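A sketch of that equivalence on the hypothetical request counter from earlier:

```promql
# These two expressions return identical values:
increase(http_requests_total{job="my-app"}[1h])
rate(http_requests_total{job="my-app"}[1h]) * 3600
```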
**Cardinality.** Don't group `by (pod)` on a metric scraped from 10,000 pods unless your panel is filtered by namespace or workload first. The query will be slow and the legend unreadable. Always reduce to ~10–30 series per panel by aggregating up:
```promql
# Bad: one line per pod (potentially thousands)
container_memory_working_set_bytes{namespace="$namespace"}

# Good: top 10 by current usage
topk(10, sum by (pod) (container_memory_working_set_bytes{namespace="$namespace"}))
```
## Quick reference: aggregation operators
| Operator | When to use |
|---|---|
| `sum` | Add up rates / values across pods, namespaces, etc. |
| `avg` | Average across a group (rare for rates; usually use `sum`) |
| `max`, `min` | Saturation alerts (highest CPU across pods) |
| `count` | "How many series match", e.g. how many pods are running |
| `topk(N, expr)` | Restrict a noisy panel to the top N series |
| `bottomk(N, expr)` | Same, but the smallest N |
| `quantile(0.99, expr)` | Percentile across series (different from `histogram_quantile`!) |
The PromQL grouping clause is always `sum by (label1, label2) (...)` or `sum without (label) (...)`. `by` keeps the listed labels; `without` keeps everything except the listed labels.
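A sketch of the difference, assuming the hypothetical `http_requests_total` carries only `job`, `instance`, and `status_code` labels:

```promql
# by: keep only the listed labels (one series per status_code)
sum by (status_code) (rate(http_requests_total[5m]))

# without: drop only the listed labels (keeps job and status_code here)
sum without (instance) (rate(http_requests_total[5m]))
```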
## Related pages
- Dashboards — how to ship a dashboard via ConfigMap so the Grafana sidecar auto-loads it.
- Scrape targets — how to get your application metrics into Prometheus in the first place.
- Alerts and rules — the same PromQL patterns, but evaluated as alerting / recording rules.