Flashduty Alert Rules: How Multiple PromQL Queries Work

Product Team @ Flashcat

Flashduty is more than a one-stop on-call alerting platform. It also provides an alert engine that can connect to many monitoring systems. This article explains how multiple PromQL queries work in Flashduty alert rules.

Scenario 1: Joint Calculation Across Multiple Metrics

Suppose you want to alert when disk I/O utilization (IO.UTIL) is greater than 99% and write latency is greater than 10 ms. You can configure it like this:

Of course, you could write this directly in one PromQL expression, such as:

irate(diskio_io_time[2m])/10 > 99 and irate(diskio_write_time[2m])/irate(diskio_writes[2m]) > 10

I intentionally split it into two query conditions here to demonstrate the feature.

This configuration performs a joint calculation across multiple metrics, but there is one prerequisite: the results of the queries must carry exactly the same labels, with the same names and the same values. Otherwise, they cannot be calculated together. Identical labels indicate that the data points being combined describe the same monitored object. For example, CPU metrics and disk metrics usually have different labels, so they cannot be combined this way.
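The label-matching requirement can be illustrated with a small Python sketch. It is not Flashduty's implementation; the function names, label names, and sample values are all hypothetical. It only shows why two query results can be combined when, and only when, a series with identical labels exists on both sides.

```python
from typing import Dict, FrozenSet, List, Tuple

LabelSet = FrozenSet[Tuple[str, str]]
QueryResult = Dict[LabelSet, float]


def labels(**kv: str) -> LabelSet:
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def joint_alerts(a: QueryResult, b: QueryResult,
                 a_limit: float, b_limit: float) -> List[LabelSet]:
    """Return label sets present in both results where A > a_limit and B > b_limit."""
    firing = []
    for label_set, a_value in a.items():
        b_value = b.get(label_set)
        if b_value is None:
            continue  # no series with identical labels, so nothing to combine
        if a_value > a_limit and b_value > b_limit:
            firing.append(label_set)
    return firing


# Hypothetical sample data: query A is IO utilization (%), query B is write latency (ms).
io_util = {
    labels(ident="host-1", name="sda"): 99.5,
    labels(ident="host-1", name="sdb"): 99.9,
}
write_latency = {
    labels(ident="host-1", name="sda"): 12.0,
    labels(ident="host-1", name="sdb"): 3.0,
    labels(ident="host-2", name="sda"): 50.0,
}

firing = joint_alerts(io_util, write_latency, a_limit=99, b_limit=10)
for label_set in firing:
    print(sorted(label_set))
```

In this sample only sda on host-1 fires: sdb fails the latency condition, and the host-2 series has no counterpart with the same labels in the utilization result, so it never takes part in the calculation.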

Scenario 2: Parallel Alerts Across Multiple Metrics

If the metrics are unrelated, do not combine them in a joint threshold calculation. Instead, split them into separate alert rules, or switch from threshold calculation to Data exists mode. For example:

In the image above, one query condition is mem_available_percent, and the other is net_bits_recv. Their labels differ, so they cannot be jointly calculated. They can be evaluated in parallel in Data exists mode.
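As a rough mental model, parallel evaluation can be sketched in Python as follows. This assumes that in Data exists mode each query is evaluated on its own and every series it returns produces an alert, with the threshold written inside the PromQL itself; that behavior, the query names "A" and "B", and the sample labels are assumptions for illustration, not a description of Flashduty internals.

```python
from typing import Dict, List

Series = Dict[str, str]  # one returned series, represented by its labels


def data_exists_alerts(results: Dict[str, List[Series]]) -> List[dict]:
    """Evaluate each query on its own: every returned series becomes one alert."""
    alerts = []
    for query_name, series_list in results.items():
        for series_labels in series_list:
            alerts.append({"query": query_name, "labels": series_labels})
    return alerts


# Hypothetical results. The threshold lives inside each PromQL expression,
# so only series that already violate it are returned.
results = {
    "A": [{"ident": "host-1"}],                        # e.g. a mem_available_percent condition
    "B": [{"ident": "host-2", "interface": "eth0"}],   # e.g. a net_bits_recv condition
}

alerts = data_exists_alerts(results)
for alert in alerts:
    print(alert)
```

No label matching happens here, which is why queries with different labels can share one rule in this mode.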

In practice, we do not recommend this as the default approach; separate alert rules are usually better. However, when several query conditions are similar and belong to the same category, splitting them into many alert rules makes management cumbersome, and this mode can be useful. Note that alerts generated in this mode do not include the metric name by default. The reason is complicated, so we will not expand on it here. A workaround is to use the label_replace function to copy the value of __name__ into a new label, for example:

label_replace(mem_available_percent{ident="dev-backup-01"}, "metric_name", "$1", "__name__", "(.*)") > 0

The generated alert event will then include a metric_name label whose value is the metric name.
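To make the effect of label_replace concrete, here is a simplified Python emulation applied to the labels of a single series. It is a sketch of the documented PromQL behavior (the regex must match the whole source label value, and a non-matching series is returned unchanged), not the Prometheus implementation, and it only handles capture references of the form $1, $2.

```python
import re
from typing import Dict


def label_replace(series: Dict[str, str], dst: str, replacement: str,
                  src: str, regex: str) -> Dict[str, str]:
    """Emulate PromQL label_replace for the labels of a single series."""
    # A missing source label is treated as an empty string.
    match = re.fullmatch(regex, series.get(src, ""))
    if match is None:
        return dict(series)  # no match: labels are returned unchanged
    # PromQL writes capture groups as $1, $2; Python's expand() expects \1, \2.
    template = re.sub(r"\$(\d+)", r"\\\1", replacement)
    result = dict(series)
    result[dst] = match.expand(template)
    return result


series = {"__name__": "mem_available_percent", "ident": "dev-backup-01"}
out = label_replace(series, "metric_name", "$1", "__name__", "(.*)")
print(out["metric_name"])  # mem_available_percent
```

Because the pattern (.*) matches any value, the new metric_name label always receives a copy of the metric name, while the existing labels are left untouched.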
