Troubleshooting Flashduty Monitor Alerts When Data Looks Unexpected

How to troubleshoot Flashduty monitor alerts when the queried data does not match expectations, with a Prometheus-focused debugging script and timestamp-based query method.

Buffett Ba

Problem

A user received a Flashduty monitor alert, but the data did not look as expected and they wanted to investigate the specific cause. This article focuses on Prometheus data sources.

How It Works

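Flashduty monitor periodically queries monitoring data from a time-series database such as Prometheus. It issues instant queries through the /api/v1/query endpoint, which evaluates the expression at a single timestamp, the same way the native Prometheus alert engine evaluates rules. A Grafana line chart is usually drawn from a range query instead, so the two can look different. A common reason is the query lookback parameter in the time-series database: when no sample exists exactly at the evaluation time, an instant query returns the most recent sample inside the lookback window (5 minutes by default in Prometheus, configurable with --query.lookback-delta).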
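
To make the lookback effect concrete, here is a minimal, simplified simulation in plain Python. It does not call Prometheus. The function name instant_value and the sample data are hypothetical, the 300-second window assumes the Prometheus default lookback has not been changed, and staleness markers and exact boundary handling (which differs between Prometheus versions) are not modeled.

```python
from bisect import bisect_right

# Assumption: Prometheus default lookback of 5 minutes, left unchanged.
LOOKBACK_SECONDS = 300

def instant_value(samples, eval_ts, lookback=LOOKBACK_SECONDS):
    """Return the (timestamp, value) an instant query at eval_ts would pick.

    samples: list of (timestamp, value) tuples sorted by timestamp.
    Returns None when no sample falls inside the lookback window.
    """
    timestamps = [ts for ts, _ in samples]
    # Index of the newest sample whose timestamp is <= eval_ts.
    idx = bisect_right(timestamps, eval_ts) - 1
    if idx < 0:
        return None
    ts, value = samples[idx]
    # Samples older than the lookback window are ignored.
    if eval_ts - ts > lookback:
        return None
    return ts, value

# Hypothetical series: two scrapes, then the target stops reporting.
samples = [(1000, 21.6), (1015, 22.4)]

print(instant_value(samples, 1200))  # old sample is still returned
print(instant_value(samples, 1400))  # outside the window, nothing returned
```

In this sketch the query at 1200 still returns the sample from 1015, which is how an alert can evaluate against a value that a chart with a different step or time range does not show at that position.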

You do not have to understand all implementation details. The practical approach is to mimic Flashduty monitor's query method and query the data at the alert time to see whether the result matches expectations.

Debugging Script

The following Python script (generated with AI) replays the instant query at each timestamp in a given time window and prints the raw response. It requires the requests library:

import requests
import argparse
from datetime import datetime, timezone, timedelta

promql = """mem_available_percent"""

def cst_to_unix(cst_str):
    """Convert a UTC+8 time string to a Unix timestamp."""
    tz_cst = timezone(timedelta(hours=8))
    cst_str = cst_str.replace('T', ' ')
    dt = datetime.strptime(cst_str, '%Y-%m-%d %H:%M:%S')
    dt_cst = dt.replace(tzinfo=tz_cst)
    return int(dt_cst.timestamp())

def unix_to_cst_str(ts):
    """Convert a Unix timestamp back to a human-readable UTC+8 time string."""
    tz_cst = timezone(timedelta(hours=8))
    dt = datetime.fromtimestamp(ts, tz=tz_cst)
    return dt.strftime('%Y-%m-%d %H:%M:%S')

def main():
    parser = argparse.ArgumentParser(description="Prometheus Query Tool (SRE Debug Version)")
    parser.add_argument("--prometheus_url", required=True, help="Prometheus base URL")
    parser.add_argument("--start_time", required=True, help="Start time in CST, UTC+8 (e.g., '2023-10-01 08:00:00')")
    parser.add_argument("--end_time", required=True, help="End time in CST, UTC+8 (e.g., '2023-10-01 08:10:00')")
    # Fourth parameter: step size
    parser.add_argument("--step", type=int, default=15, help="Step in seconds (default: 15)")
    
    args = parser.parse_args()

    try:
        start_ts = cst_to_unix(args.start_time)
        end_ts = cst_to_unix(args.end_time)
    except ValueError as e:
        print(f"Time format error: {e}. Please use 'YYYY-MM-DD HH:MM:SS'")
        return

    api_endpoint = f"{args.prometheus_url.rstrip('/')}/api/v1/query"

    current_ts = start_ts
    print(f"\nQuerying: {args.start_time} to {args.end_time} with step {args.step}s\n")
    
    while current_ts <= end_ts:
        # Get a human-readable time string
        readable_time = unix_to_cst_str(current_ts)
        
        params = {
            "query": promql,
            "time": current_ts
        }
        
        try:
            response = requests.get(api_endpoint, params=params, timeout=5)
            response.raise_for_status()
            data = response.text
            
            # Print both the timestamp and the readable time
            print(f"------------ {current_ts} | {readable_time} -----------")
            print(data + "\n")
        except Exception as e:
            print(f"------------ {current_ts} | {readable_time} -----------")
            print(f"Error: {str(e)}\n")
        
        # Increment by the provided step parameter
        current_ts += args.step

    print("Done!")

if __name__ == "__main__":
    main()

Change the promql variable at the top to the PromQL expression you want to inspect. Save the script as debug.py, then run:

python3 debug.py \
  --prometheus_url http://10.99.1.107:9090 \
  --start_time "2025-12-19 21:38:11" \
  --end_time "2025-12-19 21:39:11" \
  --step 15
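
Parameters:

  • prometheus_url: base URL of the Prometheus server. The script appends /api/v1/query to it.
  • start_time: query start time in CST (China Standard Time, UTC+8), formatted as YYYY-MM-DD HH:MM:SS.
  • end_time: query end time in CST (UTC+8), in the same format.
  • step: query interval in seconds. The default is 15 seconds. Set it to the evaluation frequency configured in your alert rule.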

For example, suppose an alert fired at 2025-12-19 21:39:10, the rule runs every 15 seconds, and it triggered 4 consecutive times. Set the end time to 2025-12-19 21:39:11, the start time to 2025-12-19 21:38:11 (4 × 15 = 60 seconds earlier), and the step to 15 seconds. The script then queries five timestamps across that window, so you can inspect exactly what data each evaluation would have seen.

The end time is intentionally set one second after the alert time to avoid boundary issues. In real troubleshooting, try several nearby timestamps and compare the results.
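
The window arithmetic above can be sketched as a small helper. This is only an illustration: the function name debug_window and its pad_s parameter are hypothetical and not part of debug.py, and it assumes the alert time is given in the same UTC+8 wall-clock format that the script expects.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def debug_window(alert_time, interval_s, consecutive, pad_s=1):
    """Return (start_time, end_time) strings to pass to debug.py.

    alert_time:  wall-clock time at which the alert fired
    interval_s:  evaluation frequency of the alert rule, in seconds
    consecutive: number of consecutive evaluations that triggered
    pad_s:       seconds added after the alert time to avoid boundary issues
    """
    end = datetime.strptime(alert_time, TIME_FORMAT) + timedelta(seconds=pad_s)
    start = end - timedelta(seconds=interval_s * consecutive)
    return start.strftime(TIME_FORMAT), end.strftime(TIME_FORMAT)

start, end = debug_window("2025-12-19 21:39:10", interval_s=15, consecutive=4)
print(start)  # 2025-12-19 21:38:11
print(end)    # 2025-12-19 21:39:11
```

To try nearby timestamps, as suggested above, vary pad_s by a few seconds and rerun the script with the resulting window.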

Example run:

(py3venv) ulric@ulric-fcc01 misc % python3 debug.py \
  --prometheus_url http://10.99.1.107:9090 \
  --start_time "2025-12-19 21:38:11" \
  --end_time "2025-12-19 21:39:11" \
  --step 15

Querying: 2025-12-19 21:38:11 to 2025-12-19 21:39:11 with step 15s

------------ 1766151491 | 2025-12-19 21:38:11 -----------
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"mem_available_percent","ident":"cn-beijing.10.99.1.109"},"value":[1766151491,"21.608323685048916"]}]}}

------------ 1766151506 | 2025-12-19 21:38:26 -----------
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"mem_available_percent","ident":"cn-beijing.10.99.1.109"},"value":[1766151506,"22.473846026331397"]}]}}

------------ 1766151521 | 2025-12-19 21:38:41 -----------
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"mem_available_percent","ident":"cn-beijing.10.99.1.109"},"value":[1766151521,"21.02999645150345"]}]}}

------------ 1766151536 | 2025-12-19 21:38:56 -----------
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"mem_available_percent","ident":"cn-beijing.10.99.1.109"},"value":[1766151536,"21.231924036272368"]}]}}

------------ 1766151551 | 2025-12-19 21:39:11 -----------
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"mem_available_percent","ident":"cn-beijing.10.99.1.109"},"value":[1766151551,"22.203590456875407"]}]}}

Done!
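
Reading raw JSON line by line gets tedious for longer windows. As a rough sketch, each response body printed above can be parsed to pull out the sample timestamp and value and compare them with the rule's threshold. The function name extract_values and the THRESHOLD value are hypothetical (the article does not state the rule's threshold), and the code assumes the response is a successful instant-query result of type "vector", as in the output above.

```python
import json

def extract_values(body):
    """Parse an instant-query response body into (labels, timestamp, value) tuples."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected resultType: {data['resultType']}")
    rows = []
    for series in data["result"]:
        ts, raw = series["value"]  # the value is returned as a string
        rows.append((series["metric"], ts, float(raw)))
    return rows

# Hypothetical threshold; replace it with the one from your alert rule.
THRESHOLD = 25.0

# First response body from the run above.
sample = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"__name__":"mem_available_percent","ident":"cn-beijing.10.99.1.109"},'
    '"value":[1766151491,"21.608323685048916"]}]}}'
)

for labels, ts, value in extract_values(sample):
    print(labels.get("ident"), ts, value, "below threshold:", value < THRESHOLD)
```

An empty result list means the query returned no series at that timestamp, which is itself a useful clue when an alert does not match what a chart shows.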
