Flashduty: A One-Stop Alert OnCall Platform for Faster Enterprise Response

How Many of These Alert Pain Points Have You Hit?

Companies often have many monitoring systems: cloud and on-premises, open source and in-house, commercial and self-built, device-level, component-level, and business-level. Some products even include their own fragmented monitoring. The result is that alert events are scattered everywhere and cannot be handled, analyzed, or collaborated on uniformly.
Monitoring systems focus on data collection, visualization, and alert generation, but pay less attention to what happens after an alert is generated. Capabilities such as alert grouping and escalation may be missing. Even when a monitoring system provides these features, they work only inside that system and cannot be shared with other monitoring systems. Some more limited monitoring systems support only email notifications and are hard to manage in a unified OnCall system.
Mobile work is difficult. Viewing alert details may require connecting to a VPN. It would be much better to acknowledge, mute, transfer, and classify alerts directly in IM tools such as Feishu, DingTalk, and WeCom.
Event handling analytics are missing, such as event volume, phone and SMS cost, MTTA, MTTR, and related statistics.
Flexible scheduling is missing. To implement SRE practices, teams first need schedules. People who are not on duty should be able to focus on longer-term work, and the whole team should not be woken up by alert calls at night.

After more than a decade in operations, and after open sourcing Open-Falcon and Nightingale, I know these problems well. I wanted a truly usable product that could solve them thoroughly, so I started a company and built Flashduty.

Flashduty one-stop alert OnCall platform

Flashduty Overview

Monitoring systems are designed differently, and most do not focus on downstream event handling. Flashduty is built to connect with different monitoring systems, bring events into one platform, and handle them uniformly through noise reduction, label enrichment, scheduling, acknowledgment and escalation, analytics, and more.
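
As a rough illustration of what bringing events into one platform involves, the sketch below normalizes alerts from two sources into one common shape. It is not Flashduty's implementation. The `Event` fields, the function names, and the Zabbix-style payload keys are hypothetical, and the Prometheus-style payload is a simplified version of the Alertmanager webhook body.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical common event shape; field names are illustrative only."""
    source: str
    title: str
    severity: str
    labels: dict = field(default_factory=dict)


def from_alertmanager(payload: dict) -> list[Event]:
    """Convert a simplified Alertmanager-style webhook body into events."""
    events = []
    for alert in payload.get("alerts", []):
        labels = dict(alert.get("labels", {}))
        events.append(Event(
            source="prometheus",
            title=labels.get("alertname", "unknown"),
            severity=labels.get("severity", "warning"),
            labels=labels,
        ))
    return events


def from_zabbix(payload: dict) -> list[Event]:
    """Convert a hypothetical Zabbix-style webhook body into events.

    Zabbix webhook bodies are user-defined, so these keys are assumptions.
    """
    return [Event(
        source="zabbix",
        title=payload.get("trigger_name", "unknown"),
        severity=payload.get("severity", "warning").lower(),
        labels={"instance": payload.get("host", "")},
    )]
```

Once every source is mapped to the same shape, downstream steps such as grouping, dispatch, and analytics only need to understand one event format.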

Flashduty supports dozens of monitoring data sources

Flashduty's core logic can be summarized by the following diagram:

Flashduty's key capabilities include:

Alert integration. The goal is to handle all alerts on one OnCall platform. Most common monitoring tools support webhooks, so an OnCall platform can adapt to different monitoring tools by providing the corresponding webhook endpoint with minimal configuration cost for users. Some less open monitoring tools may provide only email notifications. If the OnCall platform can receive those emails and parse their content, email becomes a useful fallback integration method.
Label enrichment. Richer alert labels make alerts easier for engineers to process. In reality, many monitoring tools send alerts with only a few bare fields, such as hostname, metric, and threshold. If external metadata such as CMDB can be connected and alert fields can be expanded, those enriched fields can be used for more automated dispatch and for helping engineers quickly judge impact and severity during incident handling.
Grouping and noise reduction. Grouping similar alerts and consolidating frequently repeating ones can significantly reduce alert volume and unnecessary interruptions. Rule-based and semantic-similarity-based grouping are both feasible. Alert grouping can work across monitoring data sources. For example, alerts from Zabbix and Prometheus can be grouped if they are similar.
Alert suppression. Higher-severity alerts can suppress lower-severity alerts, or lower-layer infrastructure alerts can suppress upper-layer module alerts. In short, suppression encodes dependencies between alerts. These dependencies can be costly to maintain and difficult to explain, so heavy use at large scale is not recommended.
Schedules. Scheduling avoids frequent interruptions to the entire team. Daily duty, holiday duty, temporary swaps, and fair rotation all need to be considered. Clear notifications are also needed during handoff. Duty roles should support primary and backup responders.
Acknowledgment. In theory, every alert should be acknowledged. If an alert is sent, nobody acknowledges it, and nothing bad happens, then the alert is meaningless and should not be sent. MTTA is commonly used to measure acknowledgment efficiency and effectiveness.
Escalation and transfer. Establishing clear escalation paths for different alert levels reduces psychological pressure on OnCall engineers and helps problems be solved quickly and accurately. Escalation can be manual or automatic. For example, if an alert has not been handled for more than 30 minutes and has not recovered, it can automatically escalate to a manager or backup responder so the issue is eventually handled.
Collaboration. During alert handling, relevant people can be pulled in at any time. Often, once the right people are gathered, the problem is already halfway solved. Automatically creating a war room would be even better. When collaborators are added, they need to be notified accurately and promptly, and the handling process and timeline should be retained clearly so collaborators can understand the full picture quickly.
Notifications. Outside China, Slack connects a large ecosystem, and many collaboration workflows happen inside it. It is no exaggeration to call it an operating system for collaboration. In China, WeCom, Feishu, and DingTalk dominate. These IM tools support app development, and receiving, acknowledging, closing, transferring, and handling alerts inside these embedded apps is key to improving the OnCall experience. Handling alerts from a phone becomes noticeably easier once the team works this way.
Analytics and operations. Alert compression ratio, MTTA, MTTR, acknowledgment ratio, and alert volume are key metrics for measuring OnCall efficiency. Analyzing these metrics by business, team, individual, and other dimensions helps drive alert optimization and governance so OnCall becomes more efficient.
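
Three of the capabilities above (rule-based grouping across sources, automatic escalation, and MTTA) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Flashduty implements them. The alert dictionaries, their keys, and all function names are hypothetical. The 30-minute threshold comes from the escalation example above.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

ESCALATE_AFTER = timedelta(minutes=30)  # the 30-minute example from the text


def group_alerts(alerts: list[dict], keys: tuple[str, ...]) -> dict[tuple, list[dict]]:
    """Rule-based grouping: alerts sharing the same values for `keys` form
    one group, regardless of which monitoring source sent them."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(k) for k in keys)].append(alert)
    return dict(groups)


def should_escalate(triggered_at: datetime, now: datetime,
                    acked: bool, recovered: bool) -> bool:
    """Escalate when an alert is still unacknowledged and unrecovered
    more than 30 minutes after it was triggered."""
    return not acked and not recovered and now - triggered_at > ESCALATE_AFTER


def mtta_minutes(incidents: list[dict]) -> float:
    """Mean time to acknowledge, over incidents that were acknowledged."""
    waits = [(i["acked_at"] - i["triggered_at"]).total_seconds() / 60
             for i in incidents if i.get("acked_at")]
    return mean(waits) if waits else 0.0


# Usage: two disk alerts for the same host, from different sources, are grouped.
t0 = datetime(2024, 1, 1, 3, 0)
alerts = [
    {"source": "zabbix", "host": "db-1", "check": "disk"},
    {"source": "prometheus", "host": "db-1", "check": "disk"},
    {"source": "prometheus", "host": "web-1", "check": "cpu"},
]
groups = group_alerts(alerts, keys=("host", "check"))
```

Grouping on label values rather than on the source is what lets Zabbix and Prometheus alerts land in the same group. Semantic-similarity grouping would replace the exact-match key with a similarity measure and is not shown here.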

Grouped handling for different alert types

Flexible schedules for precise alert delivery

Free Trial

Flashduty is a SaaS product. Register and try it for free at the following address. After logging in, you can find demo videos and documentation in the upper-right corner:

https://console.flashcat.cloud/

Pricing

We charge by active user. Users who only receive alerts but do not handle them are not counted as active. The current price is 179 RMB per active user per month. We also provide a free plan with a more limited feature set, suitable for individual users.

Can It Be Privately Deployed?

Yes, but we do not recommend it in most cases. We recommend using the SaaS edition so we can maintain and upgrade it uniformly at lower cost. A private deployment is relatively expensive and is suitable for companies that cannot use SaaS due to policy requirements.

What If the Startup Fails?

The company is still young, but many well-known enterprises already recognize us and work with us. The business supports itself, so you can use the product with confidence.