Problem / Major Incident / Change Management Dashboard
A single dashboard surface that brings together the AI-detected work items for three personas: the Problem Manager, the Major Incident Manager, and the Change Manager. Each persona sees the sections relevant to them.
This page focuses on the Problem and Major Incident views (Active Clusters, Problem Candidates, Recent Actions). The Change Manager view is the Standard Change Candidates section of Change Risk Advisory, which shares this same dashboard surface.
Active Clusters
Clusters that are currently active, updated in real time. Each row shows:
- Size (member ticket count)
- Linked tickets (clickable list of member IDs)
- Growth rate (incidents per hour over the last window)
- Age (time since the cluster moved into Active)
- Status (Active or HyperCare)
Default sort: impact first, then recency. A cluster of 40 tickets that started two hours ago appears above a cluster of 12 tickets that started six hours ago. The intent is to put the most urgent item in front of the manager first.
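The default ordering can be sketched as a two-key sort. This is an illustration only: the `ActiveCluster` type and its field names are hypothetical, and it assumes that "impact" is the member ticket count and "recency" is the time the cluster moved into Active. The product may weigh impact differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ActiveCluster:
    cluster_id: str          # hypothetical identifier
    size: int                # member ticket count, used here as the impact proxy
    activated_at: datetime   # when the cluster moved into Active


def default_sort(clusters):
    # Larger clusters first; among equal sizes, the most recently activated first.
    return sorted(clusters, key=lambda c: (c.size, c.activated_at), reverse=True)


now = datetime(2024, 1, 1, 12, 0)
rows = [
    ActiveCluster("C-12", 12, now - timedelta(hours=6)),
    ActiveCluster("C-40", 40, now - timedelta(hours=2)),
]
ordered_ids = [c.cluster_id for c in default_sort(rows)]
```

With the two clusters from the example above, the 40-ticket cluster sorts ahead of the 12-ticket one.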
Click any row to open the cluster detail (the same view the inline banner opens for agents). From there, the Problem Manager can declare a Major Incident (MI), open a Problem record, or dismiss the cluster as a coincidence.
Problem Candidates
Slower-burning patterns that surface from the periodic batch analysis. Each row shows:
- Pattern summary - AI-generated description
- Frequency - how often this pattern recurs (e.g. 3 times in 30 days)
- Impact - estimated affected users, incident count
- First seen / Last seen
Default sort: impact then cluster size. A pattern that hit 200 users twice in a month appears above a pattern that hit 10 users five times in a month.
These rows are not real-time. They are for the Problem Manager's weekly review of trends, not for incident-level urgency.
Recent Actions
A log of everything that happened to clusters recently:
- Clusters that were reported (and whether each was declared as an MI or as a Problem)
- Clusters that were dismissed (and who dismissed them)
- Clusters that were resolved (auto-moved out of HyperCare)
Filters: time range, severity, service, status.
The Recent Actions section is the audit log for the engine. When a Problem Manager wants to know whether a particular agent's dismissal pattern is healthy, or whether a tenant is over-reporting MIs, this is where they look.
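As a rough sketch of how such an audit question might be answered from exported log entries, the snippet below filters entries and counts dismissals per agent. The `ActionEntry` shape, the action names, and the `filter_actions` helper are all hypothetical; the actual export format and filter semantics are not specified here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ActionEntry:
    cluster_id: str
    action: str      # assumed values: "declared_mi", "declared_problem", "dismissed", "resolved"
    actor: str       # who performed the action
    service: str
    severity: str
    at: datetime


def filter_actions(entries, since=None, until=None, severity=None, service=None, action=None):
    # Keep entries that match every filter that was supplied; None means "no filter".
    result = []
    for e in entries:
        if since is not None and e.at < since:
            continue
        if until is not None and e.at > until:
            continue
        if severity is not None and e.severity != severity:
            continue
        if service is not None and e.service != service:
            continue
        if action is not None and e.action != action:
            continue
        result.append(e)
    return result


log = [
    ActionEntry("C-1", "dismissed", "agent.a", "vpn", "high", datetime(2024, 6, 1, 9)),
    ActionEntry("C-2", "declared_mi", "agent.b", "email", "critical", datetime(2024, 6, 2, 10)),
    ActionEntry("C-3", "dismissed", "agent.a", "email", "low", datetime(2024, 6, 3, 11)),
    ActionEntry("C-4", "dismissed", "agent.c", "vpn", "low", datetime(2024, 5, 1, 8)),
]

# Dismissals since 1 June, grouped by the agent who dismissed them.
june_dismissals = filter_actions(log, since=datetime(2024, 6, 1), action="dismissed")
dismissals_by_actor = Counter(e.actor for e in june_dismissals)
```

A count that is unusually high for one agent compared with peers is a prompt for review, not proof of a problem by itself.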
Precision metric
At the top of the dashboard, a single number: "X% of active clusters in the last 30 days led to a published Problem or MI record." This is the precision metric driven by the feedback loop. High precision means the engine is mostly surfacing real patterns; low precision means it is surfacing noise. The Problem Manager uses this as a signal for whether to tighten or relax the similarity threshold.
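The arithmetic behind the metric is a simple ratio. The sketch below assumes each cluster that was active in the 30-day window carries one final outcome label; the label names (`"problem"`, `"mi"`, `"dismissed"`) and the `precision_pct` function are hypothetical.

```python
def precision_pct(outcomes):
    """Percentage of clusters whose outcome was a published Problem or MI record.

    `outcomes` holds one label per cluster that was active in the window.
    Returns None when there were no clusters, to avoid dividing by zero.
    """
    if not outcomes:
        return None
    confirmed = sum(1 for o in outcomes if o in {"problem", "mi"})
    return 100.0 * confirmed / len(outcomes)


# Five clusters in the window: three led to a record, two were dismissed.
sample = ["mi", "problem", "dismissed", "dismissed", "problem"]
pct = precision_pct(sample)  # 3 of 5 clusters, i.e. 60.0
```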
Periodic batch analysis
Real-time clustering catches fast-moving incidents. Slower patterns - "47 password-reset tickets for the same app this month" - need a different lens. A periodic batch job runs on a configurable schedule (daily by default, and no more often than once per hour) and looks across the wider history for:
- Recurring clusters - the same cluster pattern appearing 3 times in 30 days
- Chronic incident types - large counts of routine tickets for one service that suggest a structural problem
The batch's output feeds the Problem Candidates section. It does not feed Active Clusters, and it does not trigger inline banners. This is by design: batch findings are for Problem Manager review, not for real-time agent alerting.
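The recurring-cluster check can be pictured as counting pattern occurrences inside a sliding window. This is a minimal sketch, not the engine's actual method: the `recurring_patterns` function and the pattern keys are hypothetical, and it assumes "3 times in 30 days" means at least 3 occurrences within the 30 days before the batch run.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recurring_patterns(occurrences, now, min_count=3, window_days=30):
    """Return pattern keys seen at least `min_count` times in the window.

    `occurrences` is an iterable of (pattern_key, seen_at) tuples, one per
    past cluster that matched the pattern.
    """
    cutoff = now - timedelta(days=window_days)
    counts = defaultdict(int)
    for key, seen_at in occurrences:
        if cutoff <= seen_at <= now:
            counts[key] += 1
    return sorted(key for key, n in counts.items() if n >= min_count)


run_time = datetime(2024, 6, 30)
history = [
    ("vpn-timeout", datetime(2024, 5, 1)),   # outside the 30-day window
    ("vpn-timeout", datetime(2024, 6, 2)),
    ("vpn-timeout", datetime(2024, 6, 15)),
    ("vpn-timeout", datetime(2024, 6, 28)),
    ("sso-loop", datetime(2024, 6, 10)),
    ("sso-loop", datetime(2024, 6, 20)),
]
candidates = recurring_patterns(history, run_time)
```

Here only `vpn-timeout` qualifies: it occurs three times inside the window, while `sso-loop` occurs twice.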
Past issues
The dashboard also exposes resolved clusters and closed Problems for historical lookup. This is useful when the Problem Manager wants to compare a new pattern against something they remember from six months ago.
Related topics
- Incident clustering
- Alerts and reporting
- Configuration - thresholds for active vs candidate surfaces