Incident clustering
A cluster is a group of incidents whose text is semantically similar and that arrive within a configurable time window. Agents never see the clustering mechanism directly; they see a banner on a ticket saying "this looks related to N others." This page explains how the AI decides which tickets belong together and how a cluster grows, ages, and resolves.
What gets compared
The text of every incoming incident (title and description) is compared against recent incidents using NLP embeddings. The match is semantic, not keyword-based:
- "Outlook won't send mail" and "Email delivery failing for marketing team" match.
- "Cannot reach SharePoint" and "SharePoint site down for finance" match.
This is deliberate. Categories on incoming tickets are filled inconsistently by requestors, so a category-based match would miss most real patterns. Free text is the most reliable signal.
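To illustrate what "semantic, not keyword-based" means in practice, here is a minimal sketch of comparing two embedding vectors. It assumes cosine similarity and a threshold of 0.8; this page does not state which similarity measure or threshold the product uses, so `cosine_similarity`, `is_semantic_match`, and `SIMILARITY_THRESHOLD` are hypothetical names, and the three-dimensional vectors are toy stand-ins for real embeddings.

```python
import math

# Hypothetical threshold; the real value (if any) is not documented here.
SIMILARITY_THRESHOLD = 0.8

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)

def is_semantic_match(vec_a, vec_b, threshold=SIMILARITY_THRESHOLD):
    """Treat two incidents as related when their embeddings point the same way."""
    return cosine_similarity(vec_a, vec_b) >= threshold

# Toy vectors standing in for embeddings of three incident texts.
outlook_wont_send = [0.9, 0.1, 0.0]
email_delivery_failing = [0.8, 0.2, 0.1]
sharepoint_down = [0.0, 0.2, 0.9]

print(is_semantic_match(outlook_wont_send, email_delivery_failing))  # True
print(is_semantic_match(outlook_wont_send, sharepoint_down))         # False
```

The two mail-related texts share almost no keywords, yet their vectors point in nearly the same direction, which is why a category or keyword filter is not needed.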
When a cluster forms
A cluster is created when N or more incidents with semantically similar text are observed within a time window. The defaults:
- N = 4 incidents
- window = 4 hours
Both are configurable per tenant. A cluster does not require all N incidents to be open at the same moment; what matters is the rolling count inside the window.
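The rolling count can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: it assumes the window ends at the current time and that incidents are counted by creation time regardless of whether they are still open. `should_form_cluster`, `MIN_INCIDENTS`, and `WINDOW` are hypothetical names that mirror the defaults above.

```python
from datetime import datetime, timedelta

MIN_INCIDENTS = 4            # default N from this page
WINDOW = timedelta(hours=4)  # default window from this page

def should_form_cluster(created_times, now,
                        min_incidents=MIN_INCIDENTS, window=WINDOW):
    """Return True when enough similar incidents fall inside the rolling window.

    created_times holds the creation timestamps of incidents already judged
    semantically similar. Open/closed status is deliberately ignored.
    """
    cutoff = now - window
    recent = [t for t in created_times if cutoff <= t <= now]
    return len(recent) >= min_incidents

now = datetime(2024, 1, 1, 12, 0)
inside = [now - timedelta(minutes=m) for m in (30, 60, 120, 210)]
print(should_form_cluster(inside, now))  # True: four incidents within 4 hours
```

An incident created five hours ago would fall outside the cutoff and not count, even if it is still open.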
Living clusters
Clusters keep growing as new incidents arrive. When a new incident comes in, the AI first checks it against every active cluster. If the text matches an existing cluster, the incident joins that cluster. A new cluster is only created when nothing matches.
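The check-existing-clusters-first rule might look like the sketch below. All names (`Cluster`, `join_existing_cluster`, `label_in_text`) are hypothetical, and the matcher is passed in as a function because this page does not describe how an incident is compared against a cluster. The sketch joins the first matching cluster; choosing the best match instead is an equally plausible design.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Cluster:
    label: str  # stand-in for whatever represents the cluster's meaning
    incident_ids: List[str] = field(default_factory=list)

def join_existing_cluster(incident_id: str, text: str,
                          active_clusters: List[Cluster],
                          matches: Callable[[str, Cluster], bool]) -> Optional[Cluster]:
    """Add the incident to the first active cluster it matches.

    Returns the cluster joined, or None when nothing matches, in which case
    the incident becomes a candidate for a new cluster.
    """
    for cluster in active_clusters:
        if matches(text, cluster):
            cluster.incident_ids.append(incident_id)
            return cluster
    return None

# Toy matcher for demonstration only; a real matcher would be semantic.
def label_in_text(text: str, cluster: Cluster) -> bool:
    return cluster.label in text.lower()

clusters = [Cluster("sharepoint", ["INC-1", "INC-2", "INC-3", "INC-4"])]
joined = join_existing_cluster("INC-5", "SharePoint site down for finance",
                               clusters, label_in_text)
print(len(clusters[0].incident_ids))  # 5
```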
This is important for the agent reading the banner. "N other tickets in the last 3 hours" reflects the current cluster size at the moment the agent opens the ticket, not the size when the cluster was first detected.
For every cluster, the AI writes a plain-language summary describing what the incidents have in common, when the cluster started, and how fast it is growing.
Lifecycle
Each cluster moves through a small set of states:
| State | Definition (default) | What it means |
|---|---|---|
| Emerging | Average growth rate ≤ 1 incident per hour | A pattern is forming but it might still be a coincidence. Visible to Problem Managers; not yet broadcast to agents. |
| Active | Average growth rate > 1 incident per hour | The pattern is real and still growing. An inline banner appears on every member ticket; the cluster shows on the Problem Manager dashboard. |
| HyperCare | Set after a fix is applied | A workaround or fix is in place; the AI monitors whether new incidents stop arriving. Defaults to a 24-hour window. |
| Resolved | HyperCare window closes with no new incidents | The cluster moves to Resolved automatically. Past clusters remain searchable for historical pattern matching. |
| Stale | Set when an Emerging cluster stops growing | The pattern looked promising but didn't materialise. The cluster stays in the system in case the pattern resumes, but it is hidden from active surfaces. |
All four growth and timing thresholds are configurable; see the Configuration page for details.
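The growth-rate and HyperCare rules in the table can be sketched as two small functions. Several points are assumptions: the average growth rate is taken as total incidents divided by hours since the cluster started, a cluster that receives new incidents during HyperCare simply stays in HyperCare, and Stale is left out because the table gives no numeric definition of "stops growing." The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

ACTIVE_GROWTH_RATE = 1.0                # incidents per hour (default from the table)
HYPERCARE_WINDOW = timedelta(hours=24)  # default from the table

def growth_state(incident_count, hours_elapsed, threshold=ACTIVE_GROWTH_RATE):
    """Classify a growing cluster as Emerging or Active by average growth rate."""
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    rate = incident_count / hours_elapsed
    return "Active" if rate > threshold else "Emerging"

def hypercare_outcome(fix_applied_at, new_incident_times, now,
                      window=HYPERCARE_WINDOW):
    """Return 'Resolved' once the window has closed with no new incidents."""
    window_end = fix_applied_at + window
    arrived = any(fix_applied_at < t <= window_end for t in new_incident_times)
    if now >= window_end and not arrived:
        return "Resolved"
    return "HyperCare"  # assumption: behaviour on new arrivals is not documented

print(growth_state(4, 4))  # Emerging: exactly 1 per hour is not above the threshold
print(growth_state(6, 4))  # Active: 1.5 per hour
```

Note the boundary: a rate of exactly 1 incident per hour stays Emerging, because Active requires a rate strictly greater than the threshold.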
Historical pattern matching
When a new cluster forms, the AI checks whether it resembles a past cluster that was linked to a Problem or Major Incident. If a match is found, the AI silently links the new cluster to the previous record and surfaces the prior resolution to the on-call team:
Similar problem occurred on Feb 12 - resolved as KE-0042 with workaround "Restart transport service."
This shortens time-to-resolution when a known pattern reappears, because the on-call team starts from a resolution that has already worked.
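A rough sketch of the lookup, assuming past clusters are stored with their summary and linked record and that the best match above a threshold wins. `PastCluster`, `find_prior_resolution`, `format_hint`, and the 0.8 default are hypothetical; the similarity function is supplied by the caller because this page does not say how clusters are compared.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class PastCluster:
    summary: str      # plain-language summary of the past cluster
    occurred_on: str  # display date, e.g. "Feb 12"
    record_id: str    # linked Problem or known-error record
    workaround: str

def find_prior_resolution(new_summary: str,
                          past_clusters: List[PastCluster],
                          similarity: Callable[[str, str], float],
                          threshold: float = 0.8) -> Optional[PastCluster]:
    """Return the most similar past cluster at or above threshold, else None."""
    best, best_score = None, threshold
    for past in past_clusters:
        score = similarity(new_summary, past.summary)
        if score >= best_score:
            best, best_score = past, score
    return best

def format_hint(past: PastCluster) -> str:
    """Build the message shown to the on-call team."""
    return (f"Similar problem occurred on {past.occurred_on} - "
            f'resolved as {past.record_id} with workaround "{past.workaround}"')
```

Returning `None` when nothing clears the threshold keeps the hint from appearing on weak matches, where a stale workaround could mislead the on-call team.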
What clustering does not do
Clustering does not read change tickets, monitoring logs, or CMDB health data in this release. Those are planned correlation sources for a later milestone. Today, the only signal that drives cluster formation is the text of the incident.
Related topics
- Alerts and reporting - what agents see when a cluster gets big enough to act on
- Problem / Major Incident / Change Management Dashboard - where Problem Managers monitor clusters in flight
- Configuration - editing the cluster-size and time-window thresholds