Incident Classifications and Escalation Policies

Alexander Sinno
Nov 10, 2022
Updated: Dec 27, 2022

Creating incident classifications and building escalation policies around them is the foundation for playbook creation, triage, enrichment, analysis, and incident management.

Once you enable your technology stack and the first security events and alerts arrive in your SIEM from various sources, you may quickly be drowning in so many of them that you do not know what to look at first. In this post, we will learn how to set up incident classifications and their associated escalation policies.

Doing so helps analysts see what is most important and what can wait to be investigated. For both manual investigation and automation, these parameters help route events into the correct workflows so that the correct investigation steps are triggered every time, without creating an incident handling playbook for every single alert.

When creating incident classifications, you will want to consult with your senior Security Operations Center (SOC) analysts. Based on these conversations you can decide which properties need to be present in each new security event. Additionally, you can use various other properties in your event such as action taken and directionality while pairing it with your classification to adjust severity levels.

Finally, by loading your asset information into a robust configuration management database, such as a Security Orchestration, Automation, and Response (SOAR) indicator database or a third-party tool, you can adjust severity levels automatically.

It is time to take control of the alerts coming into your SOC and make the most of the data you ingest and store. You can do this by creating incident classifications and attaching an escalation policy to each of them.

In this post we are going to cover the following topics:

· Incident classification definition and creation

· Examples of incident classifications

· Escalation matrix

· Escalation policies

Technical requirements:

Incident classification definition

Incident classifications are important for SOC governance, policies and workflows. They ensure a pragmatic approach with appropriate levels of urgency. When creating your incident classification be sure to consult with your senior analysts to accurately define an analytical workflow and significant incident attributes for enrichment.

The key components for building an incident classification are the following:

· Significant incident attributes

· Data source uniformity

· Use-case list for aggregation

· Incident name that tells a story

The purpose of an incident classification is to link alerts coming into the SIEM together based on certain properties that they share. In the example below, you can see that several alerts are linked together by the “Access Anomaly” incident classification.

If you did not have a classification, you would need to think of a dedicated workflow to handle each single alert. In an automated environment this also entails making changes to each workflow every time you need to adjust steps. It easily becomes clear that classifications can be very helpful to streamline these efforts whilst guaranteeing consistent results.

Significant incident attributes

Whenever you are defining a new classification, it is required to define your significant incident attributes. Let us look at access anomaly for an example. This classification is used for security events such as impossible travel activity, unfamiliar sign-in properties, and sign-ins from an infrequent country. Whenever we want to investigate this incident type, we require at least two attributes to be present for all security events: IP Address and User Account. We obtained this information when communicating with our senior analysts. Since they have been working on the front lines for a while, they are the best source of information for us.

Now we know that each of these alerts should be enriched to include the IP address and user account because according to our analysts this is the information necessary to investigate the user’s typical login patterns. Now we need to explore how to use these attributes in conjunction with a common data source for furthering our investigation.
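To make the idea of significant incident attributes concrete, here is a minimal sketch of a check that flags events which still need enrichment. The `Classification` class, the attribute keys `ip_address` and `user_account`, and the sample event are illustrative assumptions, not part of any specific SIEM or SOAR product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """A classification and the attributes every event must carry (hypothetical model)."""
    name: str
    required_attributes: frozenset


# Access anomaly requires an IP address and a user account, as described above.
ACCESS_ANOMALY = Classification(
    name="Access Anomaly",
    required_attributes=frozenset({"ip_address", "user_account"}),
)


def missing_attributes(event: dict, classification: Classification) -> set:
    """Return the required attributes that are absent or empty in the event."""
    return {
        attr
        for attr in classification.required_attributes
        if not event.get(attr)
    }


# Sample event with an assumed shape: the IP address has not been enriched yet.
event = {
    "alert_name": "Impossible travel activity",
    "user_account": "jdoe@example.com",
}
print(missing_attributes(event, ACCESS_ANOMALY))  # {'ip_address'}
```

An event with a non-empty result would be sent back for enrichment before the investigation playbook runs.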

Data source uniformity

Ensuring data source uniformity for your classifications is not always doable. Should you intend to do so, pay close attention to consistent and correct field mapping, and define the fields across your data sources that will be used as your significant incident attributes. It becomes far easier when the investigative data sources are the same. For example, when looking at the alerts we associated with the access anomaly classification, we know that our common data source is the Azure Active Directory audit logs.
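Consistent field mapping can be sketched as a small lookup table that renames source-specific fields to the common attribute names. The source labels and raw field names below (`ipAddress`, `userPrincipalName`, `src_ip`, `username`) are assumptions chosen for illustration; check them against the actual schema of your own data sources.

```python
# Maps each data source's raw field names to the common attribute names.
# Source labels and raw field names are illustrative assumptions.
FIELD_MAP = {
    "azure_ad": {"ipAddress": "ip_address", "userPrincipalName": "user_account"},
    "vpn_gateway": {"src_ip": "ip_address", "username": "user_account"},
}


def normalize(source: str, raw_event: dict) -> dict:
    """Return only the significant attributes, renamed to their common names."""
    mapping = FIELD_MAP[source]
    return {
        common_name: raw_event[raw_name]
        for raw_name, common_name in mapping.items()
        if raw_name in raw_event
    }


raw = {"ipAddress": "198.51.100.4", "userPrincipalName": "jdoe@example.com", "appName": "Portal"}
print(normalize("azure_ad", raw))
```

Keeping the mapping in one place means a playbook can query by `ip_address` and `user_account` regardless of where the event came from.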

When you query this data set using your significant incident attributes, it is far easier to gather the information required for investigation. For example, you can find out which device was used for the sign-in that caused the security event, or gather a list of the countries the user typically signs in from. This only works if you have defined the significant incident attributes and the data source that your playbook needs for investigation. Once we have defined our significant incident attributes and common data source, we should begin creating a list of use-cases or alerts that fit the context of the incident type based on name, attributes, and data source.

Once we have defined our significant incident attributes and data source, finding use-cases and alerts becomes a bit easier. We know that our incident type needs an IP address and a user account as attributes, and that its common data source for investigation needs to be the Azure Active Directory audit logs.

You will find a lot of matching alerts as default alerts in a cloud access security broker solution, but you may find some more matches in use-cases which you created yourself. Some examples are the following:

· Impossible travel activity

· Unfamiliar sign-in properties

· Infrequent country

· Azure AD risky sign-in
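A use-case list like the one above can be sketched as a lookup from alert name to classification, so that every matching alert lands in the same workflow. The dictionary layout and the `Unclassified` fallback are assumptions for illustration.

```python
# Use-case list per classification, taken from the examples above.
USE_CASES = {
    "Access Anomaly": [
        "Impossible travel activity",
        "Unfamiliar sign-in properties",
        "Infrequent country",
        "Azure AD risky sign-in",
    ],
}

# Reverse index: lower-cased alert name -> classification name.
ALERT_TO_CLASSIFICATION = {
    alert.lower(): classification
    for classification, alerts in USE_CASES.items()
    for alert in alerts
}


def classify(alert_name: str) -> str:
    """Return the classification for an alert, or 'Unclassified' (assumed fallback)."""
    return ALERT_TO_CLASSIFICATION.get(alert_name.strip().lower(), "Unclassified")


print(classify("Infrequent country"))   # Access Anomaly
print(classify("Some brand-new alert"))  # Unclassified
```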

When you have identified this list you can now automate significant parts of your incident response activities. Once you have an idea of which use-cases and alerts you’d want to aggregate, the next step is creating a name for your incident type that tells a story.

Incident name that tells a story

Examples:

· Suspicious attachment

· Suspicious link

Vulnerability scan

A vulnerability scan is two or more distinct Service Exploitation attempts against one or more target(s).

Severity: Medium

Data sources:

· Intrusion Detection System/Intrusion Prevention System

· Network Detection and Response

· Firewall
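The definition above can be sketched as a simple aggregation. This sketch assumes that attempts are grouped by source IP and that "distinct" means distinct signature names; the event keys `source_ip` and `signature` are hypothetical.

```python
from collections import defaultdict


def vulnerability_scan_sources(events, min_distinct=2):
    """Return source IPs with at least `min_distinct` distinct exploitation signatures.

    Assumes each event is a dict with 'source_ip' and 'signature' keys (hypothetical shape).
    """
    signatures_by_source = defaultdict(set)
    for event in events:
        signatures_by_source[event["source_ip"]].add(event["signature"])
    return sorted(
        source
        for source, signatures in signatures_by_source.items()
        if len(signatures) >= min_distinct
    )


events = [
    {"source_ip": "198.51.100.9", "signature": "SQL injection attempt"},
    {"source_ip": "198.51.100.9", "signature": "Path traversal attempt"},
    {"source_ip": "198.51.100.20", "signature": "SQL injection attempt"},
    {"source_ip": "198.51.100.20", "signature": "SQL injection attempt"},
]
print(vulnerability_scan_sources(events))  # ['198.51.100.9']
```

The second source repeats the same signature, so it stays a plain Service Exploitation event rather than a vulnerability scan.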

Denial of Service or Distributed Denial of Service

A DoS or DDoS attack can be detected when there is a large volume increase of a particular type of traffic. Some exploits can also cause a service disruption, but this should be classified as Service Exploitation.

Severity: Medium

Data sources:

· NDR

· Firewall

Examples:

· UDP Flood

· Ping of Death

· SYN Flood

Brute force login success

Brute Force Login Success applies whenever a brute force attack (many failed login attempts) is followed by a successful login.

Severity: Medium

Data sources:

· Windows Security Event Logs

· Azure Audit Logs

· Most Other Audit Logs

Examples:

· Suspected Brute Force Login Success

· Credential Stuffing

· Dictionary Attack
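The "many failures followed by a success" pattern can be sketched as follows. The threshold of five consecutive failures is an assumption for illustration, and the function expects the login outcomes for a single account in chronological order.

```python
def brute_force_success(outcomes, min_failures=5):
    """Return True if a success directly follows at least `min_failures` consecutive failures.

    `outcomes` is a chronologically ordered list of booleans for one account
    (True = login success, False = login failure). The threshold is an assumption.
    """
    consecutive_failures = 0
    for succeeded in outcomes:
        if succeeded:
            if consecutive_failures >= min_failures:
                return True
            consecutive_failures = 0  # a normal success resets the streak
        else:
            consecutive_failures += 1
    return False


print(brute_force_success([False] * 6 + [True]))  # True
print(brute_force_success([False, False, True]))  # False
```

A real detection would usually also bound the time window, which this sketch leaves out.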

Policy violation

A policy violation is a custom use-case which alerts on misuse of company resources.

Severity: Medium

Examples:

· Use of pornography

· Keygen

· Torrents

Weak configuration

A weak configuration is a detection of a potentially hazardous configuration, such as a user logging in with no MFA.

Severity: Medium

Examples:

· User logs in without MFA

Cloud-based anomaly

A cloud-based anomaly is detected whenever a user exhibits abnormal behavior on cloud resources.

Severity: Medium

Data sources:

· Azure

· AWS

· Dropbox

Examples:

· Download 1000 files simultaneously

· Any other abuse of cloud resources

Network-based anomaly

A network-based anomaly is detected whenever a device exhibits abnormal behavior, for example many RDP connections from a single device, a rapid increase of DNS queries from a single endpoint, or other deviations from the norm.

Severity: Medium

Data sources:

· Firewall

· DNS

· Proxy

· IDS

Examples:

· Multiple RDP connections

· Sudden increase in DNS requests
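A "sudden increase" can be sketched as a comparison against a per-endpoint baseline. The factor of 3 and the use of a simple mean are assumptions for illustration; production detections typically use more robust statistics.

```python
from statistics import mean


def is_spike(history, current, factor=3.0):
    """Return True if `current` exceeds `factor` times the mean of `history`.

    `history` holds past per-interval DNS query counts for one endpoint.
    With no history there is no baseline, so nothing is flagged.
    """
    if not history:
        return False
    return current > factor * mean(history)


print(is_spike([100, 120, 110], 900))  # True
print(is_spike([100, 120, 110], 150))  # False
```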

Endpoint-based anomaly

An endpoint-based anomaly is a rare process occurrence or operation, for example a strange cross-process or parent process that interacts with lsass.exe. This classification can also be used to detect processes that are rare across the organization.

Severity: Medium

Data sources:

· Endpoint Detection and Response

· Sysmon

Examples:

· Interacting with many files rapidly

· Never seen process

Escalation matrix

An escalation matrix will help your organization to determine how to handle severity levels based on incident type, action taken, directionality, and affected asset. A proper matrix will help you prioritize more urgent threats and eliminate a great deal of work.

The three most important components of your matrix will be directionality, action taken, and severity.

You can flexibly alter your severity based on directionality and action taken.

Directionality is the direction of the attack, which we define with the following characteristics: Inbound, Internal, and Outbound. We want to see where the attack originates and where it is destined. Incorporating directionality helps your organization prioritize events. For example, an SQL injection (SQLi) attack originating from an external source is typically treated with less urgency than an SQLi attack originating from an internal device.
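Directionality can be derived from the source and destination addresses of an event. The sketch below assumes your internal address space is the three RFC 1918 ranges; replace `INTERNAL_NETWORKS` with your organization's own ranges. The `Unknown` result for traffic between two external addresses is an assumption, since only three directions are defined here.

```python
import ipaddress

# Assumption: internal address space is the RFC 1918 private ranges.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_internal(ip: str) -> bool:
    """Return True if the address falls inside one of the internal networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in INTERNAL_NETWORKS)


def directionality(source_ip: str, destination_ip: str) -> str:
    """Classify an event as Inbound, Internal, or Outbound."""
    source_internal = is_internal(source_ip)
    destination_internal = is_internal(destination_ip)
    if source_internal and destination_internal:
        return "Internal"
    if source_internal:
        return "Outbound"
    if destination_internal:
        return "Inbound"
    return "Unknown"  # both endpoints external; not covered by the three directions


print(directionality("8.8.8.8", "10.0.0.5"))       # Inbound
print(directionality("10.0.0.5", "10.0.0.9"))      # Internal
print(directionality("192.168.1.10", "8.8.8.8"))   # Outbound
```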

The second component to consider is action taken. Action taken can have a few different values, such as blocked or unblocked. A good example is a Malware Post-Compromise incident that is subsequently quarantined by your antivirus tool.

You can set a different severity level based on the fact that the malware was quarantined. Additionally, you can factor in both directionality and action taken. For example, a service exploitation event that is inbound and blocked may be set to low, whereas a service exploitation event that is inbound and unblocked can be considered medium.
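Combining classification, directionality, and action taken can be sketched as a lookup table. The two entries below come from the service exploitation example; the `Medium` fallback for combinations without a rule is an assumption you would replace with your own decisions.

```python
# Keys are (classification, directionality, action taken).
# The two entries mirror the service exploitation example in the text.
ESCALATION_MATRIX = {
    ("Service Exploitation", "Inbound", "Blocked"): "Low",
    ("Service Exploitation", "Inbound", "Unblocked"): "Medium",
}

# Assumption: fall back to Medium when no specific rule exists.
DEFAULT_SEVERITY = "Medium"


def base_severity(classification: str, directionality: str, action_taken: str) -> str:
    """Look up the severity for a combination, falling back to the default."""
    key = (classification, directionality, action_taken)
    return ESCALATION_MATRIX.get(key, DEFAULT_SEVERITY)


print(base_severity("Service Exploitation", "Inbound", "Blocked"))    # Low
print(base_severity("Service Exploitation", "Inbound", "Unblocked"))  # Medium
```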

Before we start building our escalation matrix, we should start taking a look at how severity levels define what actions to take.

Handling procedures

Handling procedures help us define which actions to take. For example, most organizations do not handle low severity alerts directly; instead, they create a report which they review every 7 days. For medium severity and higher, they prefer to act more swiftly. For example, a medium severity event can be handled by an email escalation alone, which requires action to be taken within 12-24 hours of notification.

For high severity security events, it's recommended to escalate by phone call and email and to take immediate action. Lastly, for critical severity alerts, which you reach through severity modifiers, it's recommended to notify your incident response team immediately and prepare for an engagement.

As these procedures are based on the severity of an incident, it is recommended to define all severity levels to be used and associate them with a numerical value, e.g. as follows:

Severity	Numerical Expression
Critical	4
High	3
Medium	2
Low	1

Next, we will build a simplified representation of the aforementioned handling procedures:

Severity	Action
Critical	Phone Escalation, Email, Incident Response Engagement
High	Phone Escalation, Email
Medium	Email
Low	Report Only (Daily or Weekly)
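The two tables above translate directly into code. This is a minimal sketch; the names `Severity`, `HANDLING`, and `actions_for` are illustrative.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels with the numerical values from the table above."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Handling procedure per severity level, as listed in the table above.
HANDLING = {
    Severity.CRITICAL: ["Phone Escalation", "Email", "Incident Response Engagement"],
    Severity.HIGH: ["Phone Escalation", "Email"],
    Severity.MEDIUM: ["Email"],
    Severity.LOW: ["Report Only (Daily or Weekly)"],
}


def actions_for(severity: Severity) -> list:
    """Return the escalation actions for a severity level."""
    return HANDLING[severity]


print(actions_for(Severity.HIGH))  # ['Phone Escalation', 'Email']
```

Using an integer-backed enum keeps the levels comparable, which is convenient when severity needs to be raised by a modifier.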

After defining the handling procedures, the next step is to obtain a list of critical assets and VIPs in the organization. Whenever those assets are affected, we increase our severity level by 1. For example, we define Low as 1, but when a critical asset is affected we increase it to 2, which maps to Medium severity. It is recommended to be a bit on the conservative side when it comes to declaring critical severity. This should only come via a high-fidelity signature or a severity modifier from High to Critical.

Severity modifiers

As mentioned before, we need to be able to shift our scoring to meet the appropriate level of urgency for our security events. Taking this simple approach makes it easier for your SOC to understand how to handle incoming security events:

Change	Old Severity	New Severity
+1	High	Critical
+1	Medium	High
+1	Low	Medium
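The +1 modifier can be sketched as follows. The asset names in `CRITICAL_ASSETS` are hypothetical placeholders for the list of critical assets and VIPs you maintain, and capping the result at Critical is an assumption, since the table defines no level above it.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical critical assets and VIP accounts; load these from your asset database.
CRITICAL_ASSETS = {"dc01.example.internal", "ceo@example.com"}


def apply_asset_modifier(severity: Severity, asset: str) -> Severity:
    """Raise severity by one level when a critical asset is affected.

    The result is capped at CRITICAL (assumption: no level exists above it).
    """
    if asset in CRITICAL_ASSETS:
        return Severity(min(severity + 1, Severity.CRITICAL))
    return severity


print(apply_asset_modifier(Severity.LOW, "dc01.example.internal").name)   # MEDIUM
print(apply_asset_modifier(Severity.HIGH, "ceo@example.com").name)        # CRITICAL
print(apply_asset_modifier(Severity.MEDIUM, "laptop-042.example.internal").name)  # MEDIUM
```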

Creating a matrix for escalation is relatively simple once you have created your classifications and defined your handling procedures. It’s much easier now to develop playbooks and speak with your team about the level of urgency for incidents. For example, do we want a phone call at night for a Potentially Unwanted Application on a workstation? Probably not. However, you may reconsider if it's a critical asset. Additionally, the entire incident changes if PUA is Malware Post-Compromise or if you consider it Malware Pre-Compromise. You’d want to adjust your levels of urgency accordingly so that your SOC can react appropriately and get the right people involved as expeditiously as possible.

Example Policies

Now that we understand the properties which define our escalation matrix, we can start building one. Designing a simplified matrix and describing each one of your classifications gives your analysts the ability to easily cross-reference it and take the right actions. In addition, they do not waste time handling low severity events directly; instead, they can review low severity security events on a scheduled basis.

Next, we will take a look at an example matrix for defining our actions taken per the classifications we have built:

Once you have created your matrix you want to move on to building your Incident Response Life-Cycle and workflow. This will get your incident classifications, your escalation matrix and SOC analysts working in the right direction. From there we can start analyzing key-metrics on SOC performance so we can break down our weak points and figure out how to strengthen them.

Summary

In this post you learned how to build incident classifications and escalation matrices. Now that you have learned this vital information, you can begin designing the overall workflow of your SOC, which we call the target operating model. Once you have defined this, we can start working towards key metrics and technical implementation. This post contains important material for designing your incident response life-cycle.
