Updated: Dec 27, 2022
Creating incident classifications and building escalation policies around them is extremely important for playbook creation, triage, enrichment, analysis, and incident management.
When you build and eventually enable your technology stack and the first security events and alerts come into your SIEM from various sources, you may quickly be drowning in so many of them that you will not know what to look at. In this post, we will learn how to properly set up incident classifications and associated escalation policies.
Doing so will guide analysts to better know what is most important and what can wait to be investigated. Both for manual investigation and for automation, these parameters help route events into the correct workflows to ensure that the correct investigation steps are triggered every time, without creating an incident handling playbook for every single alert.
When creating incident classifications, you will want to consult with your senior Security Operations Center (SOC) analysts. Based on these conversations you can decide which properties need to be present in each new security event. Additionally, you can use various other properties in your event such as action taken and directionality while pairing it with your classification to adjust severity levels.
Finally, by filling your asset information into a robust configuration management database, such as a Security Orchestration, Automation, and Response (SOAR) platform's indicator database or a third-party tool, you can automatically adjust severity levels.
It is time to take control of the alerts coming into your SOC and make the most of the data ingested and stored. This is doable by creating incident classifications and appending an escalation policy to each of them.
In this post we are going to cover the following topics:
· Incident classification definition and creation
· Examples of incident classifications
· Escalation policies and the escalation matrix
Incident classification definition
Incident classifications are important for SOC governance, policies and workflows. They ensure a pragmatic approach with appropriate levels of urgency. When creating your incident classification be sure to consult with your senior analysts to accurately define an analytical workflow and significant incident attributes for enrichment.
The key components for building an incident classification are the following:
· Significant incident attributes
· Data source uniformity
· Use-case list for aggregation
· Incident name that tells a story
The purpose of an incident classification is to link alerts coming into the SIEM together based on certain properties that they share. In the example below, you can see that several alerts are linked together by the “Access Anomaly” incident classification.
If you did not have a classification, you would need to think of a dedicated workflow to handle every single alert. In an automated environment this also entails making changes to each workflow every time you need to adjust steps. It quickly becomes clear that classifications can be very helpful to streamline these efforts whilst guaranteeing consistent results.
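To illustrate, here is a minimal Python sketch of routing alerts through a shared classification so that one playbook covers many alerts; the alert, classification, and playbook names are illustrative and not tied to any particular SOAR product:

```python
# Illustrative mapping: many alerts share one classification, and each
# classification maps to a single playbook. Names are examples only.
ALERT_CLASSIFICATION = {
    "Impossible travel activity": "Access Anomaly",
    "Unfamiliar sign-in properties": "Access Anomaly",
    "Sign-in from infrequent country": "Access Anomaly",
    "Malicious executable uploaded to SharePoint": "Malware Pre-Compromise",
}

CLASSIFICATION_PLAYBOOK = {
    "Access Anomaly": "access_anomaly_playbook",
    "Malware Pre-Compromise": "malware_pre_compromise_playbook",
}

def route_alert(alert_name: str) -> str:
    """Return the playbook for an alert, falling back to manual triage."""
    classification = ALERT_CLASSIFICATION.get(alert_name)
    return CLASSIFICATION_PLAYBOOK.get(classification, "manual_triage")
```

Adjusting the investigation steps for all access anomaly alerts then means changing one playbook, not one workflow per alert.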
Significant incident attributes
Whenever you are defining a new classification, it is required to define your significant incident attributes. Let us look at access anomaly as an example. This classification is used for security events such as impossible travel activity, unfamiliar sign-in properties, and sign-ins from an infrequent country. Whenever we want to investigate this incident type, we require at least two attributes to be present for all security events: IP address and user account. We obtained this information when communicating with our senior analysts. Since they have been working on the front lines for a while, they are the best source of information for us.
Now we know that each of these alerts should be enriched to include the IP address and user account because according to our analysts this is the information necessary to investigate the user’s typical login patterns. Now we need to explore how to use these attributes in conjunction with a common data source for furthering our investigation.
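A minimal sketch of checking for these significant attributes before enrichment could look like the following; the field names are assumptions for illustration:

```python
# Required significant attributes per classification. Field names are
# illustrative assumptions, not a fixed schema.
REQUIRED_ATTRIBUTES = {
    "Access Anomaly": {"ip_address", "user_account"},
}

def missing_attributes(classification: str, alert: dict) -> set:
    """Return the significant attributes the alert is missing or has empty."""
    required = REQUIRED_ATTRIBUTES.get(classification, set())
    return {attr for attr in required if not alert.get(attr)}
```

An alert missing attributes can then be sent back for enrichment before the investigation workflow starts.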
Data source uniformity
Ensuring data source uniformity for your classifications is not always doable. Should you intend to do so, you need to pay great attention to consistent and correct field mapping and define the fields across your data sources that will be used as your significant incident attributes. It becomes far easier when the investigative data sources are the same. For example, when looking at the alerts we associated with the access anomalies classification, we know that our common data source is the Azure Active Directory audit logs.
When querying this data set using your significant incident attributes, it is far easier to gather the information required for investigation. For example, if you want to know which device was used for the sign-in that caused the security event, or to gather a list of the common countries that the user typically signs in from, you will be able to obtain that information. This is only doable by defining the significant incident attributes and the data source required for investigation in your playbook. Once we have defined our significant incident attributes and common data source, we should begin creating a list of use-cases or alerts that fit the context of the incident type based on name, attributes, and data source.
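As a sketch of this kind of lookup, the snippet below summarizes the countries a user typically signs in from, given sign-in records already pulled from the common data source; the record fields and function name are assumptions for illustration:

```python
from collections import Counter

def common_countries(records: list, user_account: str, top: int = 3) -> list:
    """Return the user's most common sign-in countries from audit records.

    records: dicts with 'user_account' and 'country' keys (illustrative shape,
    e.g. rows exported from Azure AD sign-in logs).
    """
    countries = Counter(
        r["country"] for r in records if r["user_account"] == user_account
    )
    return [country for country, _ in countries.most_common(top)]
```

A playbook can then compare the country of the triggering sign-in against this baseline.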
Use-case list for aggregation
Finding use-cases and alerts should become a bit easier once we have defined our significant incident attributes and data source. We know that our incident type needs an IP address and a user account as attributes, and that its common data source for investigation is the Azure Active Directory audit logs.
You will find a lot of matching alerts as default alerts in a cloud access security broker solution, but you may find some more matches in use-cases which you created yourself. Some examples are the following:
· Impossible travel activity
· Unfamiliar sign-in properties
· Infrequent country
· Azure AD risky sign-in
When you have identified this list you can now automate significant parts of your incident response activities. Once you have an idea of which use-cases and alerts you’d want to aggregate, the next step is creating a name for your incident type that tells a story.
Incident name that tells a story
Finding a matching name for your incident classification is extremely important. It will help to clearly visualize at first glance what general type of incident an analyst is confronted with. This step in the process is a good time to review the use-case list you just created and decide whether you may want to split them up further to allow for a more fine-grained classification. The key is to define classifications that are as broad as possible but as specific as needed. You may want to review your decision frequently, at least at the beginning, to identify whether some use-cases may need to be shifted into another classification or whether another classification should be added. Do not be afraid to play around with this initially, and you will find that you can achieve a great classification portfolio very quickly.
For example, assume that you identified a list of use-cases that trigger based on malware-related alerts sent by antimalware tools. You could initially define a classification and name it malware, which would not be very specific but would make it clear what all alerts in this classification are generally about. With some more time and investigation, especially while designing the response playbook associated with this classification, you may find this classification too broad. Eventually you decide to split it up into two classifications. When a malicious executable is identified that was uploaded to SharePoint or downloaded from an exploit kit, you use malware pre-compromise. For any security event that involves successful installation or execution of malicious software on a device, you use malware post-compromise.
This will help you to better determine the level of urgency to take on an incident. For example, ransomware being installed on a device has different handling procedures and escalation policies than a drive-by compromise attempt or a suspicious file being detected. Making sure your name tells a story gives you greater flexibility when defining your escalation policies.
Before we get into escalation policies let us go through some example incident types.
Examples of incident classifications
Malware post-compromise
Malware post-compromise can be defined as any malicious installation on an endpoint. This is the next stage after malware pre-compromise, which covers events where the payload has not yet been triggered.
· Endpoint detection and response
Malware pre-compromise
Malware pre-compromise is any attempt at downloading or uploading a malicious executable with the intent of installation where no installation was identified. It is often seen as the delivery portion of the kill-chain.
· Endpoint detection and response
Service exploitation
Service exploitation is any attack, whether inbound, outbound, or internal, that attempts to remotely exploit a service.
· Endpoint detection and response
· SQL injection
· Remote file inclusion
Malicious endpoint command execution
Malicious endpoint command execution is primarily detected through EDR and Sysmon. In this case it would be a security event that was alerted upon via a command line parameter.
· Endpoint detection and response
· Running minidump
· Enumerating users
Phishing attempt
A phishing attempt is when a threat actor sends a malicious attachment or link in an attempt to compromise either the user's credentials or their workstation.
· User-Reported (Phishing emails which are submitted by a user)
· Office 365
· Email Gateway Solution
· Suspicious attachment
· Suspicious link
Vulnerability scan
A vulnerability scan is two or more distinct service exploitation attempts against one or more targets.
· Intrusion Detection System/Intrusion Prevention System
· Network Detection and Response
Denial of Service or Distributed Denial of Service
A DoS or DDoS attack can be detected when there is a large volume increase of a particular type of traffic. Some exploits can also cause a service disruption, but this should be classified as Service Exploitation.
· UDP Flood
· Ping of Death
· SYN Flood
Brute force login success
Brute Force Login Success is whenever there is a brute force (many login attempts) followed up by a login success.
· Windows Security Event Logs
· Azure Audit Logs
· Most Other Audit Logs
· Suspected Brute Force Login Success
· Credential Stuffing
· Dictionary Attack
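The definition above, many failed login attempts followed by a success, can be sketched as a simple detection over ordered login events; the threshold and the event shape are assumptions for illustration:

```python
def brute_force_success(events: list, threshold: int = 5) -> bool:
    """Detect a brute force login success.

    events: chronologically ordered dicts with an 'outcome' key of
    'failure' or 'success' (illustrative shape). Returns True when a
    success follows at least `threshold` consecutive failures.
    """
    failures = 0
    for event in events:
        if event["outcome"] == "failure":
            failures += 1
        elif event["outcome"] == "success":
            if failures >= threshold:
                return True
            failures = 0  # a normal login resets the failure streak
    return False
```

In practice the same logic would also be keyed per user account and source IP, which are the significant attributes for this kind of event.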
Policy violation
A policy violation is a custom use-case which alerts on the misuse of company resources.
· Use of pornography
Weak configuration
A weak configuration is the detection of a potentially hazardous configuration, such as a user logging in without MFA.
· User logs in without MFA
Abnormal cloud behavior
This classification detects whenever a user is conducting abnormal behavior on cloud resources.
Severity: Medium
Data sources:
· Download 1000 files simultaneously
· Any other abuse of cloud resources
Abnormal device behavior
This classification detects whenever a device is conducting abnormal behavior, for example many RDP connections from a single device, a rapid increase of DNS queries from a single endpoint, or other deviations from the norm.
· Multiple RDP connections
· Sudden increase in DNS requests
Rare process occurrence
This classification covers rare process occurrences or operations, for example a strange cross-process or parent process interacting with lsass.exe. Additionally, it can be used to detect processes rarely or never seen across the organization.
· Endpoint Detection and Response
· Interacting with many files rapidly
· Never seen process
Access anomaly
This incident classification is based on a user signing in with abnormal attributes. This can be an impossible travel activity sourced from a Cloud Access Security Broker solution or from an identity detection solution.
· Azure Active Directory
· Infrequent countries
· Unfamiliar sign-in properties
Potentially unwanted application
An application installed on an endpoint that is classified as adware or riskware.
· Endpoint Detection and Response
Port scan
A port scan is a single device sending traffic to one or more destinations while rapidly changing destination ports.
· External port scan
· Port scanning tool detected
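The classification examples above can be captured as plain records so that playbooks and the escalation matrix share a single definition. Below is a minimal Python sketch; the values mirror the text where stated, and the attributes shown for Brute Force Login Success are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class IncidentClassification:
    """One incident classification: its name, significant attributes,
    common data sources, and example use-cases for aggregation."""
    name: str
    significant_attributes: set
    data_sources: list
    example_use_cases: list = field(default_factory=list)

CLASSIFICATIONS = [
    IncidentClassification(
        name="Access Anomaly",
        significant_attributes={"ip_address", "user_account"},
        data_sources=["Azure Active Directory audit logs"],
        example_use_cases=[
            "Impossible travel activity",
            "Unfamiliar sign-in properties",
            "Infrequent country",
        ],
    ),
    IncidentClassification(
        name="Brute Force Login Success",
        # Attributes below are an assumption, not stated in the text.
        significant_attributes={"ip_address", "user_account"},
        data_sources=["Windows Security Event Logs", "Azure Audit Logs"],
        example_use_cases=["Credential Stuffing", "Dictionary Attack"],
    ),
]
```

Keeping these definitions in one place makes it easy to review the portfolio and reshuffle use-cases between classifications as you refine them.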
Now that we have covered a variety of different incident classifications, we can continue to discuss how to put them into action with an escalation matrix. By combining your incident classifications with relevant severity levels and multipliers you can prioritize the most urgent security events.
An escalation matrix will help your organization to determine how to handle severity levels based on incident type, action taken, directionality, and affected asset. A proper matrix will help you prioritize more urgent threats and eliminate a great deal of work.
The three most important components of your matrix will be directionality, action taken, and severity.
You can flexibly alter your severity based on directionality and action taken.
Directionality is essentially the direction of the attack, which we define with the following characteristics: inbound, internal, and outbound. In other words, we want to see where the attack is originating from and where it is destined to. Incorporating directionality can easily help your organization prioritize events. For example, an SQL injection attack originating from an external source is typically viewed with lesser urgency than an SQLi originating from an internal device.
The second component we should take into consideration is the action taken. Action taken can have a few different values, such as blocked or unblocked. A good example is when you see a Malware Post-Compromise incident occur which is then subsequently quarantined by your antivirus tool.
You can set a different severity level based on the fact that the malware was quarantined. Additionally, you can factor in both directionality and action taken. For example, a service exploitation event that is inbound and blocked may be set to low, whereas one that is inbound and unblocked can be considered medium.
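A minimal sketch of such a lookup follows; only the two service exploitation rows come from the text, the internal row and the medium default are illustrative assumptions:

```python
# (classification, directionality, action taken) -> severity.
# Only the first two rows come from the text; the rest are assumptions.
SEVERITY_MATRIX = {
    ("Service Exploitation", "inbound", "blocked"): "low",
    ("Service Exploitation", "inbound", "unblocked"): "medium",
    ("Service Exploitation", "internal", "unblocked"): "high",
}

def assess_severity(classification: str, directionality: str, action_taken: str) -> str:
    """Look up severity; default to medium so unknown combinations still get triaged."""
    return SEVERITY_MATRIX.get((classification, directionality, action_taken), "medium")
```

Because the matrix is data rather than code, tuning a severity is a one-line change that needs no playbook edits.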
Before we start building our escalation matrix, we should take a look at how severity levels define which actions to take.
Handling procedures help us define which actions to take. For example, most organizations do not directly handle low severity alerts, instead they create a report which they review every 7 days. For medium severity and higher they prefer to take a swifter action. For example, a medium severity can be handled by an email escalation only which requires action be taken within 12-24 hours of notification.
For high severity security events it’s recommended to receive a phone call, email, and immediate action be taken. Lastly, for critical severity alerts, which you achieve through severity modifiers, it’s recommended to immediately notify your Incident response team and prepare for an engagement.
As these procedures are based on the severity of an incident, it is recommended to define all severity levels to be used and associate each with a numerical value, e.g. as follows:
· Low: 1
· Medium: 2
· High: 3
· Critical: 4
Next, we will build a simplified representation of the aforementioned handling procedures:
· Critical: Phone Escalation, Email, Incident Response Engagement
· High: Phone Escalation, Email
· Medium: Email Escalation
· Low: Report Only (Daily or Weekly)
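These handling procedures can be kept next to the escalation matrix as a simple mapping; a minimal Python sketch, with the action names taken from the procedures described above:

```python
# Severity level -> handling actions, mirroring the escalation procedures.
HANDLING_PROCEDURES = {
    "critical": ["phone escalation", "email", "incident response engagement"],
    "high": ["phone escalation", "email"],
    "medium": ["email escalation"],
    "low": ["report only (daily or weekly)"],
}

def actions_for(severity: str) -> list:
    """Return the handling actions for a severity; unknown levels get manual review."""
    return HANDLING_PROCEDURES.get(severity, ["manual review"])
```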
After defining the handling procedures, the next step is to obtain a list of critical assets and VIPs in the organization. Whenever those assets are affected, we increase our severity level by 1. For example, we define Low as 1, but when a critical asset is affected we increase it to 2, which maps to Medium severity. It is recommended to be a bit on the conservative side when it comes to declaring critical severity. This should only come via a high-fidelity signature or a severity modifier from High to Critical.
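The +1 modifier for critical assets can be sketched as follows, capping at critical so the modifier is the only routine path from high to critical:

```python
# Ordered severity levels; index position doubles as the numerical value - 1.
LEVELS = ["low", "medium", "high", "critical"]

def apply_asset_modifier(severity: str, critical_asset: bool) -> str:
    """Raise severity one level when a critical asset or VIP is affected."""
    index = LEVELS.index(severity)
    if critical_asset:
        index = min(index + 1, len(LEVELS) - 1)  # cap at critical
    return LEVELS[index]
```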
As mentioned before, we need to be able to shift our scoring mechanisms to meet the appropriate level of urgency for our security events. However, taking this simplistic approach makes it easier for your SOC to understand how to handle incoming security events.
Creating a matrix for escalation is relatively simple once you have created your classifications and defined your handling procedures. It’s much easier now to develop playbooks and speak with your team about the level of urgency for incidents. For example, do we want a phone call at night for a Potentially Unwanted Application on a workstation? Probably not. However, you may reconsider if it's a critical asset. Additionally, the entire incident changes if PUA is Malware Post-Compromise or if you consider it Malware Pre-Compromise. You’d want to adjust your levels of urgency accordingly so that your SOC can react appropriately and get the right people involved as expeditiously as possible.
Now that we understand the properties which define our escalation matrix, we can start building one. Designing a simplified matrix and describing each one of your classifications gives your analysts the ability to easily cross-reference it and take the right actions. In addition, they do not waste their time on low severity events directly; instead, they can review the low severity security events on a scheduled basis.
Next, we will take a look at an example matrix for defining our actions taken per the classifications we have built:
Once you have created your matrix, you want to move on to building your Incident Response Life-Cycle and workflow. This will get your incident classifications, your escalation matrix, and your SOC analysts working in the right direction. From there we can start analyzing key metrics on SOC performance so we can break down our weak points and figure out how to strengthen them.
In this post you learned how to build Incident Classifications and Escalation Matrices. Now that you have learned this vital information, you can begin designing the overall workflow of your SOC, which we call the target operating model. Once you have defined this, we can start working towards key metrics and technical implementation. This post contains important material for you to design your incident response life-cycle.