Interpret The Results of The MITRE Engenuity ATT&CK for ICS

Now that the 2021 MITRE Engenuity ATT&CK® Evaluations for Industrial Control Systems (ICS) results have been released, many in the industrial community are asking how to interpret them. MITRE Engenuity is clear that it does not declare a “winner” and does not assign overall scores, rankings, or ratings to the vendors or their cybersecurity technology. Instead, it presents the evaluation results transparently in four separate but related categories of visibility and detection, so that other organizations can provide their own analysis and interpretation. This is preferable to heavily massaged statistics derived from the dataset in an effort to present vendor products in a favorable light. We’ve seen some claims about the ATT&CK Evals that are dubious at best. Rather than focus on them, here’s our perspective on the evaluation results as presented by MITRE and how to interpret them.

The Evaluation Categories

Each of the MITRE ATT&CK matrices is composed of Tactics and Techniques.

Tactics represent the “why” of an ATT&CK technique or sub-technique. A tactic is the adversary’s tactical goal: the reason for performing an action. For example, an adversary may want to achieve credential access.
Techniques represent “how” an adversary achieves a tactical goal by performing an action. For example, an adversary may dump credentials to achieve credential access.

The MITRE Evaluation’s adversary actions were broken into 25 main steps and 100 sub-steps.

Steps in the MITRE Evaluation represent an adversary’s objective. Steps could be loosely associated with Tactics.
Sub-steps represent the specific actions taken to complete an objective. Sub-steps could be loosely associated with Techniques.

The MITRE Evaluation results that were published this week provide a rich dataset in four different categories: Detection Count, Analytic Coverage, Telemetry Coverage, and Visibility. Here’s what each one means:

Detection Count: This is the total number of detections from the evaluation. It measures the depth of detection, meaning the use of multiple methods to detect the same type of threat behavior. Depth adds resiliency to threat behavior-based detections: an adversary can change one or more aspects of their technique, and a detection will still fire. Depth of detection also provides higher-resolution information about the adversary threat behavior, which can greatly support the process of weeding out false positives. An example of detection in depth from the MITRE evaluation is the use of secure shell (SSH) as a command and control (C2) channel. The adversary modified the Windows host log configuration to disable all messages related to SSH, which blinded our host-based detections for the SSH C2 actions. However, we still had detections for SSH over port 445 (where we would normally expect to see Windows Server Message Block (SMB) traffic) and other SSH telemetry, so we could track the actions of the adversary even though they successfully evaded our host detection methods.
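To make the SSH-over-port-445 idea concrete, here is a minimal sketch of a protocol/port mismatch check. It is an illustration only, not the Dragos Platform implementation: the function names are hypothetical, and it assumes the first payload bytes of a connection are available. It relies on two well-known facts: an SSH peer opens with an identification string that starts with `SSH-`, and SMB over direct TCP carries a 4-byte transport header followed by `0xFF SMB` or `0xFE SMB`.

```python
SMB_PORT = 445  # port where SMB traffic is normally expected


def looks_like_ssh(payload: bytes) -> bool:
    """SSH peers start by sending an identification string such as 'SSH-2.0-...'.

    Simplification: a server may legally send other lines before this string;
    this sketch only inspects the very first bytes.
    """
    return payload.startswith(b"SSH-")


def looks_like_smb(payload: bytes) -> bool:
    """SMB over direct TCP: 4-byte transport header, then 0xFF'SMB' or 0xFE'SMB'."""
    return len(payload) >= 8 and payload[4:8] in (b"\xffSMB", b"\xfeSMB")


def ssh_on_smb_port(dst_port: int, first_payload: bytes) -> bool:
    """Flag connections to the SMB port whose first bytes look like SSH."""
    return (
        dst_port == SMB_PORT
        and looks_like_ssh(first_payload)
        and not looks_like_smb(first_payload)
    )
```

A check like this does not depend on host logging, which is why it can keep working after an adversary disables SSH-related log messages on the host.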

The downside of depth of detection is that if not properly managed it may contribute to operator detection fatigue by providing too many notifications. Within the Dragos Platform, we address this by linking notifications that are related to each other in the Notification Viewer. Also, we have aggregated overlapping detections into Composite Analytics. Composite Analytics are multi-step threat detections that more confidently relate to adversary actions. They often will take data from multiple sources, such as Windows Events, Network Traffic, Asset Information, or Vulnerability Information to create ICS context-sensitive Notifications. Our goal is to aggregate data from multiple sources into high-confidence notifications which provide rich evidence proving the existence of ICS threat behavior. The process of detection aggregation is never-ending, which is why we provide new detections with monthly Knowledge Pack software updates to Dragos Platform customers.
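The linking idea can be sketched in a few lines. The example below is a hypothetical illustration, not how the Dragos Platform is built: the `Detection` and `CompositeNotification` classes, the asset names, and the five-minute window are all assumptions. It groups detections that hit the same asset close together in time, so an operator sees one notification backed by evidence from several data sources rather than several separate alerts.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Detection:
    asset: str          # asset the detection relates to (hypothetical naming)
    source: str         # e.g. "windows_event" or "network"
    name: str
    timestamp: datetime


@dataclass
class CompositeNotification:
    asset: str
    detections: list = field(default_factory=list)

    @property
    def sources(self) -> set:
        """Distinct data sources that contributed evidence."""
        return {d.source for d in self.detections}


def link_detections(detections, window=timedelta(minutes=5)):
    """Group detections on the same asset when each follows the previous
    one within `window`; otherwise start a new composite notification."""
    composites = []
    open_by_asset = {}
    for det in sorted(detections, key=lambda d: d.timestamp):
        current = open_by_asset.get(det.asset)
        if current and det.timestamp - current.detections[-1].timestamp <= window:
            current.detections.append(det)
        else:
            current = CompositeNotification(asset=det.asset, detections=[det])
            open_by_asset[det.asset] = current
            composites.append(current)
    return composites
```

A real correlation engine would likely use richer keys than asset and time (for example user, process, or session), but the grouping step is the part that reduces notification volume.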

Analytic Coverage: This is the proportion of sub-steps that contained a detection that provides additional context (e.g., General, Tactic, Technique). It measures the ability of the product to convert telemetry into actionable threat detections. Analytic coverage provides the greatest value to a network defender as Analytics provide threat context and are easily actionable. Whereas in the case of telemetry you have to try and find the threat behavior needle in the haystack manually, Analytics will automatically find the needle for you.

Telemetry Coverage: This is the proportion of sub-steps that produced a detection with minimal processing. Telemetry is the foundational data that detections process their logic against to determine if they should trigger an alert. As an ICS network defender, it is often valuable to be able to look at the telemetry that triggered a particular detection or telemetry prior to or after an event. Not all vendors allow you to view the underlying telemetry that triggers a detection. The Dragos Platform allows you to search and sort telemetry natively. Telemetry coverage means that the event was collected. Telemetry could even be in the form of a log text file that could be grepped (i.e. searched) to show the data is present. However, this is not a practical approach for a network defender during an incident where time is of the essence. Telemetry is raw data without any threat context.

Visibility: This is the proportion of sub-steps with either an analytic or a telemetry detection. It represents the vendor’s ability to see each sub-step taken by the adversary at some level. To better understand the portion of the visibility that is actionable by a network defender, we must look at the ratio of Analytic Coverage to Telemetry Coverage. Telemetry Coverage just means that the data was collected somewhere, somehow. That data is not necessarily actionable, nor is it easily identifiable as threat behavior by an analyst. Analytic Coverage takes the raw telemetry and creates an actionable notification that includes details of what occurred and ideally provides guidance, such as a playbook, with the steps that the operator should take next to validate and triage. A higher ratio of Analytic Coverage to raw Telemetry Coverage provides more value to a network defender in the form of threat behavior context.
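The four categories can be computed from per-sub-step results, as in the sketch below. It is written from the definitions above rather than from MITRE’s own tooling, so treat it as an assumption-laden illustration: the sub-step IDs and the input layout are hypothetical, and it assumes Detection Count includes telemetry detections.

```python
ANALYTIC_TYPES = {"General", "Tactic", "Technique"}
TELEMETRY_TYPE = "Telemetry"


def summarize(results: dict) -> dict:
    """Summarize {sub_step_id: [detection types]} into the four categories.

    Assumption: every recorded detection, including telemetry, counts
    toward the Detection Count.
    """
    total = len(results)
    if total == 0:
        raise ValueError("no sub-steps to summarize")

    detection_count = sum(len(dets) for dets in results.values())
    analytic = sum(1 for dets in results.values() if ANALYTIC_TYPES & set(dets))
    telemetry = sum(1 for dets in results.values() if TELEMETRY_TYPE in dets)
    visible = sum(
        1 for dets in results.values()
        if (ANALYTIC_TYPES | {TELEMETRY_TYPE}) & set(dets)
    )
    return {
        "detection_count": detection_count,
        "analytic_coverage": analytic / total,
        "telemetry_coverage": telemetry / total,
        "visibility": visible / total,
    }


# Hypothetical four-sub-step example (IDs are made up for illustration).
example = {
    "1.A.1": ["Telemetry", "Technique", "Technique"],  # depth: several detections
    "1.A.2": ["Telemetry"],                            # visible, but no context
    "1.A.3": [],                                       # missed entirely
    "1.A.4": ["General"],                              # analytic without telemetry
}
summary = summarize(example)
```

In this example the Detection Count (5) exceeds the number of sub-steps (4), which is how a count can be larger than the number of sub-steps evaluated.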

Figure 1: Dragos Platform results in MITRE ATT&CK Evals

Interpreting the Results

In Figure 1 above, you’ll see how the Dragos Platform performed in the MITRE ATT&CK Evals. We’re very proud of these results and believe they demonstrate the Dragos Platform’s high level of effectiveness in finding threats and efficiency in responding to them. The Platform results were the highest overall for Detection Count (156 detections across the 100 sub-steps) and Analytic Coverage (63 of 100 sub-steps), and second highest in Telemetry and Visibility coverage at 93 of 100 versus the highest result of 96 of 100. The results for Detection and Analytics are particularly gratifying and are evidence of the Platform’s effectiveness in finding threats. We see this as validation of our intelligence-driven strategy. The results for Telemetry and Visibility are very good (93 vs 96), and are especially important when combined with the Detection and Analytic coverage. Telemetry and Visibility without the context provided by Detections and Analytics leave a lot of work and analysis up to the network defender. Visibility combined with the context from Analytics and Detections is powerful and enables analysts to make faster, better decisions.

Now that we understand the high-level results that MITRE provided, let’s take a look at how specifically these relate to the performance of the Dragos Platform.

Detection Depth

The Detection Count of 156 detections across 100 sub-steps demonstrates the Dragos Platform’s detection depth. Detection depth means there are multiple methods of looking for the same types of threat behaviors. The Dragos intelligence-driven detection strategy equips our detection engineers with a rich dataset of adversary threat behavior, which allows them to identify multiple methods of detecting the same threat behavior.

For example, based on our XENOTIME threat intelligence, we know that compiled Python executables masquerading as legitimate Windows binaries were used during the TRISIS (TRITON) incident in 2017. To identify this type of threat behavior using the Dragos Platform, our detection engineers created the following detections:

Using host event log data to detect the use of compiled Python executables.
Using host event log data to detect when ICS protocol network connections are being established from executables outside of the expected vendor program folder path.
Using network traffic to detect the use of specific Python ICS protocol libraries (network signatures).
Using network traffic to detect when compiled Python executables are transferred over the network (in an unencrypted form).

During the evaluation, the adversary was able to defeat two of the four types of detections we had for Python executables. Because they transferred files over encrypted channels (SSH), we were unable to see the compiled Python executables being moved around the network. By executing the compiled Python in a temp folder that also contained a subfolder called “Rockwell”, they were able to bypass our detection for ICS protocol network connections being established from executables outside of the expected vendor program folder path. Fortunately, we still had the other two compiled Python executable detections, and both of those fired when the adversary used these compiled Python executables to manipulate the PLCs.
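The “Rockwell” subfolder bypass shows why path-based checks need care. The sketch below contrasts a loose substring match with a stricter check that requires the executable to live under a known install root. It is a hypothetical illustration of the general pitfall, not the actual Dragos detection logic; the install root and file paths are assumptions chosen for the example.

```python
from pathlib import PureWindowsPath

# Hypothetical install root; a real deployment would list its own vendor paths.
EXPECTED_VENDOR_DIRS = [
    PureWindowsPath(r"C:\Program Files (x86)\Rockwell Software"),
]


def naive_is_vendor_path(exe_path: str) -> bool:
    """Loose check: any path containing the vendor name passes."""
    return "rockwell" in exe_path.lower()


def strict_is_vendor_path(exe_path: str) -> bool:
    """Stricter check: the executable must sit under an expected install root.

    PureWindowsPath compares case-insensitively. It does not resolve '..'
    segments, so production code would normalize paths first.
    """
    path = PureWindowsPath(exe_path)
    return any(root in path.parents for root in EXPECTED_VENDOR_DIRS)


staged = r"C:\Users\operator\AppData\Local\Temp\Rockwell\update.exe"
legit = r"C:\Program Files (x86)\Rockwell Software\RSLinx\RSLINX.EXE"
```

Here `naive_is_vendor_path(staged)` returns `True`, so the staged executable would look legitimate, while `strict_is_vendor_path(staged)` returns `False` and `strict_is_vendor_path(legit)` returns `True`.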

When a detection-in-depth strategy is used, a single adversary threat behavior can trigger multiple detections to fire simultaneously. Triggering multiple detections can contribute to alert fatigue. The Dragos detection engineers mitigate this by combining detections into Composite Analytics and by aggregating, or linking, related detections.

Detection Breadth

The Analytic Coverage of 63 out of 100 was the highest result of all participants and demonstrates the breadth of detections within the Dragos Platform. The Analytic Coverage tells us that the Dragos Platform provided an actionable detection for 63 of the 100 sub-steps taken by the Adversary. The ultimate goal of any participant in a MITRE evaluation would be to trigger a detection for each of the 100 sub-steps and we are working to create additional detections for the sub-steps that were missed. A great example of detection breadth in the Dragos Platform during the MITRE evaluation is our use of Windows Event log data.

During the evaluation, the adversary deployed an encrypted SSH-based C2 channel. The Dragos Platform has many network-based detections for C2 channels such as Cobalt Strike, Metasploit, and Metasploit-based exploits (although version 6 of Metasploit now also offers end-to-end encryption). Once end-to-end encryption is applied to the C2 channel, very little can be done from network traffic alone to understand what specific actions are being taken. Fortunately, the Dragos Platform can also collect and monitor Windows Events, which allowed us to closely monitor the actions of the adversary. The adversary even disabled the SSH logging events in the Windows configuration, but we were still able to collect enough telemetry to trigger detections and call out threat behaviors at each adversary step.

Quality of Detections

Within the 63 sub-steps we detected, there are three categories of detection:

General Detections: A detection was triggered.
Tactic Detections: A detection was triggered and mapped to the specific ATT&CK for ICS tactic utilized by the adversary. Tactics are broad categories of threat behavior (example: Lateral Movement). There are only 12 Tactics in total within the ATT&CK for ICS matrix.
Technique Detections: A detection was triggered and was mapped to the specific ATT&CK for ICS technique utilized by the adversary. Techniques are specific threat behavior actions (example: Indicator Removal on Host). There are 79 Techniques in total within the ATT&CK for ICS matrix.

Of the three different categories of detection, Technique Detections represent the most granular type. Moreover, to map a detection to a specific technique, an understanding of the Adversary’s Threat Behavior, as well as the Asset context, is required. For example, in the Dragos Platform, to identify the “Loss of Safety” Technique or “Safety System Compromise” technique, we have to accurately identify the Safety System assets on the network and look for signs of compromise or process impacts related to them.
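The role of asset context can be illustrated with a small sketch. Everything here is hypothetical (the asset inventory, the `process_impact` behavior label, and the mapping rules); it only shows the general idea that the same observed behavior can be labeled more specifically once the affected asset is known to be a safety system.

```python
# Hypothetical asset inventory: asset name -> identified role.
ASSET_ROLES = {
    "sis-01": "safety_system",
    "plc-07": "controller",
}


def categorize(behavior: str, asset: str) -> dict:
    """Pick the most specific detection category the available context supports."""
    role = ASSET_ROLES.get(asset)  # None when the asset has not been identified

    if behavior == "process_impact" and role == "safety_system":
        # Known safety asset: specific enough for a technique-level label.
        return {"category": "Technique", "tactic": "Impact",
                "technique": "Loss of Safety"}
    if behavior == "process_impact" and role is not None:
        # Known ICS asset, but not enough context to name a technique.
        return {"category": "Tactic", "tactic": "Impact", "technique": None}
    # No asset context: something fired, but it cannot be mapped further.
    return {"category": "General", "tactic": None, "technique": None}
```

With this inventory, the same `process_impact` behavior is labeled at the Technique level on `sis-01`, at the Tactic level on `plc-07`, and only as a General detection on an asset that has not been identified.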

Looking Ahead

MITRE ATT&CK for ICS is a powerful taxonomy to better understand and prioritize multi-stage attacks. The evaluation itself is an enabler as the industry matures and gains an understanding of the discrete steps of an attack that can lead to a truly disastrous outcome. The MITRE and Engenuity team should be proud of their ability to demonstrate a real-world attack that emulated a dangerous activity group in an entirely different environment.

The ATT&CK Evals represent a complete dataset for an end-to-end attack on an ICS environment. One of the challenges we face in ICS cybersecurity is the lack of detection and collection capability within most ICS environments. We often struggle to piece together the complete attack chain in actual ICS incidents because the environments are not capable of collecting the required evidence. With the ATT&CK for ICS evaluation scenario, we have each adversary step and the associated threat detections within the Dragos Platform. While we’re convinced the Dragos Platform did exceptionally well, the real winners of such evaluations are the community members and customers, who see vendors step up to be tested and get independent, objective insights into how they performed. The vendors who participated can study the scenario provided by MITRE to greatly enhance the threat detection capabilities of their products.

We caution the ICS community to take claims made in the post-evaluation marketing blitz with a grain of salt, and we encourage the community to focus on the data provided directly by the MITRE Engenuity team. The official 2021 MITRE ATT&CK for ICS Evaluation results can be found on their website.