The False Choice of IT Vs. OT

The debate of “IT versus OT” appears often within the ICS security space, positing that a fundamental, natural tension exists between IT security goals and OT/ICS operational requirements, with such division extending to personnel and tools. Yet as a result of increasing IT-OT convergence, IT-centric problems and attack paths are rapidly becoming OT-oriented problems and challenges. Given this changing – but converging – environment, a necessary question arises: are OT-specific capabilities necessary to defend operational networks given the proliferation of IT-centric threats to ICS environments? This question essentially served as the point of debate at S4x18, with Dragos’ Ben Miller representing the OT-specific position. Yet while such debates are interesting and superficially provocative, they establish a false choice between mechanisms when the real challenge from IT-OT convergence is how best to blend IT-based methods with OT-focused concerns and awareness to produce actionable, relevant security suited to industrial needs.

Dragos has written extensively on the increasing trend in ICS-focused attacks to leverage “living off the land” and related IT-centric capabilities to enable ICS-focused intrusions. This trend was observed in high-profile events as well, such as in actions leading up to the CRASHOVERRIDE attack. Yet as shown in the case of CRASHOVERRIDE, the attack itself blended IT-based tools with ICS-specific knowledge in an effort to execute a complex, multi-staged attack focused on likely physical damage to electric transmission equipment. Essentially, CRASHOVERRIDE leveraged significant IT-centric techniques to enable a Stage 2, ICS-focused intrusion (as defined by the ICS Cyber Kill Chain) leading up to an ICS-specific manipulation to create an unsafe, unprotected situation. While it remains true that such attacks can potentially be identified (and stopped) in early, IT-centric stages corresponding to Stage 1 of the ICS Cyber Kill Chain, matters become more complex once intrusions have succeeded in gaining access to production environments.

A whole-of-killchain approach demands integrating ICS-specific visibility and understanding with broader, more general security frameworks to ensure detection across all phases of attacker operations. While it may be possible to detect and mitigate attacks at earlier stages of the intrusion lifecycle through IT-exclusive means, relying on those means alone, without ICS-specific detection and response capabilities, leaves organizations at the mercy of attacker operations, skillsets, and objectives when early detection fails. Furthermore, even if defenders can identify an attack scenario based solely on IT-centric information, they may miss important, operation-specific details that carry serious implications for operational protection and safety. Along these lines, Dragos has identified a clear trend in attackers gaining greater ICS domain knowledge and capability across multiple ICS intrusions, from Ukraine 2015 to CRASHOVERRIDE to TRISIS. Such operations have culminated in the use of industrial-specific capabilities and toolsets that are either irrelevant for or invisible to traditional IT-centric detection and response methodologies. Furthermore, the necessary responses to such attacks (manipulation of safety equipment or protection gear) require ICS-specific knowledge of actions and capabilities to ensure they are appropriate to the circumstances.

The above is illustrated through the emergence of ICS-specific malware. Once an attacker is in place to deploy such tools, they have already successfully evaded multiple layers of IT-centric detection and monitoring. Absent OT-specific knowledge, visibility, and security monitoring, such tools can migrate into a compromised environment undetected and launch with impunity. At this stage, asset owners are left with a compromised, untrusted environment in which they must then perform root cause analysis to diagnose what may have happened. As indicated in the TRISIS event, this can sometimes result in missed opportunities to identify attackers, leading to multiple disruptive events in the same facility by the same adversary. The inability to observe and correlate industrial-specific knowledge and observations with potential IT anomalies or suspicious events handicaps plant personnel in performing adequate defense and remediation. An IT-focused approach relying on tools tuned to enterprise Windows environments is not only inadequate for diagnosing or investigating such events; it also completely lacks awareness of their significance and impact within potential attack scenarios.

Further along these lines, ICS environments feature a greater diversity of systems and functions across and within Purdue Model layers than is found in IT environments. Within a typical enterprise Windows domain, a Windows host is not significantly different in attack surface from any other device, with (hopefully) some variation between internal and externally exposed devices and workstation-server differentiation. Within control system networks, devices feature significant differences in functionality (and attacker usefulness) that must be incorporated into defensive monitoring and response to adequately understand what might be occurring within a potentially compromised network, and how to recover to a known-good, known-safe state. For example, in CRASHOVERRIDE a critical aspect of the attack path was leveraging compromised servers functioning as data historians as central nodes to access control systems for propagating the attack – knowing the function of devices and their role in ICS operations would allow an observer to properly categorize the subsequent communication (reverse of the expected client-to-server direction) as worth investigating. Similarly, appreciating the significance of the safety workstation in the TRISIS event and subsequent activity tied to it can enable personnel to better differentiate activity around the device versus expectations from an HMI or engineering workstation. Such nuance and awareness do not exist in IT-based solutions, and their absence eliminates critical differentiation for ICS defensive operations. Simply saying that one could detect some IT-centric method with a traditional IT tool misses the point that how that method is used, where it arises, and what devices are impacted matter significantly to how personnel should disposition the resulting alert. Lacking ICS-specific knowledge, the fundamental nuance and significance behind an event (and where it occurred) is lost, leading to not only suboptimal, but substandard defense.
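
To make the role-awareness point concrete, the following is a minimal sketch, not a description of any particular product. It assumes a hypothetical asset inventory (`ASSET_ROLES`) and a hypothetical baseline of expected initiator-to-responder role pairs (`EXPECTED_FLOWS`); the addresses, role names, and baseline are illustrative only and would have to come from the site's own knowledge of its process network.

```python
from dataclasses import dataclass

# Hypothetical asset inventory: IP address -> role in ICS operations.
ASSET_ROLES = {
    "10.0.2.10": "historian",
    "10.0.3.20": "hmi",
    "10.0.3.30": "engineering_workstation",
    "10.0.5.50": "controller",
}

# Hypothetical baseline: (initiator role, responder role) pairs considered normal.
EXPECTED_FLOWS = {
    ("hmi", "historian"),
    ("engineering_workstation", "historian"),
    ("hmi", "controller"),
    ("engineering_workstation", "controller"),
}


@dataclass(frozen=True)
class Flow:
    src_ip: str  # host that initiated the connection
    dst_ip: str  # host that received the connection


def classify_flow(flow: Flow) -> str:
    """Label a flow using device roles rather than raw addresses."""
    src_role = ASSET_ROLES.get(flow.src_ip, "unknown")
    dst_role = ASSET_ROLES.get(flow.dst_ip, "unknown")
    if "unknown" in (src_role, dst_role):
        return "review: unknown asset"
    if (src_role, dst_role) in EXPECTED_FLOWS:
        return "expected"
    # Same pair of roles, but the server side initiated the session.
    if (dst_role, src_role) in EXPECTED_FLOWS:
        return "investigate: reversed direction"
    return "investigate: unexpected role pair"
```

In this sketch, a session initiated by the historian toward an HMI is labeled as reversed, while the same two hosts talking in the baseline direction are labeled as expected. The same packets would look unremarkable to a tool that knows nothing about device roles.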

Once within control system networks, attackers have learned that leveraging fundamental system functionality for lateral movement and process execution (e.g., “living off the land”) represents the most direct mechanism for facilitating an ICS-centric intrusion. Such techniques were used in intrusions ranging from CRASHOVERRIDE to TRISIS to ALLANITE intrusions in US (and other) electric utilities. In observing these items, the immediate reaction of IT-centric security specialists is to simply apply IT best practices to OT environments as a mechanism for detecting, blocking, and remediating such activity. On its face, this seems sensible, but in reality, this perspective is not only unworkable, but displays a concerning ignorance about the nature and flexibility of production networks.

While ICS environments increasingly incorporate equipment mirroring enterprise IT deployments – from near-ubiquitous use of Microsoft Windows on multiple products to many industrial network appliances basically being Cisco IOS devices with harsh-environment hardening – the manner in which these products are deployed, and the operational realities once they are in service, militate against IT-specific solutions for addressing problems in operational networks. For example, one of the major developments in IT security over the past five (or so) years is an increased emphasis on host-based monitoring and remediation, complete with multiple endpoint detection and response (EDR) products and solutions. While powerful and effective, such products demand certain actions on the part of protected hosts. Most significantly, protected devices must run and maintain an installed agent to perform monitoring and response, while also reporting back to a centralized monitor (either locally or in the cloud). While this is typically at worst an inconvenience for IT workstations and servers, in ICS applications such items are likely either in violation of vendor warranties for systems specifically configured and deployed to perform a dedicated function, or sufficiently resource-intensive on limited hardware as to cause operational problems.

Meanwhile, on the network side, IT environments increasingly adopt approaches shifting from intrusion detection systems (IDS) to intrusion prevention systems (IPS). While suitable for IT environments given quality of service and operational requirements, such an approach can court disaster within a control system environment. A traffic anomaly may not represent malicious activity so much as an atypical (but desirable) action on the part of a control system – such as a safety system – to prevent or mitigate a physical hazard. Introducing applications that can automatically disposition and filter traffic introduces operational risk and potential disruption that is entirely inappropriate to industrial environments. Overall, while IT-based techniques may address some security issues in industrial environments, in this case the cure may be as bad as (or worse than) the disease, or at least more direct and immediate in impact.
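
The distinction between detecting and automatically filtering can be sketched as a simple policy. This is an illustrative sketch only: the zone names, the anomaly score, and the threshold are assumptions made for the example and do not describe any specific IDS or IPS product.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALERT = "alert"  # notify an analyst; traffic is left untouched
    BLOCK = "block"  # drop traffic inline (IPS-style)


# Hypothetical zone names; a real deployment would map these to its own segments.
PASSIVE_ONLY_ZONES = {"ot"}


def decide_action(zone: str, anomaly_score: float, threshold: float = 0.8) -> Optional[Action]:
    """Return what to do with anomalous traffic, or None if it is below threshold."""
    if anomaly_score < threshold:
        return None
    if zone in PASSIVE_ONLY_ZONES:
        # An atypical flow may be a safety system doing its job, so never drop it
        # automatically; raise an alert for a person with process knowledge.
        return Action.ALERT
    return Action.BLOCK
```

The design choice here is that, in the control system zone, the only automated outcome is an alert, and the decision to interrupt traffic stays with personnel who understand the process.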

The above represents cases where IT-based techniques are observable and applicable to industrial environments, but IT-centric defenses are wholly inappropriate or unactionable. Industrial operations overall, and security in particular, are filled with instances of IT and engineering expertise clashing due to an unwillingness or inability of both sides to understand or appreciate each other. Current debate, given adversary tradecraft, emphasizes an IT-centric worldview attempting to force itself upon ICS operations when such items may be either irrelevant or outright harmful if applied. So while we can look at events like CRASHOVERRIDE or TRISIS as emphasizing a combination of fundamental penetration tester tradecraft using built-in system tools combined with common techniques such as credential capture by dumping LSASS memory (as built in to Mimikatz), actually observing or detecting such activity in industrial environments requires a rethink of observations and data sources to be effective. The fundamental nature of modern ICS operations as IT plus physics means that IT simply serves as a mechanism for communication and control – but fundamental value resides in the underlying physical process. Responding to or focusing on IT-centric items therefore amounts to emphasizing the symptoms of a disease (or intrusion) as opposed to understanding and resolving root causes and fundamental sources. Therefore, only an approach that combines IT-centric visibility with ICS-relevant analysis and awareness can adequately understand the nature and significance of cyber intrusions in control systems environments.

Often lost in discussion but arguably more important, work does not stop at the point of detection but extends into response and remediation. IT-centric responses – from automatically quarantining (or deleting) “suspicious” files to wiping and replacing infected hosts – are dangerous or unactionable in a production environment. Remediation, outside of emergency circumstances, may need to wait for regular maintenance or shutdown periods instead of following typical IT practice of near-immediate resolution. Even more significantly, event and artifact analysis must incorporate ICS-specific understanding to adequately analyze an attack. Understanding the link between direct electric transmission disruption and the subsequent loss of view and protective relay denial-of-service is vital to ensure that physical restoration does not inadvertently create an unsafe, dangerous state, as demonstrated in the CRASHOVERRIDE event. Similarly, just identifying that malware such as TRISIS exists is insufficient if operators cannot also evaluate and confirm that plant safety instrumented system (SIS) logic and fundamental integrity are in known good, safe states. Absent ICS-informed analysis and response actions, even identification of malicious IT-centric activity in control system environments will not enable proper, safety-conscious operational restoration.
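
One small piece of confirming a known good state can be sketched as a comparison against a trusted baseline. The example below assumes, purely for illustration, that a copy of the controller logic can be exported as bytes and that a SHA-256 digest of the approved version was recorded beforehand; the function name is hypothetical. A matching digest of an exported copy is only one input and does not by itself establish that a safety system is safe.

```python
import hashlib
import hmac


def logic_matches_baseline(exported_logic: bytes, baseline_sha256: str) -> bool:
    """Compare exported logic against a previously recorded SHA-256 hex digest."""
    digest = hashlib.sha256(exported_logic).hexdigest()
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(digest, baseline_sha256.lower())
```

A mismatch indicates that the logic differs from the approved version and warrants review by engineering staff before restoration proceeds.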

So, instead of looking at a debate of “IT vs. OT” in terms of tools and techniques, the actual, useful conversation mirrors the convergence of IT and OT technologies: how can we meaningfully deploy detections and defenses against IT-based techniques in OT environments while being mindful of the limitations and requirements of these networks? This question, while harder to answer, actually appreciates the problems and challenges of industrial network defense and requires a degree of creative thinking. Rather than simply importing IT-centric defenses wholesale into industrial environments, prospective ICS defenders must learn to identify what techniques and behaviors are relevant for ICS-targeting adversaries, how to appropriately incorporate (where possible) IT-centric responses into operational environments, and then determine how to leverage existing visibility and what flexibility exists to build relevant detections and defenses.

When viewed in the above manner, security problems no longer appear as questions of “why don’t the engineers simply do X” based on IT-centric experience, but instead represent the nuanced issues facing plant owners and operators as they balance security with production requirements. Towards this end, IT-based solutions can serve as useful guides and baselines, but must be combined with ICS-specific knowledge and awareness to produce anything resembling an actionable, relevant security plan. In this environment, defenders must look to sources of data and defensive opportunities where they already exist and build on these advantages. Examples include taking advantage of physical network division and segmentation to establish visibility and defenses between enclaves where they exist; leveraging the typical IT-OT network divide to enforce robust remote authentication mechanisms while creating hardened jump hosts for initial ICS network access; cordoning off external parties to dedicated enclaves or “vendor networks” for remote support and access; and correlating available IT-centric data with process-specific information on historians to gain insight into how IT events may impact plant operations. All of these incorporate IT security methodologies to some extent, but do so in a fashion that takes ICS-specific requirements and environment knowledge into account to produce useful, implementable defensive plans.
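
The last of these examples, correlating IT-centric data with historian information, can be sketched as a simple time-window join. This is a minimal sketch under stated assumptions: the tuple layouts, the tag names used with it, and the five-minute window are hypothetical, and real historian and log interfaces will differ.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

ItEvent = Tuple[datetime, str]                 # (timestamp, description)
ProcessChange = Tuple[datetime, str, float]    # (timestamp, historian tag, new value)


def correlate(
    it_events: List[ItEvent],
    process_changes: List[ProcessChange],
    window: timedelta = timedelta(minutes=5),
) -> List[Tuple[str, str, float]]:
    """Pair each IT event with process changes seen within `window` after it."""
    pairs = []
    for event_time, description in it_events:
        for change_time, tag, value in process_changes:
            # Keep only changes at or after the event and no later than the window.
            if timedelta(0) <= change_time - event_time <= window:
                pairs.append((description, tag, value))
    return pairs
```

Proximity in time does not establish cause, so each resulting pair is a lead for someone with process knowledge to evaluate, not a conclusion.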

Overall, the Dragos approach to ICS security reflects the above: to understand how ICS networks are evolving, and to blend IT-centric knowledge and defensive capabilities with an ICS-aware and ICS-relevant implementation to produce full-spectrum industrial network defense. In this sense, the point of contention shifts from “IT vs. OT” to “how best to combine IT and OT perspectives”. In this reimagination of the artificial debate, IT-centric ideas inform – but do not dominate or replace – engineering-specific expertise to produce relevant, actionable, sustainable defensive strategies. By adopting this approach, asset owners and operators can effectively address modern ICS threats within the capabilities and limitations afforded by the current state of industrial networks. By contrast, responding to aggressive marketing by importing IT security solutions wholesale into ICS environments not only fails to solve fundamental problems, but introduces increased risk and instability on top of existing security concerns.