5 Costly Mistakes in Cyber Incident Response Preparation
A cybersecurity incident is a terrible crisis for any organization. Even with the best preparation and retainers, incident response is rarely an inexpensive endeavor in terms of money, people, operational disruption, or time. Investigations and forensics require specific expertise, and typically involve concerted eradication and recovery efforts. However, careful advance planning can substantially decrease these costs.
As a Principal Incident Responder at Dragos, I have consulted on numerous cases where significant time and resources could have been saved. Steering clear of the five common “gotchas” below helps your organization avoid the traps that most often increase the time, personnel, downtime, and expense of managing a cybersecurity incident.
1. Lack of Environmental Understanding
The bottom line is that incident responders will always need a clear understanding of your impacted network topology, asset and security tool configuration, network addressing, and security policies in order to conduct a complete investigation. If you cannot provide current and complete architectural information about your environment and how it is secured and accessed, incident responders will have to gather that information themselves, at the cost of consulting rates and significant time. This information provides important clues about the intrusion and adversary activities, and about what evidence responders will need to gather. Without this documentation, incident responders may fail to analyze the correct devices or correctly scope the investigation.
Dragos Professional Services uses the Dragos Platform in response investigations to gain initial visibility into the environment and identify threats. Our investigations are much easier if that, or a similar technology, is already deployed.
2. Lack of Incident Response Plan
Having an Incident Response Plan (IRP) truly matters. The IRP should not be approached lightly; it should describe in detail what your organization will do in case of a cybersecurity incident in a way that is useful in a crisis. At a minimum, this document should include the thresholds at which a situation is declared an incident, who will be contacted and involved in the incident response effort, how they will be contacted, how evidence will be preserved and collected, who will perform forensics and analysis, procedures to restore service, and any legal or regulatory requirements. IT and operational technology (OT) cybersecurity needs and incident response procedures are quite different, so discrete sections or documents for both IT and OT environments should exist.
Organizations need to ensure that:
- They know their most important plant or manufacturing processes and can identify the systems supporting these. At Dragos, we call this a Crown Jewel Analysis (CJA).
- They know what data can be collected from these important systems to support incident analysis, where the data is stored, and for how long that data is stored. This is essentially building a Collection Management Framework (CMF) [see our Collection Management Framework (CMF) whitepaper].
- They prepare ahead of time to have the tools and procedures in place at industrial sites to perform forensic data collection from host systems and the most important network segments. In contrast to IT environments, forensic data collection must be performed locally because of limited network bandwidth, access restrictions, and regulatory and operational requirements that prevent deploying the kinds of endpoint security tools used in IT environments. Thus, preparing and exercising how to collect forensic data in ICS environments is one of the most important tasks for ICS incident responders.
Failing to document incident response procedures can lead to a major increase in time and resource costs, as well as potentially critical errors in evidence preservation and analysis. This is true even if incident response is contracted out to a third party. While an external consultant can advise you on the best courses of action and perform forensic analysis, they do not have access to your internal communications plans or risk matrices. It’s perfectly fine to delegate tasks to third-party consultants within the context of the IRP document, but at least a rudimentary document should exist and be drilled on a routine basis.
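As a rough illustration, the minimum IRP contents described in this section can be tracked as a simple checklist per environment. The sketch below is only one way to do that; the section keys and the `missing_irp_sections` helper are hypothetical names chosen for this example, not part of any Dragos tool or standard.

```python
# Minimum IRP contents, paraphrased from this section.
# The key names are illustrative assumptions, not a standard.
REQUIRED_IRP_SECTIONS = {
    "incident_declaration_thresholds",
    "contacts_and_roles",
    "contact_methods",
    "evidence_preservation_and_collection",
    "forensics_and_analysis_owner",
    "service_restoration_procedures",
    "legal_and_regulatory_requirements",
}


def missing_irp_sections(plans):
    """Map each environment (IT, OT) to the required sections its plan lacks."""
    return {
        env: sorted(REQUIRED_IRP_SECTIONS - set(plans.get(env, ())))
        for env in ("IT", "OT")
    }


# Hypothetical example: a complete IT plan and an OT plan that is still a stub.
plans = {
    "IT": list(REQUIRED_IRP_SECTIONS),
    "OT": ["contacts_and_roles", "contact_methods"],
}
gaps = missing_irp_sections(plans)
for env, missing in gaps.items():
    print(env, "is missing:", missing or "nothing")
```

Checking IT and OT separately mirrors the point that the two environments need discrete sections or documents.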
OT Cybersecurity Incident Response Planning
Incident response plans for OT differ from more traditional IR plans in some clear ways. OT response efforts are almost always distributed. The local facilities team supporting and running the facility is critical to the success of safe response and recovery. Typically, the cybersecurity team should serve an advisory function with the focus of “doing no harm.” Also, the OT IRP should be tightly woven in with the facility’s emergency operations plans (EOP). The EOPs are already well known and practiced by the local teams, and if there is an impact to safety the EOP will always override the OT IRP. Coordinating the execution of these two plans is critical to preserving data when possible while avoiding conflict or confusion.
3. Lack of Security Logging
There are several types of digital forensics evidence with which Incident Response specialists paint a picture of malicious activity on a network. These include disk and memory analysis of computers, network traffic analysis, and security log analysis. None of these sources of evidence can provide a holistic picture of an intrusion on its own. If no standard security logging has been generated or retained in a compromised environment, a large swathe of key historical evidence has been lost and can never be replaced.
As outlined above, we highly recommend that every IT and OT security team take the time to draft a Collection Management Framework (CMF) to identify what sources of logging and digital evidence are available in their environments, how long they are retained, and how they can be retrieved. The CMF drastically reduces investigation time as incident responders seek out useful evidence. It also aids in identifying monitoring gaps and improving routine security monitoring coverage.
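A CMF can live in a spreadsheet, but a minimal sketch in code shows the idea: each entry records what the source is, where it is stored, how long it is retained, and how it is retrieved. The `CmfEntry` fields, the `sources_covering` helper, and every value in the example entries are assumptions made for illustration, not a prescribed CMF format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CmfEntry:
    source: str          # what produces the evidence
    location: str        # where the data is stored
    retention_days: int  # how long the data is kept
    retrieval: str       # how responders obtain it


def sources_covering(entries, event_day, today):
    """Return sources whose retention window still reaches back to event_day."""
    age_days = (today - event_day).days
    return [e.source for e in entries if 0 <= age_days <= e.retention_days]


# Hypothetical entries; real retention periods depend on your environment.
cmf = [
    CmfEntry("Windows Event Logs", "central log server", 90, "SIEM query"),
    CmfEntry("Firewall logs", "perimeter firewall", 30, "export by network team"),
    CmfEntry("Controller sequence of events", "on device", 7, "local collection on site"),
]

# Suspected activity 45 days before the investigation starts.
available = sources_covering(cmf, date(2024, 5, 16), date(2024, 6, 30))
print(available)
```

In this made-up case only the 90-day source still holds evidence for the date of interest, which is exactly the kind of monitoring gap a CMF is meant to surface before an incident.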
There are many useful log sources available in modern computing environments, but some extremely useful logs for modern incident response that should be centrally stored and retained wherever available include:
- Windows Event Logs
- Active Directory Authentication
- PowerShell Logging
- EDR / Application Whitelisting (Where Available)
- Firewall Logging
- VPN Authentication
- Configuration Change Management
- DNS Query and Response Logs
- Web Proxy Logs
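One simple way to act on this list is to compare it against what is actually forwarded to central storage today. The sketch below assumes you can produce a list of centrally collected source names yourself; the `logging_gaps` helper and the sample inventory are hypothetical.

```python
# The log sources recommended above, in the same order.
RECOMMENDED_LOG_SOURCES = [
    "Windows Event Logs",
    "Active Directory Authentication",
    "PowerShell Logging",
    "EDR / Application Whitelisting",
    "Firewall Logging",
    "VPN Authentication",
    "Configuration Change Management",
    "DNS Query and Response Logs",
    "Web Proxy Logs",
]


def logging_gaps(centrally_collected):
    """Return recommended sources that are not yet centrally collected."""
    collected = {name.casefold() for name in centrally_collected}
    return [s for s in RECOMMENDED_LOG_SOURCES if s.casefold() not in collected]


# Hypothetical inventory of what one site forwards to central storage.
site_inventory = ["Windows Event Logs", "firewall logging", "VPN Authentication"]
site_gaps = logging_gaps(site_inventory)
print(site_gaps)
```

The comparison is case-insensitive so that small naming differences between teams do not show up as false gaps.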
An unfortunate reality is that OT environments often fall short of both diverse and central logging. Dragos recommends focusing on chokepoints and perimeter log collection, in addition to deploying sensors within the OT networks to gain east/west network traffic visibility. Keep in mind that many environments have third-party network connections or ad-hoc contractor access, so the definition of “perimeter” can be fluid and vary widely from one facility to the next.
Additionally, OT is more than just IT systems. For instance, unique logs exist within Distributed Control System (DCS) or Supervisory Control and Data Acquisition (SCADA) environments, ranging from artifacts and logs on controllers to device sequence of events (SOE) records and application-specific logs. In addition, having an OT-capable network sensor that identifies commands being issued via OT protocols (often proprietary) is extremely important when you need to determine why or how a particular event occurred.
4. No Incident Response Retainer
Deciding whether or not to purchase an Incident Response service retainer for either IT or OT is a risk and financial decision which every organization must make for itself. Of course, a retainer is a recurring expense that must be budgeted for whether it is used or not. There may be situations in which it is simply not feasible. However, the downsides to not establishing an Incident Response retainer are a substantially higher hourly rate for response services when required, and a lack of a clear and established Service Level Agreement with a firm of choice.
During an incident, an organization without a retainer may have to contact multiple security firms to find one with availability, as most credible firms are very busy and prioritize retainer customers. Onboarding and preparation will have to be conducted while the incident is ongoing. The choice to purchase or not purchase a retainer should be informed and carefully considered. Many firms, including Dragos, allow customers to leverage their unused retainer hours for other services, like threat hunts or penetration tests.
5. Poor-Quality Incident Response Consultants
Another downside to ad-hoc Incident Response contracting is that not all security firms are created equal. Over the last several years there has been a surge in ransomware attacks, which has led to backlogs at many top incident response firms. As a result, many new security companies have opened and advertised incident response services.
Independent of age or size, some of these companies have great talent and provide excellent security services. Others are not prepared to perform incident response properly (particularly in industrial environments). Unfortunately, a poor incident response team can cause almost as much damage as an adversary or a piece of malware. Dragos has responded to incidents where inexperienced incident response firms destroyed evidence, aggressively scanned sensitive industrial devices, and failed to provide industry-standard reporting.
Finding the right incident response firm is a tricky business, but it is important to carefully interview and vet any firm you hire to perform such a critical service (with potential legal implications). Ensure that any incident response team you employ to investigate your OT environment has substantial familiarity with industrial safety and equipment. Feel comfortable with the responders, the explanations for their actions, and the deliverables they contractually agree to provide.
The only thing worse than one incident response effort is having to hire another company to do it all over again.