Query Focused Datasets

By: Dan Gunter & Justin Cavinee

Introducing QFDs

Query focused datasets (QFDs) give analysts powerful tools for both proactive threat hunts and investigations. We will first explain what a QFD is, then show how analysts can use QFDs to find the information they need quickly. The Dragos Platform provides default QFDs alongside the threat behavior analytics in its content packs. These content-pack QFDs focus on amplifying the data available to the playbooks and threat behavior analytics with additional context.

What is a QFD?

Simply put, a query focused dataset is a pared-down dataset that combines disparate data so analysts can quickly prove or disprove a given hypothesis. While a QFD is a subset of a larger dataset, it might also contain enriched information that gives analysts an optimized view of the situation in question. QFDs normalize data and reduce the overall time analysts must spend triaging suspicious activity or threat hunting.

Consider an analyst who begins investigating suspicious login activity on network resources. In a network with low defensive maturity, the analyst might have to connect to many devices to collect logs, then search through those logs by hand or run additional tools over the data. With a QFD, the analyst instead views a curated dataset in which those processing steps have already taken place. The time saved lets the analyst examine the activity in question in more depth, and QFDs ultimately make incident response processes more scalable.

The power of QFDs extends beyond data aggregation. Traditional security information and event management (SIEM) products already aggregate data; QFDs go further by normalizing and correlating it. A SIEM might index information from Windows Remote Desktop Protocol (RDP) sessions and Virtual Network Computing (VNC) sessions independently, in separate tables of its database. A QFD receives the same data, enriches it with other network events, and displays it uniformly. That uniformity makes the dataset easier to search and ultimately delivers information to the analyst more quickly. For remote sessions such as RDP and VNC, an analyst might tie remote session logins to building access logs to investigate stolen user credentials: a user likely shouldn't have an active remote access session from an external source while also physically on the premises of a given site.
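To make the normalization and correlation idea concrete, here is a minimal Python sketch, not the Dragos Platform's data model. It assumes RDP and VNC logs arrive as dictionaries with different, hypothetical field names, maps both onto one hypothetical `RemoteSession` schema, and then flags external remote sessions that start while badge records place the same user on site. The internal address ranges in `INTERNAL_NETS`, the helper names, and the badge log format are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address, ip_network

# Assumption: the site's internal address space; anything else counts as external.
INTERNAL_NETS = [ip_network("10.0.0.0/8"), ip_network("172.16.0.0/12"),
                 ip_network("192.168.0.0/16")]

@dataclass(frozen=True)
class RemoteSession:
    """Hypothetical normalized record shared by every remote-access protocol."""
    protocol: str
    user: str
    source_ip: str
    start: datetime

def normalize_rdp(rec: dict) -> RemoteSession:
    # Hypothetical RDP log fields: user, client_ip, start (ISO 8601).
    return RemoteSession("RDP", rec["user"].lower(), rec["client_ip"],
                         datetime.fromisoformat(rec["start"]))

def normalize_vnc(rec: dict) -> RemoteSession:
    # Hypothetical VNC log fields: account, src, ts (ISO 8601).
    return RemoteSession("VNC", rec["account"].lower(), rec["src"],
                         datetime.fromisoformat(rec["ts"]))

def is_external(ip: str) -> bool:
    addr = ip_address(ip)
    return not any(addr in net for net in INTERNAL_NETS)

def on_site_conflicts(sessions, badge_intervals):
    """External remote sessions that start while the same user is badged in on site.

    badge_intervals maps a lowercase user name to (badge_in, badge_out) pairs.
    """
    flagged = []
    for s in sessions:
        if not is_external(s.source_ip):
            continue
        intervals = badge_intervals.get(s.user, [])
        if any(t_in <= s.start <= t_out for t_in, t_out in intervals):
            flagged.append(s)
    return flagged

# Made-up records: one external RDP login and one internal VNC login.
rdp_logs = [{"user": "JSmith", "client_ip": "203.0.113.7", "start": "2024-03-01T10:15:00"}]
vnc_logs = [{"account": "jsmith", "src": "10.1.2.3", "ts": "2024-03-01T11:00:00"}]
sessions = [normalize_rdp(r) for r in rdp_logs] + [normalize_vnc(r) for r in vnc_logs]
badges = {"jsmith": [(datetime(2024, 3, 1, 8, 0), datetime(2024, 3, 1, 17, 0))]}
flagged = on_site_conflicts(sessions, badges)
print(flagged)
```

Because both protocols land in the same schema, the correlation step is written once instead of once per source table, which is the practical payoff of normalization described above.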

When are QFDs Useful?

QFDs pair well with threat behavior analytics and playbooks by providing context for alerts or network behavior. When an alert indicates that a given threat behavior occurred on the network, QFDs quickly explain what the observed behavior means. When we design playbooks and threat behavior analytics, we also try to provide a QFD as a source of information so analysts can find relevant details quickly. Let’s now look at a QFD and walk through how an analyst can use its features.

The image above shows the QFDs currently loaded. On this particular instance of the platform, we have QFDs to view tags present in the environment, see protocol behavior statistics, observe user agents, and analyze the behaviors of specific industrial protocols. We will take a closer look at the server stats QFD.

The server stats QFD above shows statistics on the protocols seen transiting to and from assets. General information about the number of packets and bytes sent to and from a device is quickly available. One enrichment this QFD provides is the producer-consumer ratio field. Producer-consumer ratio (PCR) is a data exfiltration detection technique presented at FloCon 2014 by Carter Bullard of QoSient and John Gerth of Stanford University [1]. Bullard and Gerth’s hypothesis focuses on changes in the ratio between how much a given host produces for other hosts on the network and how much it consumes, for a specific protocol. While the typical ratio varies between protocols, the technique applies to both industrial and traditional IT protocols. Devices that act only as producers, or that use a protocol to beacon traffic out, will have a PCR closer to +1.0, while devices that operate only in a consumer role, such as an HTTP download, will have a PCR closer to -1.0. Within industrial protocols, PCR can be used to examine the relationships between master and slave devices.
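The ratio itself is simple to compute. As it is commonly stated, PCR = (bytes produced - bytes consumed) / (bytes produced + bytes consumed) for one host and one protocol, which bounds it between -1.0 (pure consumer) and +1.0 (pure producer). The sketch below assumes per-host, per-protocol byte counts are already available; the function names and the interpretation bands in `describe` are illustrative assumptions, not the platform's implementation.

```python
def pcr(bytes_produced: int, bytes_consumed: int) -> float:
    """Producer-consumer ratio for one host and one protocol.

    +1.0 means the host only sends (pure producer),
    -1.0 means it only receives (pure consumer), 0.0 is a balanced exchange.
    """
    total = bytes_produced + bytes_consumed
    if total == 0:
        raise ValueError("no traffic observed for this host/protocol")
    return (bytes_produced - bytes_consumed) / total

def describe(ratio: float) -> str:
    # Illustrative bands only; real expectations vary by protocol and site.
    if ratio >= 0.9:
        return "producer (e.g., upload or beaconing)"
    if ratio <= -0.9:
        return "consumer (e.g., HTTP download)"
    return "mixed exchange"

# An HTTP client that downloaded 5 MB while sending 20 KB of requests.
client = pcr(bytes_produced=20_000, bytes_consumed=5_000_000)
print(round(client, 3), describe(client))  # -0.992 consumer (e.g., HTTP download)
```

Normalizing by total volume is what makes the ratio comparable across hosts that move very different amounts of data; a shift in a host's PCR for a protocol is the signal, not its absolute byte count.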

To give a concrete example, we’ll revisit the “Living off the Land” techniques described by Joe Slowik in “Threat Analytics and Activity Groups.” The Electrum activity group is known to take advantage of existing administrative tools such as PsExec and “net use.” With proper logging and analytics, this behavior can be detected and the appropriate investigative playbook launched. One step of the investigation is to determine whether any data has been exfiltrated, how much, and to which server(s). One approach is to find all of the external servers the suspect assets have connected to, scope the ServerStats QFD to those assets, and then sort by PCR value.
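The scope-then-sort step can be sketched in a few lines of Python. This assumes a hypothetical `ServerStat` row shaped loosely like a ServerStats view, with byte counts taken from the server's side so that a server receiving much more than it sends gets a negative PCR. The field names, asset IDs, and byte counts are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class ServerStat:
    """One hypothetical ServerStats-style row, seen from the server's side."""
    server_id: int
    client_id: int
    protocol: str
    bytes_sent: int      # produced by the server
    bytes_received: int  # consumed by the server
    external: bool

    @property
    def pcr(self) -> float:
        total = self.bytes_sent + self.bytes_received
        # Treat a row with no bytes as balanced rather than dividing by zero.
        return (self.bytes_sent - self.bytes_received) / total if total else 0.0

def scope_and_rank(rows, suspect_clients):
    """Keep external servers that suspect assets talked to, lowest PCR first."""
    scoped = [r for r in rows if r.external and r.client_id in suspect_clients]
    return sorted(scoped, key=lambda r: r.pcr)

# Made-up rows; asset 11 is the suspect host.
rows = [
    ServerStat(201, 11, "SSL", 9_000_000, 150_000, external=True),
    ServerStat(202, 11, "SSL", 40_000, 8_000_000, external=True),
    ServerStat(203, 12, "DNS", 3_000, 2_500, external=True),       # not a suspect client
    ServerStat(204, 11, "SMB", 500_000, 500_000, external=False),  # internal server
]
ranked = scope_and_rank(rows, suspect_clients={11})
for r in ranked:
    print(r.server_id, r.protocol, round(r.pcr, 2))
```

Sorting ascending puts the servers that mostly consume data from the suspect assets at the top of the list, which is where an exfiltration destination would be expected to appear.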

Asset #125337 immediately stands out: despite being an SSL server (typical PCR near +1.0), it has a PCR of -0.99, which often indicates an exfiltration server. The QFD also shows when the communication was first seen and how much data has been sent to the server. With our suspicions raised, we continue to the next step of the playbook, which pivots to the certificate QFD and shows that the potential exfiltration server is using a self-signed X.509 certificate. These facts, combined with the previous behaviors, would likely result in escalating this case to an incident and triggering incident response activity.
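The two findings in this step, a PCR far from what the server's role implies and a self-signed certificate, can be expressed as a simple triage rule. The Python sketch below is hypothetical: it assumes the certificate QFD exposes subject and issuer strings, and it treats "issuer equals subject" as self-signed, which is a common heuristic rather than a cryptographic check. `EXPECTED_SERVER_PCR` and the `deviation_threshold` default are illustrative values, and the escalation decision itself still rests with the analyst, who also weighs the earlier living-off-the-land detections.

```python
from dataclasses import dataclass

# Illustrative expectations: servers for these protocols normally produce far
# more than they consume (PCR near +1.0). Hypothetical values; tune per site.
EXPECTED_SERVER_PCR = {"SSL": 1.0, "HTTP": 1.0}

@dataclass
class Evidence:
    protocol: str
    pcr: float
    cert_subject: str
    cert_issuer: str

def looks_self_signed(subject: str, issuer: str) -> bool:
    # Heuristic only: issuer equal to subject. A real check would verify the
    # certificate's signature against its own public key.
    return subject.strip() == issuer.strip()

def triage(ev: Evidence, deviation_threshold: float = 1.5) -> list[str]:
    """Return reasons this server merits escalation (empty list if none)."""
    reasons = []
    expected = EXPECTED_SERVER_PCR.get(ev.protocol)
    if expected is not None and abs(expected - ev.pcr) >= deviation_threshold:
        reasons.append(f"PCR {ev.pcr:+.2f} far from expected {expected:+.1f}")
    if looks_self_signed(ev.cert_subject, ev.cert_issuer):
        reasons.append("self-signed certificate")
    return reasons

# Made-up evidence resembling the scenario above.
ev = Evidence("SSL", -0.99, "CN=update.example", "CN=update.example")
reasons = triage(ev)
print(reasons)
```

Returning a list of reasons instead of a single boolean keeps the rule explainable, so the analyst sees why a server was surfaced before deciding whether to escalate.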

Moving Forward

In addition to the QFDs provided by regular content pack releases, the Dragos Platform also lets end users create custom QFDs. Custom QFDs make it possible to build site-specific views into the data collected by the Dragos Platform and to integrate better with the realities of your environment and procedures.

The specificity and unique nature of industrial control system environments create distinctive opportunities to use QFDs during threat hunting and incident response. Unlike many IT networks, industrial networks are more purpose-driven and are split into segments with well-defined roles. QFDs should focus on individual device behaviors and characteristics, as well as on activity group tactics, techniques, and procedures (TTPs) drawn from threat intelligence sources. Adversary TTPs are living processes, continually improved and updated; QFDs, too, should be continually developed and refined to counter emerging threats.

[1] https://resources.sei.cmu.edu/asset_files/Presentation/2014_017_001_90063.pdf