Tony Turner

EPSS for Configuration Flaws

Security Hardening



Most organizations have underutilized security capabilities at their fingertips, as vendors frequently add new features that are not enabled by default. Rather than overspending on new security tools, optimize what you have, reducing attack surface and complexity.

Prioritizing security configuration controls


Comprehensive controls frameworks pursue the laudable goal of setting security standards, but they can create wasteful effort that never materializes into security gains. This article explores the need to establish prioritization mechanisms for configuration flaws, and some potential approaches for doing so.

EPSS for Configuration Flaws

The realm of risk-based vulnerability management is vast, as we have seen countless reports of CVE counts rising year after year. This creates a constant hamster wheel of pain chasing the latest vulnerability du jour, but recent years have seen a number of prioritization mechanisms, such as EPSS, that help alleviate the pressure. Common sense has started to take hold, yet the area of security hardening and configuration management is still largely driven by compliance, not by any actual risk reduction. These, too, are vulnerabilities that must be prioritized in order to make effective use of the resources we have available.

Traditionally, when we think about configuration flaws, we mean the features a product supplier provides to secure its product. These are built-in options that ideally come enabled by default, but products are frequently shipped insecure and must be configured to become secure. Software components that fall outside the product supplier's control, however, need to be considered independently. For instance, if you have an existing MSSQL installation supporting another application and deploy a supplier's product against that existing installation, it probably will not modify the MSSQL defaults, or any defaults your prior configuration has already changed.

The Universe of Theoretical Flaws

Compliance suffers from a fatal flaw: lack of context. Standards drafting teams don't know what your risk environment looks like, so they try to imagine all sorts of possible issues. There's a collective grouping of subject matter experts, all coming from different backgrounds and perspectives. Some have very theoretical views on possible risks: that one time a researcher did a terrible thing in their white paper, and surely we must have protection against that. You also have people who have responded to actual incidents and ensure those weaknesses are captured as well. But for the reader of the resulting standard, it is unclear what led to a control being included. How can you use a risk-based approach to that?!

If we look at one of the most well-known and widely implemented frameworks, NIST 800-53, we will find almost 1,200 controls, and this is but one of many standards organizations are faced with. Every industry they engage with has its own set of requirements, from bodies such as NERC (CIP), the FDA, DHS, and DoD, and the list goes on. There are also many optional controls, such as those from the Cloud Security Alliance, OWASP, the Center for Internet Security (CIS) and many more.

Some organizations may have no choice but to implement these frameworks due to regulatory or legal requirements, but often the scope of such programs can be closely managed and controlled. For instance, NERC CIP only applies to regulated assets and processes impacting the regulated environment. In many electric power organizations subject to such controls, that might constitute less than 5% of their assets. So while onerous, the enterprise network probably doesn't need to comply with NERC CIP.

So security-conscious organizations move beyond compliance, look at unregulated areas, and decide how to control them. NIST CSF is a popular approach, so as organizations implement it they look at areas such as Protect, which calls for hardened baselines. But what does that mean? CSF does not tell you how to configure your systems, simply that it needs to be done. This is where CIS Benchmarks and DISA STIGs come into play. These are fantastic resources that encompass all sorts of configuration controls.

The CIS Benchmark for Windows 11 has over 600 listed configuration controls. Most are great things to implement, as long as they don't prevent legitimate business usage; some do. For instance, I've seen environments that tried to restrict the use of Java, and then many business units could not actually do their jobs.

One example I use frequently is that of TLS configurations. While having no transport encryption at all would be a major issue, getting very concerned about the TLS versions in use is probably not the best use of security spend. I can't think of anyone who ever got breached because they used TLS 1.1 instead of 1.2. Yet this is one of the most common issues we hear needs to be addressed. Another example, which creates accessibility concerns for some even if it is not a problem for most, is requiring CTRL+ALT+DEL for logon. The idea is that a trojan could mimic the Windows login screen and intercept a password. But if a trojan is installed, it can probably intercept keystrokes too, and the chance that this control protects you from a determined adversary is negligible.

Penetration Test Findings

This one is really interesting to me, especially since just last week I published the first in a new series of articles titled "5 Times I Gained Admin Due to Misconfiguration". At least with this category of flaws, we know that somebody was able to perform undesirable actions due to missing controls. But regardless of what anyone tells you, the way a penetration tester does their job is not, in most cases, the way a real adversary operates.

It's also noteworthy that when you try to map MITRE ATT&CK to penetration testing actions (as I initially tried to in that blog, but gave up), you will find the things commonly used by penetration testers do not always have a good mapping. A great example is NAC bypass; there's just not a good match in ATT&CK for it. You can use something like T1556.004 Modify Authentication Process, but it's not a great fit. And if you consider where most of the TTPs came from, it raises the question: is this something an attacker will do, or just something my penetration tester will do? I don't have a great answer here; as an industry we'd need more data to answer that question. But issues like LLMNR poisoning are clearly used by adversaries, and ATT&CK maps very cleanly there. This is one of the topics I want to explore further.

So why do adversaries engage differently? Well, nobody is establishing a scope or rules of engagement for them. We could probably cover this topic alone pretty deeply in a separate article, but there are economics to this as well. As often as we hear about sophisticated attacks, most are purely attacks of opportunity, and the adversary is not trying to "wow" their customer with a crazy 0-day they just dropped or sell more pentests. They have an objective they are trying to achieve with the least amount of resources, and maybe that objective is to compromise as many organizations as possible with a single play from their playbook. It's not sophisticated; if it works, great, and if not, they may just move on. As defenders, our job is to make them look for easier targets, not to close every possible hole. The challenge is to find the 4 or 5 really important configuration weaknesses we need to close, not all 600+.

But as penetration testers, often our objective is not well defined at all. Too often I hear I'm supposed to "get in" or "gain domain admin", as opposed to "find a user who will click the link and install ransomware". We don't scope penetration tests this way. I wish we did, they'd be far more accurate. I guarantee they'd make the organization think twice about how good their security posture is. But at least they are using TLS 1.2, so they must be secure, right?

Exploited Flaws

This is where things get really interesting. It's not often we get thorough writeups on impactful incidents, but when we do, we should really be paying attention. For instance, this excellent writeup on the Microsoft Storm-0558 compromise discusses how a crash dump containing a secret key from production was moved to an assumed lower-security debugging environment, where it was stolen via a compromised engineer's account. This was actually a pretty sophisticated attack. But what we can learn from it is that crash dumps contain really useful information, and we need to ensure that sensitive details like secret keys get protected like any other secrets. So controls that implemented robust security in the debug environment, prevented keys from being stored in the dump file, or added data security measures for the dumps could have helped. Obviously, the infection vector for the engineer's account deserves scrutiny as well.

But not everything is this sophisticated. We've seen major attacks, such as SolarWinds in 2020, spawn from a weak password on an account with elevated access. While I would contend that trying to blame an intern for setting a weak password (solarwinds123) on an account with update server access is a pretty terrible look, consider that the account obviously was not being audited, was not using two-factor authentication, updates did not require mutual agreement between developers, and a host of other configuration issues led to one of the largest supply chain attacks of all time. No CTRL+ALT+DEL weaknesses here! But one common thread we can see is protecting credentialed access. In fact, if we look at the Verizon Data Breach Investigations Report for 2023, the method of access for adversaries was overwhelmingly stolen credentials, with phishing a distant second and vulnerability exploitation much lower even than that. So perhaps we should be focusing on password policies, secrets storage, auditing, multi-factor authentication and similar controls.

Prioritizing Flaws

If we take lessons from the vulnerability prioritization space, there's really a handful of techniques that can be used, though we need to tailor them for the unique nature of configuration weaknesses. For instance, we don't have CVSS scores so we need a way to measure severity another way. We will get into this below, but it follows a funnel-like prioritization flow. You may skip some of these layers, or re-order them to suit your process, or add additional layers as more specificity is required, but this is the general process we will explore.
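The funnel described above can be sketched as a chain of filters, each stage narrowing the set of flaws before the next. This is an illustrative assumption of how such a pipeline might look, not a prescribed standard; the stage names, fields and thresholds are all hypothetical:

```python
# Illustrative prioritization funnel: each stage filters the remaining
# configuration flaws. Field names and thresholds are assumptions.

def prioritize(flaws):
    """Run flaws through a compliance-first, then risk-based funnel."""
    stages = [
        # Regulatory-mandated items are routed to a separate compliance process
        ("compliance", lambda f: not f.get("regulatory_mandate", False)),
        ("severity", lambda f: f.get("severity", 0) >= 7.0),
        ("reachability", lambda f: f.get("reachable", True)),
        ("threat", lambda f: f.get("exploit_likelihood", 0) >= 0.1),
    ]
    remaining = list(flaws)
    for _name, keep in stages:
        remaining = [f for f in remaining if keep(f)]
    # Final ordering by business impact, highest first
    return sorted(remaining, key=lambda f: f.get("business_impact", 0), reverse=True)

flaws = [
    {"id": "weak-passwords", "severity": 8.5, "reachable": True,
     "exploit_likelihood": 0.6, "business_impact": 9},
    {"id": "tls-1.1-enabled", "severity": 4.0, "reachable": True,
     "exploit_likelihood": 0.01, "business_impact": 3},
]
print([f["id"] for f in prioritize(flaws)])  # ['weak-passwords']
```

As the article notes, you can re-order the stages or skip layers; the point is that each pass cheaply removes work before the more expensive analysis runs.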


To start with, as we discussed, there are some things you just have to do. We won't belabor the point here, except to say that you might need two separate processes. I've seen this in many highly regulated organizations, such as electric power (a $1 million/day/violation fine is certainly a major risk!), where they run everything through their compliance process first and catch anything that is going to create problems for them. I'm not crazy about it, but I'm also a realist and know this comes with financial and reputational risk that may have nothing to do with cyber, yet it IS enterprise risk management all the same. Once the compliance issues are out of the way, they then run through their risk process.

Severity Scoring

Since we don't have a CVSS score to go on, how do we measure the technical impact of the flaw? We do get pretty good descriptions of the justification for each control from CIS, but it's a qualitative description and often very theoretical in its wording. In 2010, Peter Mell and Karen Scarfone proposed the Common Configuration Scoring System (CCSS), modeled after CVSS, but to this day I have yet to see it in active use anywhere.

Common Configuration Scoring System

CCSS uses a very similar approach of Exploitability and Impact measures in its base scoring model.


Using the very familiar Access Vector and Authentication metrics, this will make sense to anyone with CVSS experience. The one major difference is the use of Exploitation Method (not shown below), with possible values of Active or Passive. This is interesting because it speaks to the danger (or benefit) a setting causes simply by existing, as opposed to requiring an adversary to perform an intentional action. One of the examples used in the paper is audit controls: a lack of auditing creates an issue for the defender regardless of what an adversary does, so it is a passive configuration flaw.


From an Impact standpoint, it's the pretty typical CIA factors seen in CVSS, and as you will see in my example below, I wound up doing something similar as well. For ICS-oriented environments I've often used other attributes, such as Safety, Reliability and Productivity.
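Since CCSS adopts the CVSS v2 base equations, a minimal sketch of the base score computation looks like the following. The metric weights are taken from the CVSS v2 specification; the Active/Passive Exploitation Method modifier discussed above is deliberately omitted for brevity:

```python
# CCSS-style base scoring sketch, reusing the CVSS v2 base equations.
# Metric weights are the standard CVSS v2 constants.

AV = {"local": 0.395, "adjacent": 0.646, "network": 1.0}      # Access Vector
AC = {"high": 0.35, "medium": 0.61, "low": 0.71}              # Access Complexity
AU = {"multiple": 0.45, "single": 0.56, "none": 0.704}        # Authentication
CIA = {"none": 0.0, "partial": 0.275, "complete": 0.660}      # C/I/A impact

def base_score(av, ac, au, c, i, a):
    impact = 10.41 * (1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a]))
    exploitability = 20 * AV[av] * AC[ac] * AU[au]
    f = 0.0 if impact == 0 else 1.176
    return round(((0.6 * impact) + (0.4 * exploitability) - 1.5) * f, 1)

# A network-reachable flaw, no authentication, full CIA impact:
print(base_score("network", "low", "none", "complete", "complete", "complete"))  # 10.0
```

The same arithmetic applies to a configuration weakness once you decide how "exploitable" its absence makes the system, which is exactly the judgment call that keeps CCSS from being widely adopted.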

One approach I've taken, and it's something we do in our software platform, is to automate inherent risk analysis for the product in question. For instance, a server operating system or SCADA server might be very high risk, while your marketing software is not (typically). By leveraging a product-category-based model, we can deprioritize many lower-importance products for configuration management until we are ready for them. We leverage 17 inherent risk factors from the NIST definition of Critical Software, as well as OWASP Risk Rating and some proprietary factors.

But this still does not help us boil the ocean of 600+ controls in Windows 11. One way I've handled this is to identify key risk drivers and do keyword searches for those items as a starting point. For instance, searching for authentication, credential, password, token, etc. can help you home in very quickly on the controls that enhance access control for the product you are securing. It's not perfect, but when getting started on the work, it's not a terrible way to handle the low-hanging fruit.
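A minimal sketch of that keyword triage, assuming the benchmark has been exported as (id, title) pairs. The control titles below are illustrative, not actual CIS text:

```python
# Keyword triage over a benchmark export: surface access-control-related
# items first. Control IDs and titles here are illustrative examples.

KEYWORDS = ("password", "authentication", "credential", "token", "lockout")

def triage(controls):
    """Return controls whose titles mention a key risk driver."""
    hits = []
    for control_id, title in controls:
        lowered = title.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            hits.append((control_id, title))
    return hits

controls = [
    ("1.1.4", "Ensure minimum password length is 14 or more characters"),
    ("2.3.7", "Require CTRL+ALT+DEL at logon"),
    ("18.5.4", "Disable LLMNR multicast name resolution"),
]
print(triage(controls))  # only the password-length control matches
```

Crude as it is, a pass like this turns a 600-item document into a short worklist in seconds, which is the point of low-hanging fruit.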

There are of course great ways to handle mass-scale configuration too. One of my first security roles in the early 2000s was securing Windows XP across a large state government organization, and we used CIS to identify the build and enforced it with Windows Group Policy (after testing) and also in our default gold images we deployed with Norton Ghost. But in hindsight, there were many controls we implemented that caused headaches for IT and really did not produce much security benefit. We generated a lot of noise and busy work for marginal value.

So while my approach of using inherent risk is one way to reduce the scope of work a bit, I think as an industry we need to go quite a bit further before we can understand the severity behind configuration flaws. At a former customer, I designed a configuration management scoring system based on OWASP Risk Rating, which was also what we used for vulnerability management. The approach was to create a risk modeling template for each CIS Benchmark that started by identifying the flaw the configuration control mitigated. Many times this was just the inverse of the control, such that a strong password policy mitigated the flaw of weak passwords. With a measurable flaw in hand, we captured business impact at the asset level, aligned to Compliance, Privacy, Safety, Reputation and Financial impacts (so the product on that asset inherited those risks), evaluated Confidentiality, Integrity and Availability impacts, and assessed the ability of a threat actor to influence the configuration weakness. While this was a pretty effective scoring mechanism, it did not scale terribly well, since each product needed its own policy defined, which typically took a couple of days to create. When I left that project we had only 20 or so products defined, out of thousands in the environment. But it was a great start.

You can also find other similar and expanded approaches such as IACS STAR from Cutaway Security that are useful within ICS as well. It's not uncommon to see new approaches emerging, but typically they are focused on vulnerability prioritization.

What is really needed, though, is a catalog of this data that can be leveraged in analysis and driven by a quantitative process. To my knowledge, such a dataset does not exist, but we are working on a project to capture this that we plan to open source. More on that in a future post.

Can AI Help?

One thing I've been exploring lately is the use of AI to evaluate large documents like the Windows 11 benchmark referenced above. For instance, a recent query to ChatGPT asking for the five most important categories resulted in this prioritization list:

  1. Password Policies: Enforcing strong password requirements to enhance account security.
  2. User Access Controls: Restricting user privileges and access rights to reduce the risk of unauthorized access.
  3. Firewall Configurations: Ensuring appropriate firewall settings to protect against external threats.
  4. Audit Policies: Implementing robust auditing to monitor and record system activities for detecting and responding to potential security incidents.
  5. System Updates and Patch Management: Regularly updating the operating system and applications to protect against known vulnerabilities.

OK, this is an interesting starting point, but it does not quite get me close enough. Asking specifically about Password Policies results in:

  1. Enforce Password History: Set to remember 24 or more passwords. This ensures that when passwords are changed, they cannot be immediately reused, thus preventing the cyclic use of a small set of passwords​​.
  2. Maximum Password Age: Set to 365 or fewer days, but not 0. This control requires users to change their passwords at regular intervals, reducing the risk of compromised passwords being used indefinitely​​.
  3. Minimum Password Age: Set to 1 or more days. This prevents users from changing their passwords multiple times in a short period to circumvent the password history requirement​​.
  4. Minimum Password Length: Set to 14 or more characters. Longer passwords are generally more secure against brute-force attacks​​.
  5. Password Must Meet Complexity Requirements: Enabled. This ensures that passwords contain a mix of character types, such as uppercase and lowercase letters, digits, and special characters. The policy also includes that passwords should not contain the user's account name or parts of the user's full name that exceed two consecutive characters​​.
  6. Store Passwords Using Reversible Encryption: Set to 'Disabled'. This prevents the storage of passwords in a format that can be easily converted back to plaintext, thus enhancing their security​​.
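The recommendations above can be expressed as a machine-checkable policy. The sketch below assumes you have already parsed effective settings into a dict; the setting names are illustrative keys, not actual secedit or Group Policy identifiers:

```python
# Sketch: audit effective password settings against the CIS-style
# recommendations listed above. Key names are illustrative assumptions.

POLICY = {
    "password_history": lambda v: v >= 24,            # remember 24+ passwords
    "max_password_age_days": lambda v: 0 < v <= 365,  # 365 or fewer, not 0
    "min_password_age_days": lambda v: v >= 1,        # 1 or more days
    "min_password_length": lambda v: v >= 14,         # 14+ characters
    "complexity_enabled": lambda v: v is True,        # complexity required
    "reversible_encryption": lambda v: v is False,    # must be disabled
}

def audit(settings):
    """Return the names of settings that fail (or are missing from) the policy."""
    return [name for name, check in POLICY.items()
            if name not in settings or not check(settings[name])]

effective = {
    "password_history": 10,
    "max_password_age_days": 90,
    "min_password_age_days": 1,
    "min_password_length": 8,
    "complexity_enabled": True,
    "reversible_encryption": False,
}
print(audit(effective))  # ['password_history', 'min_password_length']
```

Encoding the shortlist this way also gives you something to re-run continuously, rather than a one-time hardening exercise.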

While still not a severity score, this is getting us closer. If we assume each of the top five categories has six controls, we've gone from 600 controls to 30. That is far more manageable. But is this a useful method of prioritization? It bears further discussion: my gut tells me it's an improvement over analysis paralysis, but it's not truly a data-driven or risk-based approach.

Reachability Analysis

Reachability is really about exposure. Is our flaw exposed in such a way that it can be exploited? For software vulnerabilities, this typically takes the form of understanding the reachability of vulnerable functions: is the vulnerability accessible in code or by a user of that software? You also have vendors that model attack-path behavior at the network layer to determine whether the network even allows access to the vulnerable service.

Years ago, when I ran Professional Services for the Americas at Skybox Security, the launch of our Vulnerability Control product was exciting for the industry because we used an attack-path approach to prioritization. The idea was that since we already had context of the network pathing from Network Assurance, including routing tables, ACLs and other pathing information, we could eliminate the need to patch vulnerabilities that were not accessible by defined threats. I personally worked incidents in my time there that resulted from defenders not understanding how their networks routed (attackers don't use the default gateway if they don't want to). What was once called "attack path" is now being discussed in the context of reachability.

Reachability analysis has existed for years, but it's really only been since the advent of the software bill of materials (SBOM) and the resulting flood of vulnerability false positives that reachability has entered widespread discussion. The idea is that you might have a vulnerable function in an open source library that is never actually called from your code; it is not reachable. There is little to no risk of that vulnerability being exploited. Reachability analysis therefore gives us a great mechanism to eliminate vulnerabilities from scope and reduce the operational chaos of dealing with them.
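For software, that check can be sketched as a walk over the call graph from your entry points to the vulnerable function. The graph below is a toy assumption, not output from any real analysis tool:

```python
# Toy call-graph reachability: is the vulnerable function transitively
# callable from any entry point? The graph contents are illustrative.

from collections import deque

def reachable(call_graph, entry_points, target):
    """Breadth-first search from the entry points; True if target is visited."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        if fn == target:
            return True
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False

call_graph = {
    "main": ["parse_input", "render"],
    "parse_input": ["lib.safe_decode"],
    "render": [],
    # lib.vulnerable_decode exists in the dependency but is never called
}
print(reachable(call_graph, ["main"], "lib.vulnerable_decode"))  # False
print(reachable(call_graph, ["main"], "lib.safe_decode"))        # True
```

Real static analysis is far messier (reflection, dynamic dispatch, callbacks), but the principle is the same: no path from an entry point means little practical risk.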

But how do we perform reachability analysis on a configuration flaw? How is that flaw reached by an adversary? Is it indirectly accessed due to how the configuration control manages other aspects of the system? For instance, if I turn on exploit mitigation protections on an operating system, that could influence entire classes of software weaknesses or CWEs. There is not one single instance to score, but literally every possible exploitation that could take advantage of the missing mitigation. How do I measure that? It's probably going to bubble to the top of my prioritization list.

Again, this comes back to our need for a catalog of controls, and probably some way to map configuration controls to CVEs and CVE impacts.

Threat Intelligence and Exploitability

Probably the most significant advance in vulnerability prioritization in recent years is EPSS, the Exploit Prediction Scoring System. There are many articles on the internet about EPSS, including this one from Claroty. To summarize, EPSS leverages a data-driven approach, using a variety of evolving threat indicators, to estimate the probability of exploitation (along with a percentile rank) and publishes this information through an open API. The analysis is performed daily, and as the indicators change, so does the score. It is a dynamic, evolving method to determine which vulnerabilities are most likely to be exploited by adversaries in the next 30 days. Many security vendors now support EPSS, including Opswright in our Impact Platform.
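The EPSS API at api.first.org returns JSON with per-CVE probability and percentile fields. The sketch below triages a parsed response by probability; the response shape mirrors the public API, but the scores shown are made up for illustration:

```python
# Triage a parsed EPSS API response (https://api.first.org/data/v1/epss).
# Response structure mirrors the public API; the sample scores are invented.

SAMPLE_RESPONSE = {
    "status": "OK",
    "data": [
        {"cve": "CVE-2024-0001", "epss": "0.92215", "percentile": "0.99871"},
        {"cve": "CVE-2024-0002", "epss": "0.00413", "percentile": "0.31024"},
    ],
}

def above_threshold(response, threshold=0.1):
    """Return (cve, probability) pairs whose EPSS score meets the threshold."""
    return [(row["cve"], float(row["epss"]))
            for row in response.get("data", [])
            if float(row["epss"]) >= threshold]

print(above_threshold(SAMPLE_RESPONSE))  # [('CVE-2024-0001', 0.92215)]
```

Because the scores change daily, a triage pass like this belongs in a scheduled job, not a one-off script.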

The topic of threat-informed defense has spawned entire product categories, and threat intelligence has been used for prioritization by vulnerability management tools for decades. That, at least, is not a new trend. But the quantitative nature of the analysis performed by the FIRST team and their partners at the Cyentia Institute is what makes this such a novel approach. The challenge comes down to the temporal factors and dependencies in prioritization. To do data-driven analysis, we need data to analyze. Vulnerabilities are being discovered at a rapid pace, and the data around them and how they are being exploited changes daily. So too does the resultant analysis of those data points. EPSS is no different, and could be thought of as a lagging indicator; but if EPSS scores are telling you to patch, you need to do it NOW.

So how does this help us with security configuration flaws? CWE has an entire category that covers this topic, but this is really about how software misconfiguration leads to vulnerabilities. It's not about whether you disabled LLMNR on your Windows machine or not. So instead, what we wind up with is a reactive approach in advisories such as this one from CISA, advising you to get your web admin interfaces off the internet. It's great work by CISA, but this is a very reactive and piecemeal approach. How can I use threat intelligence to be a bit more proactive?

Again, we come back to the concept of a configuration controls catalog. If we can classify control categories, we can start to prioritize based on the risk that not having those controls presents. MITRE ATT&CK provides a view into the mitigations associated with TTPs, but the work by its sister project, D3FEND, gets us a bit closer. What is very interesting about D3FEND is the mapping relationships and the visual explorer provided on their page. Being able to understand how a specific control hardens, detects, deceives, etc. starts to paint a clearer picture. But we still don't have a quantitative analysis. How do we manage this at scale?

One approach is MITRE's Top ATT&CK Techniques calculator, where you can provide your controls and general posture and calculate a list of techniques most likely to create a bad day for you. You can then iterate and prioritize the related mitigations. While this is a great tool, it still does not get us to the data catalog we need, one that maps controls, categories, threat and vulnerability impacts, and so on. MITRE D3FEND is probably the closest, since we can start to prioritize defensive categories. For instance, I'd personally prefer to apply rigorous hardening, and certainly detection, before worrying too much about deception.

Business Impact

This is arguably one of the hardest parts of this process to implement at scale. I've been designing and building vulnerability management systems to tackle this issue for decades, and before that I was deeply involved in disaster recovery and business continuity, where this is just table stakes. One of the greatest data sources that cybersecurity fails to take advantage of in most programs is the Business Impact Analysis. This is where the organization defines its critical functions and starts identifying the people, process, technology and equipment dependencies for each critical function. Often those dependencies are external parties, and from a cyber standpoint, this is really what we are talking about when we discuss supply chain security risk. In the graphic above, all those flaws you don't control are largely a result of technology providers outside your scope of control or service providers that engage in unsafe processes. But step one is just gaining an understanding of how all of this maps to your business risk.

EPSS (or something else) for Configuration Flaws

It's clear we need a better way to prioritize our configuration flaws. We've discussed several current approaches. All are good for what they were designed for, but none really helps prioritize the critical mass of configuration controls. What is the 80/20 for hardening? Where do I get the most value? It will be different for you than it is for me, but I'd welcome your feedback on what you'd want to see in such a catalog. And if you'd like to collaborate on such an effort, please get in touch!


Tony Turner

Founder, CEO

Experienced cybersecurity executive with 30+ years in the field. Author of SANS SEC547 Defending Product Supply Chains and of Software Transparency.
