Eliminating the weakest link: Managing supply chain cybersecurity risk through life cycle

By Ellen Wesselingh

Summary

In recent years, supply chain issues for products with digital elements have increased, which poses a problem for the assurance of the Integrity of these products and the Confidentiality of the data contained in these products. To illustrate the problem, this article provides a number of examples of supply chain attacks that have happened over the last couple of years. The article also provides a model for supply chain risk analysis. This model is based on an existing model from 2013, which is amended with a level of abstraction to ensure the model is as complete as possible. A risk analysis that has been conducted with the support of this model, complemented with an additional analysis of the remaining risk, should provide a sufficient argument that the supply chain is secure enough for its purpose.

Context

This article discusses recent developments in products with digital elements that may lead to security issues when systems with these elements are deployed. It also discusses possible solutions. One problem is not addressed: misinformation. Although it is a major problem, misinformation affects how reliable the product appears in the eyes of the beholder rather than the security of the system itself, so it falls outside the scope of this model. Misinformation may, however, lead users of the system to make less secure choices [1].

It is recommended to use (internationally recognised) standards for interoperability, industry-standard development tools for quality assurance, and to re-use architecture, designs, firmware, and software for cost efficiency and development time reasons [2]. However, since the (re-)use of these components expands the supply chain to a great extent, this leads to a higher possibility of supply chain attacks.

In the past ten years, supply chain problems have become more apparent. This can be seen from examples such as Meltdown and Spectre [3] [4], which have shown that hardware architecture choices can introduce the possibility of sophisticated hardware attacks. Kaseya and SolarWinds have shown a similar introduction of attack possibilities through supporting services [5] [6]. Another seminal example is the Dutch case of DigiNotar, in which a certificate-issuing service was hacked [7]. These examples have made it clear that hackers can attack potentially interesting targets through their supply chains. This is especially true for high-assurance products that are potentially interesting targets for Advanced Persistent Threats (APTs), also known as state actors. Such highly exposed targets are found in sectors such as energy, finance, and the military.

Hardware supply chain attacks can be divided into two categories. Direct hardware supply chain attacks are executed by actively inserting backdoors and/or trojans in hardware [8] [9], while indirect hardware supply chain attacks, such as Meltdown and Spectre, result from genuine design decisions with adverse security consequences [3] [4]. Similar considerations apply to firmware and software, with the distinction that these may be updated on production systems, whereas that is usually not possible for pure hardware components such as Integrated Circuits (ICs) and Application-Specific Integrated Circuits (ASICs).

A target can be the supply chain for direct organisational operations, such as the production supply chain. However, indirect supply chains can also be targeted, for example the services that an organisation uses for marketing research [10]. Other similarly indirect operations, such as a financial backend or a Human Resources Management (HRM) system, may be just as vulnerable. The bottom line is that organisations should be aware of all their supply chains and of the way these may interfere with critical business operations, even when they are perceived as remote elements of the complete operation.

The primary focus of this article is to develop a supply chain attack framework that addresses the primary process in which a product with IT components is produced. Such products may consist of hardware, firmware, software, a combination of these three, or (development) system information or other product data. Supply chain issues that affect the systems that support the development environment are also addressed.

Secondary services such as the supply chain of other office processes or communication processes are not in scope but may be addressed in future work. The same holds for supply chains of services that are provided to customers.

Methodology used

The research was inspired by a presentation from Andrew Huang at a Blue Hat conference [11]. In his presentation, Huang mentioned a number of attacks [8] [9] for which the articles describing those attacks were analysed for further ideas and references (snowball method). Subsequently, a search for (hardware) supply chain attacks with more generic keywords was conducted to find other supply chain hardware attacks. These attacks were then analysed. No specific search for software issues was done in this phase because of the overwhelming amount of software security problems.

Once a list of relevant hardware supply chain attacks was compiled, a search was conducted to find existing life cycle frameworks and supply chain frameworks. This search yielded a list with possibly relevant frameworks from credible institutions such as ISO, MITRE, NATO, NIST [12] [13] [14] [15] [16]. Once the list was compiled, the most relevant existing framework was chosen. The ISO, NATO, and NIST frameworks mainly address single organisations, and tell how to fix problems in a standardised way. The MITRE framework shows what can go wrong and what the possible solutions can be [13].

The existing framework used a different life cycle model and was therefore transformed into the life cycle model in use at this company. The chosen framework provided a list of attacks in the various life cycle stages and a list of countermeasures that may be taken against the attacks. The new model was amended with an attack in the retirement stage. The new life cycle model is shown in the figure below.

The model was then validated for usability by applying it to an actual supply chain situation from the stakeholder. Following the application, several new countermeasures were added, and the model was introduced to colleagues. It has since been used in actual supply chain situations multiple times.

This article describes how the model was reworked by rearranging the attacks and countermeasures. The framework was developed in several stages. In each stage, the model was validated by peer review with expert colleagues internally. In between those stages, the framework was applied to actual project questions, leading to the validation of the model.

In the course of writing this article, information on supply chain attacks and vulnerabilities for firmware and software was sought. One of the examples that was found is Log4J, an Open Source Software (OSS) component widely used in many IT systems, of which the administrators and users were often unaware that it was used in their applications [17].

Further, an overarching layer of abstraction was added to demonstrate the completeness of the model more convincingly. Finally, the model was validated by giving presentations to peers in the field, leading to useful feedback.

Results

The results of the project are threefold:

• A re-worked Supply Chain Attack Framework and Supply Chain Risk Analysis Model (this article)

• A presentation about the model: A previous version of the model was presented at the International Common Criteria Conference 2023, November 2023

• A course on using the model in conjunction with Life Cycle analysis to tackle supply chain issues: A concept course has been developed and is ready for review.

Updated model

The figure below shows the highest abstraction layer of the model. The model contains different layers of abstraction of the product. The development environment is modeled as relevant for all stages of development: standard, architecture, hardware, firmware, and software.

Figure 2: Attack surface categories

The attack surface categories define the attack categories in which various attack types can be identified. The list below gives the attack types and specific examples for each category:

• Architecture: Attacks as a result of architectural design choices. Examples are Meltdown and Spectre, which may be categorised as either Architecture or Hardware [3] [4].

• Standard: Attacks as a result of vulnerabilities in the standard that is used. An example is Terrestrial Trunked Radio (TETRA). Parts of the TETRA standardised protocol contain vulnerabilities due to government restrictions on the cryptography used [18].

• Hardware: Attacks based on genuine design choices like Meltdown and Spectre. Other examples include attacks based on malicious additions to the design in pre-concept, concept, development, and production stages as described in [3] [4] [8] [9] [19].

• Firmware: Attacks based on genuine design issues, such as the Joint Test Action Group (JTAG) interface. This interface is necessary for testing during development and production, and to provide updates during support [20] [21]. Other examples are attacks based on malicious additions to firmware design in concept, development, production, or utilization/support stages, of which Stuxnet is an example [22].

• Software: Attacks based on genuine design issues such as Log4J, an OSS Java-based logging utility often used in web applications, which had multiple vulnerabilities due to programming issues. Other examples are attacks based on malicious additions to the design in pre-concept, concept, development, production, and utilization/support stages, such as typosquatting or dependency confusion [23].

• Development environment: Attacks on data and/or systems, based on genuine design issues in the supplied systems, as in the Kaseya and SolarWinds cases. Other examples are attacks based on malicious additions in the supplied systems (backdoors, ransomware, trojans, viruses). These issues may also arise in the production environment. On 29 March 2024, a new attack vector in the development environment emerged: the human factor [24].
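
Typosquatting, mentioned under Software above, relies on a dependency name that is almost, but not exactly, the name of a trusted component. The Python sketch below shows one possible check for such near-miss names. The approved list, the example names, the function name, and the similarity threshold of 0.8 are assumptions made for illustration; they are not part of the model.

```python
from difflib import get_close_matches


def find_lookalikes(dependencies, approved, cutoff=0.8):
    """Return {dependency: approved_name} for every declared dependency that
    closely resembles, but is not equal to, an approved component name."""
    approved_names = sorted(approved)  # sorted for deterministic results
    suspicious = {}
    for name in dependencies:
        if name in approved_names:
            continue  # exact match with an approved component
        # Best approved candidate whose similarity ratio is at least `cutoff`.
        matches = get_close_matches(name, approved_names, n=1, cutoff=cutoff)
        if matches:
            suspicious[name] = matches[0]
    return suspicious


if __name__ == "__main__":
    # Hypothetical input: one approved name, one near miss, one unrelated name.
    approved = {"requests", "numpy"}
    declared = ["requests", "reqeusts", "leftpad"]
    for name, lookalike in find_lookalikes(declared, approved).items():
        print(f"{name!r} resembles approved component {lookalike!r}")
```

A check like this only flags candidates for review. It does not cover dependency confusion, where the name is identical but the component is fetched from an unintended source.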

The categories, including attack types, are combined with the chosen life cycle model [15]. This life cycle model has seven stages, described below. It is important to note that the stages are not linear.

1. Pre-concept, in which generic (market) research is performed to find the customer needs, requirements, and wishes.

2. Concept, in which a Proof Of Concept (POC) is developed to validate the results of the pre-concept research.

3. Development, in which the POC is developed to a Technological Readiness Level (TRL) for production.

4. Production, in which the developed system is produced and delivered to the customer.

5. Utilisation, in which the system is deployed. This stage incorporates acceptance and installation.

6. Support, in which the deployed system is being maintained in an operational and secure state.

7. Retirement, in which the deployed system is securely destroyed to prevent persistent data leakage.

Note: the development and production stages run partly in parallel, and the utilisation and support stages run largely in parallel. In the definition of the Common Criteria standard, the production stage is considered part of the family Development Security (ALC_DVS) [12].

Each of the listed attack types is projected onto the life cycle stages. Subsequently, the corresponding countermeasures for each attack are added to the model. All countermeasures are categorised by their applicability to the various attack types.

Combined, the attacks and countermeasures provide a model that can be used for supply chain attack risk analysis. The method is qualitative, which means that it does not quantify how feasible an attack is. However, it shows any remaining risk in the supply chain, which can either be accepted or mitigated with measures such as insurance. A simplified model is presented in Figure 3: Life cycle with attack-type categories. This figure assumes an ideal world in which hardware is developed and produced first. The full model with the details of attack types and countermeasures can be found in the appendix.

The model was further elaborated in multiple steps:

1. Reshuffled the original list of attacks into the newly defined categories Architecture, Standard, Hardware, Firmware, Software, Development environment. Discussed the attack list with expert colleagues. Analysed each attack for essentials such as the entity that is in control when the attack is staged, or whether the attack is staged at control handover in the life cycle or supply chain (which is a vulnerable point and frequently used for the staging of attacks).

2. Re-categorised, combined, and rewrote the original list of attacks into abstract overarching attacks in the newly defined categories, reducing the number of attack descriptions from 42 to 19. Described more specific examples of sub-categories of attacks within the most abstract categories and attributed them to three different problem domains that were defined:

1. Benign (design) decisions that lead to future problems due to insufficient focus on, or insight or knowledge of possible cybersecurity problems arising from those decisions;

2. Genuine mistakes in design or implementation due to insufficient security awareness, lack of security expertise or training;

3. Malicious mistakes or alterations to change the intended functionality of components, systems, or solutions.

The attacks were then mapped to the life cycle model, as shown in the figure below.

Figure 3: Life cycle with attack-type categories

A brief description of the attacks defined is given in the list below. The full model is presented in the appendix.

A01. Standard introduces a sub-optimal proposal for the item being standardised, leading to vulnerabilities that may be exploited.

A02. Descriptions in an Architecture document are poorly explicated or subject to multiple interpretations, leading to vulnerabilities that may be exploited further down the development chain.

A03. In one of the stages concept, development, production, utilisation, support, a genuine hardware component is replaced with a counterfeit component. Counterfeit components are generally of a lesser quality, and thus may introduce security problems, but they are not necessarily malicious.

A04. In one of the stages concept, development, production, utilisation, support, a genuine hardware component is replaced with a maliciously altered component.

A05. During retirement, the system is decommissioned in a way that may cause the leakage of critical data, architecture, or design information.

A06. In one of the stages concept, development, production, utilisation, support, a genuine firmware component is replaced with a counterfeit component. For firmware, this is usually in combination with counterfeit hardware.

A07. In one of the stages concept, development, production, utilisation, support, a genuine firmware component is replaced with a maliciously altered component.

A08. In one of the stages concept, development, production, utilisation, support, one or more genuine software components are replaced with maliciously altered components.

A09. Agents with malicious intent use a weakness in one of the development systems or subsystems that is not patched in time.

A10. Malicious components (either hardware, firmware, or software) are introduced into the development systems.

Figure 4: Life cycle with attacks per stage
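
The projection of attacks onto life cycle stages can also be kept in a small data structure. The Python sketch below is one possible representation, not the model itself. It only includes the attacks whose stages are named in the brief descriptions above (A03 to A08); the stage assignments of the other attacks are given in the full model in the appendix and are therefore left out. The names `Stage`, `ATTACK_STAGES`, and `attacks_per_stage` are illustrative assumptions.

```python
from enum import Enum


class Stage(Enum):
    """The seven life cycle stages, in the order listed in this article."""
    PRE_CONCEPT = 1
    CONCEPT = 2
    DEVELOPMENT = 3
    PRODUCTION = 4
    UTILISATION = 5
    SUPPORT = 6
    RETIREMENT = 7


# Stages named in the descriptions of the component replacement attacks.
CONCEPT_TO_SUPPORT = (
    Stage.CONCEPT,
    Stage.DEVELOPMENT,
    Stage.PRODUCTION,
    Stage.UTILISATION,
    Stage.SUPPORT,
)

# Only attacks whose stages are stated in the brief list are included here.
ATTACK_STAGES = {
    "A03": CONCEPT_TO_SUPPORT,
    "A04": CONCEPT_TO_SUPPORT,
    "A05": (Stage.RETIREMENT,),
    "A06": CONCEPT_TO_SUPPORT,
    "A07": CONCEPT_TO_SUPPORT,
    "A08": CONCEPT_TO_SUPPORT,
}


def attacks_per_stage(attack_stages):
    """Invert the mapping: for every stage, list the attacks that apply."""
    per_stage = {stage: [] for stage in Stage}
    for attack_id in sorted(attack_stages):
        for stage in attack_stages[attack_id]:
            per_stage[stage].append(attack_id)
    return per_stage


if __name__ == "__main__":
    for stage, attacks in attacks_per_stage(ATTACK_STAGES).items():
        print(f"{stage.name:<12} {', '.join(attacks) or '(see appendix)'}")
```

Keeping the stage assignment as data rather than as a drawing makes it easy to extend the table once the remaining attacks from the appendix are added.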

The next step was to create a complete set of possible countermeasures. The list of countermeasures provided in [13] was rearranged and more countermeasures were added. Most notably, the measure of secure destruction was added, because the original model did not encompass the retirement stage. An example of such secure destruction is data sanitization, as described in NIST Special Publication SP 800-88 [25].

A number of countermeasures were grouped under a new generic category named Development Security. Common Criteria defines Development Security (ALC_DVS) as follows [12]: “[..] all the physical, procedural, personnel, and other security measures that are necessary to protect the confidentiality and integrity of the [Target of Evaluation (TOE)] design and implementation in its development environment.”

This definition allows for the grouping of very concrete measures such as the application of cryptographic techniques, network access/traffic restrictions, trusted personnel, and secure storage.
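
As an illustration of one such cryptographic technique, the Python sketch below compares the SHA-256 digest of a received artefact with a digest recorded at release. It is a minimal sketch under assumptions: the function names and the example data are hypothetical, and the expected digest is assumed to reach the recipient through a separate, trusted channel. The model itself does not prescribe this or any other implementation.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_expected(data: bytes, expected_hex: str) -> bool:
    """Check a received artefact against the digest recorded at release.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(sha256_hex(data), expected_hex.strip().lower())


if __name__ == "__main__":
    # Hypothetical artefact; in practice this would be a firmware image,
    # a design file, or a software package read from disk.
    released = b"example firmware image"
    recorded_digest = sha256_hex(released)  # recorded by the developer

    received_ok = released
    received_tampered = released + b" with an extra payload"

    print(matches_expected(received_ok, recorded_digest))
    print(matches_expected(received_tampered, recorded_digest))
```

A digest only protects Integrity if the recorded value itself cannot be altered by the attacker, which is why the trusted channel is stated as an assumption.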

An actual life cycle risk analysis starts with the identification of the applicable life cycle stages and the applicable attacks in those stages. Once the applicable attack types and detailed attacks have been identified, the possible countermeasures can be identified. From the possible countermeasures, a selection of realistic options to implement can be chosen. The combined set of countermeasures should be commensurate with the level of protection that is necessary. This level may vary with the criticality of the application for which the product is developed.

In the new model, the countermeasures were mapped to the life cycle stages, as well as to the applicability to counter one or more specific attacks. This is shown in the figure below.

Figure 5: Life cycle with countermeasures

A brief description of the countermeasures defined is given in the list below. The full model is presented in the appendix.

CM01. Development Security: All the physical, procedural, personnel, and other security measures that are necessary to protect the Confidentiality and Integrity of the Target of Evaluation (TOE) design and implementation in its development environment.

CM02. Visual Inspection throughout development and production: The use of visual inspection to detect counterfeit components and tampering in hardware, firmware, software, and documentation.

CM03. Supply Chain countermeasures: All measures that make the supply chain visible, and identify weaknesses and attack points in the supply chain.

CM04. Prototyping and production: All measures to assure Integrity of TOE parts that are taken in the process of producing prototypes or final TOEs.

CM05. Delivery Security, including updates during support: All procedures that are necessary to maintain security when distributing versions of the TOE to the consumer.

CM06. Secure Destruction: All procedures and techniques for the destruction of residual information. This may be done in-house, or through contractual agreement with certified notification.

The countermeasures are input for a step-by-step supply chain risk analysis:

1. Sketch a life cycle and if possible, a supply chain model.

2. Determine the applicable categories from Figure 2: Attack surface categories.

3. For the applicable categories determined above, determine the applicable attacks from the list in the appendix.

4. Plot the attacks on the life cycle model.

5. For each attack, discuss with expert colleagues what countermeasures are possible to counter the attack.

6. For all countermeasures, discuss with the relevant supply chain partners what is possible.

7. When all countermeasures that will be applied are known, identify any remaining risk.

8. Within a specific evaluation/certification framework, the risk may be quantified. For example, in the Common Evaluation Methodology, there is a vulnerability analysis, of which the outcome is the calculation of attack potential needed to attack the TOE successfully [26].
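
The bookkeeping behind steps 4 and 7 of the list above can be sketched in a few lines of Python. This is an illustration under assumptions, not part of the model: the class and method names are hypothetical, and the pairing of countermeasures with attacks in the example is illustrative only. In a real analysis that pairing is the outcome of the discussions in steps 5 and 6 and of the full model in the appendix.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RiskAnalysis:
    # Applicable attack id -> life cycle stages in which it applies (steps 3, 4).
    attacks: Dict[str, Set[str]]
    # Applied countermeasure id -> attack ids it is judged to counter (steps 5, 6).
    applied: Dict[str, Set[str]] = field(default_factory=dict)

    def plot(self) -> Dict[str, List[str]]:
        """Step 4: list the applicable attacks per life cycle stage."""
        per_stage: Dict[str, List[str]] = {}
        for attack_id in sorted(self.attacks):
            for stage in sorted(self.attacks[attack_id]):
                per_stage.setdefault(stage, []).append(attack_id)
        return per_stage

    def apply(self, countermeasure: str, countered: Set[str]) -> None:
        """Record a countermeasure that the partners agreed to implement."""
        self.applied[countermeasure] = set(countered)

    def remaining_risk(self) -> List[str]:
        """Step 7: attacks not countered by any applied countermeasure."""
        covered: Set[str] = set().union(*self.applied.values())
        return sorted(set(self.attacks) - covered)


if __name__ == "__main__":
    # Illustrative selection of attacks; stages reduced to one each for brevity.
    analysis = RiskAnalysis(
        attacks={
            "A03": {"production"},
            "A04": {"production"},
            "A05": {"retirement"},
        }
    )
    analysis.apply("CM02", {"A03"})  # assumed pairing: visual inspection
    analysis.apply("CM06", {"A05"})  # assumed pairing: secure destruction
    print(analysis.plot())
    print("Remaining risk:", analysis.remaining_risk())
```

The sketch stays qualitative, in line with the method: it records which attacks are left without a countermeasure and leaves any quantification to an evaluation framework as mentioned in step 8.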

The way in which a countermeasure is implemented is not part of the model, as this is at the discretion of the developer and its supply chain partners. For example, the countermeasure Multiple Suppliers (see the complete model in the appendix) may be fulfilled by using multiple suppliers for one and the same component, which is the default option in large data and energy networks. However, it can also be fulfilled by choosing different suppliers for different security-critical components.

The supply chain risk analysis may show that residual risks remain even after the application of all realistic countermeasures. In that case, the stakeholders or the customer must decide what to do with the residual risk: accept it, or reduce it through business processes, measures in the environment, or other measures.

Discussion and Conclusions

Supply chain attacks are a real risk, especially for high-assurance products. There are multiple types of attacks that can be staged in the supply chain. To counter this problem, risk reduction is necessary. This starts with a risk analysis, for which a supply chain risk analysis model was devised. The developed model is based on a number of well-established standards and sources of expertise, which together have led to a new model as presented in this article. In the process of updating the original model, the new model was validated in various stages and with varying groups with expertise in the field.

The model provides a basis for supply chain risk analysis that is suitable to identify possible attacks and can show how to mitigate these attacks with countermeasures. The exact implementation of the countermeasures is not part of the model.

Contributions

NLNCSA sponsored the research. A number of colleagues contributed greatly by providing useful discussion and insights. Our technical writer did a marvelous job cleaning up the language in the article.

Find all sources and the appendix in the elaborated PDF version.