Ubuntu Security Podcast 204

Ubuntu Security Podcast Episode

Paper

Federated Learning (FL) is a collaborative machine learning approach allowing participants to jointly train a model without having to share their private, potentially sensitive local datasets with others. Despite its benefits, FL is vulnerable to so-called backdoor attacks, in which an adversary injects manipulated model updates into the federated model aggregation process so that the resulting model will provide targeted false predictions for specific adversary-chosen inputs. Proposed defenses against backdoor attacks based on detecting and filtering out malicious model updates consider only very specific and limited attacker models, whereas defenses based on differential privacy-inspired noise injection significantly deteriorate the benign performance of the aggregated model. To address these deficiencies, we introduce FLAME, a defense framework that estimates the sufficient amount of noise to be injected to ensure the elimination of backdoors. To minimize the required amount of noise, FLAME uses a model clustering and weight clipping approach. This ensures that FLAME can maintain the benign performance of the aggregated model while effectively eliminating adversarial backdoors. Our evaluation of FLAME on several datasets stemming from application areas including image classification, word prediction, and IoT intrusion detection demonstrates that FLAME removes backdoors effectively with a negligible impact on the benign performance of the models.

Transcript

Hey, Alex! Thanks for your introduction.

For those of you who are new here, I’m Andrei, and the purpose of this section is to explain cybersecurity papers. I hope that the informative content offered here may catalyze your work, whether you are in industry or academia.

In this episode, I want to continue our discussion of the intersection between cybersecurity and machine learning, which we began in episode 194. And, no, despite what you may be thinking, we will not investigate the security of large language models. We will have to leave that intriguing topic for a future edition.

Today’s paper

The paper for today is titled "FLAME: Taming Backdoors in Federated Learning", and it will demonstrate how to safeguard a distributed machine learning infrastructure in which a model is trained separately by many agents using their private data.

The work was first published on arXiv in January 2021 and revised several times until its final revision in December 2022. It was first presented at the USENIX Security Symposium last year. Its authors are academics from universities in Germany, Finland, and the United States.

Federated learning (and a use case)

Let’s assume we have a company called Evil Corp that provides incident response services to a variety of other businesses. The analysts are divided into teams, and each team works with only one customer company. Each analyst examines information (such as logs in a SIEM or files flagged as harmful by third-party solutions) and decides how that information or alert should be actioned (for example, isolating a workstation). The information gathered during the incident response process must be kept private and must not be shared by the analyst with anyone else at Evil Corp.

Following a meeting of Evil Corp's technical management, it was concluded that a machine learning model to help analysts in their incident response process would be beneficial. But, given the previously mentioned confidentiality need, is this even possible?

One possible solution to this concern would be to use a machine learning technique known as federated learning. The reason this technique can be of help is that it permits combining several individual models trained by agents on their private datasets to create a centralized model. In the context of artificial intelligence, an agent is defined as an independent system that observes and acts on its environment.

In our incident response scenario, each event in which the incident responder took an action will be recorded in a private dataset, together with a description of the action and the evidence considered while making the decision. This information will be used to train a local machine learning model on the customer premises, which will then be uploaded to Evil Corp. An aggregator will then use all of the collected local models to generate a global model.

Customers benefit from the global model in this way, since any analyst responding to incidents for them will use it to make more informed decisions. At the same time, the customer's data stays private, as it never crosses the bounds of their infrastructure.

Attacks

At this point, it is important to ask if the strategy of implementing federated learning to help provide confidentiality in our scenario is bulletproof. Well, you might not be surprised that there are numerous attacks that are still possible even with this decentralized approach.

To begin, the aggregation process that combines the individual local models assigns each agent a weight. If an agent trained its model on the largest dataset available, the weight of its model in the final model should be the highest. The most basic form is a weighted average, where each agent's weight is its dataset size divided by the sum of all dataset sizes. If a malicious actor were able to access or influence a local model, they might claim that they trained that model on a massive number of entries in order to influence the final global model as much as possible.
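To make this concrete, here is a minimal NumPy sketch of this kind of weighted aggregation (the function and variable names are mine, not the paper's). Note how an agent that inflates its claimed dataset size directly inflates its influence on the global model.

```python
import numpy as np

def aggregate(local_models, dataset_sizes):
    """Weighted average of local model weight vectors.

    local_models:  list of 1-D NumPy arrays, one flattened weight vector per agent
    dataset_sizes: number of training samples each agent claims to have
    """
    total = sum(dataset_sizes)
    weights = [n / total for n in dataset_sizes]   # each agent's share of the average
    global_model = np.zeros_like(local_models[0])
    for model, w in zip(local_models, weights):
        global_model += w * model                  # larger claimed dataset -> larger influence
    return global_model

# A malicious agent that claims an enormous dataset size dominates the average.
benign = [np.array([0.1, 0.2]), np.array([0.2, 0.1])]
poisoned = np.array([5.0, -5.0])
print(aggregate(benign + [poisoned], [1000, 1000, 1_000_000]))
```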

In other types of attacks, adversaries may attempt to infer data from a trained model. If the resulting global model is shared with the agents, an attacker may attempt to determine whether a specific sample was part of the training data (a membership inference attack), a property of a training sample (a property inference attack), or the proportion of samples of a specific class in the data (a distribution estimation attack).

Finally, if a malicious actor were able to train their own local models, they could poison the data used for training or the model itself. The purpose may be simply to degrade the model's performance, akin to a denial-of-service attack, or to conduct more focused attacks. An example is an adversary altering the resulting global model so that attacker-controlled inputs (referred to as the trigger set) produce specific incorrect predictions chosen by the adversary, while the rest of the data is still predicted accurately.

These last attacks are known as backdoors on federated learning models, and they are the attacks that the mitigation techniques presented in this research paper aim to defeat.

FLAME

FLAME is a resilient aggregation system that aims to eliminate the impact of backdoor attacks while retaining the performance of benign predictions.

The study abstracts a trained model to nothing more than a set of weight vectors. As a result, each model update can be characterized by two properties: its direction (the angle of its weight vector) and its magnitude (the length of its weight vector).
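Concretely, the direction can be measured by the cosine of the angle between two flattened weight vectors, and the magnitude by the Euclidean (L2) norm. The short sketch below is my own illustration of those two measurements, not code taken from the paper.

```python
import numpy as np

def cosine_similarity(w1, w2):
    """Direction: cosine of the angle between two flattened weight vectors."""
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def magnitude(w):
    """Magnitude: Euclidean (L2) length of a flattened weight vector."""
    return float(np.linalg.norm(w))

global_update = np.array([0.5, 0.5, 0.0])
local_update = np.array([0.4, 0.6, 0.1])
print(cosine_similarity(global_update, local_update), magnitude(local_update))
```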

Adversary Model

To better understand the mitigation techniques that FLAME proposes, let’s discuss the adversary model. It is generic: it makes no strong assumptions regarding the attack approach or the underlying distribution of data in benign or adversarial datasets. The study presents a generic model made up of attributes that may describe an attacker but do not have fixed values.

The adversary's purpose is to carry out a backdoor attack, in which the model is tricked into generating incorrect predictions for inputs in a trigger set while producing correct predictions otherwise.

Furthermore, the attacker is assumed to control a fraction of the agents that is less than half of the total. The Poisoned Model Rate, defined as the fraction of agents in the federated architecture controlled by the attacker, is introduced here.

The attacker's final characteristic is that they do not want to be discovered. The difference between a poisoned model and a benign model, calculated for example using the Euclidean distance, should stay below a threshold referred to as the anomaly detector's discrimination capability.

Defense Objectives

With this threat model in mind, the authors deduced what the goals of a defense against backdoor attacks should be. To begin with, it must be effective: the defense must prevent the attackers from carrying out their attacks. Second, the model's performance on benign predictions should be degraded as little as possible. Finally, the mitigation should be independent of the data distribution and attack technique, with no assumptions made about the data being ingested or about how the attacker attempts to carry out the backdoor attack.

Phases of FLAME

FLAME attains these goals through three phases.

The first is automated model clustering. It helps identify and eliminate malicious models with large angular deviation, which have the most impact. The number of clusters is determined dynamically because the attack settings are unknown. A fixed number of clusters would make the defense subject to pigeonhole attacks: an attacker could inject more backdoored models than there are clusters, so at least one would be clustered with the benign models and make it into the final model.
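As an illustration, a density-based clusterer such as HDBSCAN can decide the number of clusters on its own. The sketch below is a simplified approximation of this phase rather than the paper's implementation: it clusters the local updates by pairwise cosine distance and keeps only the models that fall into the majority cluster (the minimum cluster size of n/2 + 1, which forces a single majority cluster, is my assumption).

```python
import numpy as np
import hdbscan  # pip install hdbscan

def filter_updates(local_models):
    """Return the indices of local models accepted for aggregation.

    local_models: list of 1-D NumPy arrays (flattened weight vectors).
    """
    n = len(local_models)
    normed = [m / np.linalg.norm(m) for m in local_models]
    # Pairwise cosine distance matrix between all local updates.
    dist = np.array([[max(0.0, 1.0 - float(np.dot(a, b))) for b in normed]
                     for a in normed])
    # min_cluster_size > n/2 allows at most one (majority) cluster; outliers get label -1.
    labels = hdbscan.HDBSCAN(min_cluster_size=n // 2 + 1,
                             metric="precomputed").fit_predict(dist)
    return [i for i, label in enumerate(labels) if label != -1]
```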

The next stage is weight clipping, which limits the maximum magnitude of a weight vector. The clipping value is computed dynamically: choosing too small a value causes benign updates to be clipped, hurting model performance, while choosing too large a value lets backdoored updates still enter the final global model.
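One common way to pick such a bound adaptively is to use the median of the update norms; the sketch below follows that idea as an illustrative assumption rather than the paper's exact algorithm.

```python
import numpy as np

def clip_updates(local_models, global_model):
    """Scale down local updates whose distance from the previous global model
    exceeds an adaptive bound (here: the median distance across all models)."""
    deltas = [m - global_model for m in local_models]
    norms = [float(np.linalg.norm(d)) for d in deltas]
    bound = float(np.median(norms))                          # adaptive clipping bound
    clipped = []
    for delta, norm in zip(deltas, norms):
        scale = min(1.0, bound / norm) if norm > 0 else 1.0  # leave small updates untouched
        clipped.append(global_model + delta * scale)
    return clipped, bound
```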

Finally, noise is added in a manner similar to differential privacy. It eliminates any backdoors that survived the previous two stages, in particular those whose angular deviation is small enough not to be filtered out by clustering and whose magnitude is small enough not to be clipped.

The amount of noise added is also computed dynamically: it must be large enough to remove the backdoors but not so large that it harms the performance of benign predictions, whose underlying data distribution is unknown.
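In the spirit of differential privacy, this typically means adding Gaussian noise whose standard deviation is tied to the clipping bound. The sketch below uses an illustrative noise factor of my own choosing; the paper's contribution is precisely estimating how much noise is sufficient.

```python
import numpy as np

def add_noise(aggregated_model, clipping_bound, noise_factor=0.01, seed=None):
    """Add Gaussian noise scaled to the clipping bound, differential-privacy style.

    noise_factor is an illustrative constant, not the value derived in the paper.
    """
    rng = np.random.default_rng(seed)
    sigma = noise_factor * clipping_bound   # noise level tied to the clipping bound
    return aggregated_model + rng.normal(0.0, sigma, size=aggregated_model.shape)
```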

Results

The researchers employed PyTorch as the deep learning framework to evaluate the proposed mitigation. The authors use existing source code for the state-of-the-art attacks, but they had to create new implementations of current defenses to compare them with FLAME.

Three datasets were used, covering word prediction, image classification, and IoT intrusion detection.

The average outcome against backdoor attacks was encouraging and better than six other known defenses: the accuracy in predicting the benign values was 99.8%, and all backdoors were effectively detected. The only time an alternative defense outperformed FLAME was when that defense was highly specialized for the sort of data being injected, notably non-independent and non-identically distributed (non-IID) data.

Conclusion

With this knowledge in mind, I'd like to conclude this episode. We've shown how federated learning can be used to train machine learning models in a distributed setting where the entity aggregating the local models does not have access to the local, private datasets. The paper presented today, namely "FLAME: Taming Backdoors in Federated Learning", proposes a three-phase robust aggregation approach that eliminates backdoor attacks while preserving prediction accuracy in benign conditions.

If you come across any interesting academic topics that you think should be discussed in this podcast so that the community can learn more about them, please email us at [email protected].

The floor is yours, Alex!