Ubuntu Security Podcast 221

Last updated: Sat 09 March 2024

Ubuntu Security Podcast Episode

Paper

Paper: "Bad Snakes: Understanding and Improving Python Package Index Malware Scanning"
Abstract

Open-source, community-driven package repositories see thousands of malware packages each year, but do not currently run automated malware detection systems. In this work, we explore the security goals of the repository administrators and the requirements for deploying such malware scanners via a case study of the Python ecosystem and PyPI repository, including interviews with administrators and maintainers. Further, we evaluate existing malware detection techniques for deployment in this setting by creating a benchmark dataset and comparing several existing tools: the malware checks implemented in PyPI, Bandit4Mal, and OSSGadget's OSS Detect Backdoor. We find that repository administrators have exacting requirements for such malware detection tools. Specifically, they consider a false positive rate of even 0.1% to be unacceptably high, given the large number of package releases that might trigger false alerts. Measured tools have false positive rates between 15% and 97%; increasing thresholds for detection rules to reduce this rate renders the true positive rate useless. While automated tools are far from reaching these demands, we find that a socio-technical malware detection system has emerged to meet these needs: external security researchers perform repository malware scans, filter for useful results, and report the results to repository administrators. These parties face different incentives and constraints on their time and tooling. We conclude with recommendations for improving detection capabilities and strengthening the collaboration between security researchers and software repository administrators.

CCS concepts (unofficial)
- Security and privacy - Intrusion/anomaly detection and malware mitigation - Malware and its mitigation
- Software and its engineering - Software notations and tools - Software libraries and repositories
Publishing date: July 2023
PDF: ACM

Transcript

Hey, Alex!

We will continue our journey today beyond the scope of the previous episodes. We've delved into the realms of network security, federated infrastructures, and vulnerability detection and assessment.

Today’s paper

Last year, the Ubuntu Security Team participated in the Linux Security Summit in Bilbao. At that time, I managed to have a discussion with Zach, who hosted a presentation at the Supply Chain Security Con entitled “Will Large-Scale Automated Scanning Stop Malware on OSS Repositories?”. I later discovered that his talk was backed by a paper that he and his colleagues from Chainguard had published.

With this in mind, today we will be examining “Bad Snakes: Understanding and Improving Python Package Index Malware Scanning”, which was published last year in ACM’s International Conference on Software Engineering.

The aim of the paper is to highlight the current state of the Python and PyPi ecosystems from a malware detection standpoint, identify the requirements for a mature malware scanner that can be integrated into PyPi, and ascertain whether the existing open-source tools meet these objectives.

Repositories. PyPi

With this in mind, let's start by understanding the context.

Applications can be distributed through repositories. This means that the applications are packaged into a generic format and published in either managed or unmanaged repositories. Users can then install the application by querying the repositories, downloading the application in a format that they can unpack through a client, and subsequently run on their hosts.

There are numerous repositories out there. Some target specific operating systems, as is the case with Debian repositories, the Snap Store, Google Play, or the Microsoft Store. Others are designed to store packages for a specific programming language, such as PyPi, npm, and RubyGems. Firefox Add-ons and the Chrome extension store target a specific platform, namely the browser.

Another relevant characteristic when discussing repositories is the level of curation. The Ubuntu Archive is considered a curated repository of software packages because there are several trustworthy contributors able to publish software within the repository. Conversely, npm is unmanaged because any member of the open-source community can publish anything in it.

We will discuss the Python Package Index extensively, which is the de facto unmanaged repository for the Python programming language. As of the 7th of March 2024, there were 5.4 million releases for 520 thousand projects and nearly 800 thousand users. It is governed by a non-profit organisation and run by volunteers worldwide.

Supply chain attacks

Software repositories foster the dependencies of software on other pieces of software, controlled by different parties. As seen in campaigns such as the SolarWinds SUNBURST attack, this can go awry. Attackers can gain control over software in a company's supply chain, gain initial access to their infrastructure, and exploit this advantage.

Multiple attack vectors are possible. Accounts can be hijacked. Attackers may publish packages with similar names (in a tactic known as typosquatting). They can also leverage shrink-wrapped clones, which are duplicates of existing packages, where malicious code is injected after gaining users' trust. While covering all attack vectors is beyond the scope of this podcast episode, you can find a comprehensive taxonomy in a paper called “Taxonomy of Attacks on Open-Source Software Supply Chains”, which lists over 100 unique attack vectors.

From 2017 to 2022, the number of unique projects removed from PyPi increased rapidly: 38 in the first year, followed by 130, 60, 500, 27 thousands, and finally 12 thousands in the last year. Despite the fact that most of these were reported as malware, it's worth noting that the impact of some of them is limited due to the lack of organic usage.

Malware analysis

These attacks can be mitigated by implementing techniques such as multi-factor authentication, software signing, update frameworks, or reproducible builds, but the most widespread method is malware analysis.

Some engines check for anomalies via static and dynamic heuristics, while others rely on signatures due to their simplicity. Once a piece of software is detected as malicious, its hash is added to a deny list that is embedded in the anti-malware engine. Each file is then hashed and the result is checked against the deny list. If the heuristics or the hash comparison identifies the file as malicious, it is either reported, blocked, or deleted depending on the strategy implemented by the anti-malware engine.

Malware analysis in PyPi

These solutions are already implemented in software repositories. In the case of PyPi, malware scanning was introduced in February 2022 with the assistance of a malware check feature in Warehouse, the application serving PyPi. However, it was disabled by the administrators two years later and ultimately removed in May 2023 due to an overload of alerts.

In addition to this technical solution, PyPi also capitalises on a form of social symbiosis. Software security companies and individuals conduct security research, reporting any discovered malware to the PyPi administrators via email. The administrators typically allocate 20 minutes per week to review these malware reports and remove any packages that can be verified as true positives. Ultimately, the reporting companies and individuals gain reputation or attention for their brands, products, and services.

Requirements

In addition to information about software repositories, supply chain attacks, malware analysis, and PyPi, the researchers also interviewed administrators from PyPi to understand their requirements for a malware analysis tool that could assist them. The three interviews, each lasting one hour, were conducted in July and August 2022 and involved only three individuals. This limited number of interviews is due to the focus on the PyPi ecosystem, where only ten people are directly involved in malware scanning activities.

When discussing requirements, the administrators desired tools with a binary outcome, which could be determined by checking if a numerical score exceeds a threshold or not. The decision should also be supported by arguments. While administrators can tolerate false negatives, they aim to reduce the rate of false positives to zero. The tool should also operate on limited resources and be easy to adopt, use and maintain.

Current tooling

But do the current solutions tick these boxes?

The researchers selected tools based on a set of criteria: analysing the code of the packages, having public detection techniques, and detection rules. Upon examining the available solutions, they found that only three could be used for evaluation in the context of their research: PyPi's malware checks, Bandit4Mal, and OSSGadget's OSS Detect Backdoor.

Regarding the former, it should be noted that the researchers did not match the YARA rules only against the setup files, but also against all files in the Python package. The second, Bandit4Mal, is an open-source version of Bandit that has been adapted to include multiple rules for detecting malicious patterns in the AST generated from a program's codebase. The last, OSSGadget's OSS Detect Backdoor, is a tool developed by Microsoft in June 2020 to perform rule-based malware detection on each file in a package.

These tools were tested against both malicious and benign Python packages. The researchers used two datasets containing 168 manually-selected malicious packages. For the benign packages, they selected 1,400 popular packages and one thousand randomly-selected benign Python packages.

For the evaluation process, they considered an alert in a malicious package to be a true positive and an alert in a benign package to be a false positive.

The true positive rate was 85% for the PyPi checks, the same for OSS Detect Backdoor and 90% for Bandit4Mal. The false positive rates ranged from 15% for the PyPi checks over the random packages, to 80% for Bandit4Mal on popular packages.

The tools ran in a time-effective manner, with a median time of around two seconds per package across all datasets. The maximum runtime was recorded for Ansible’s package, which was scanned in 26 minutes.

Despite their efficient run times, we can infer from these results that the tools are not accurate enough to meet the demands of PyPi’s administrators. The analysts may be overwhelmed by alerts for benign packages, which could interfere with their other operations.

Conclusions

And with this, we can conclude the episode of the Ubuntu Security Podcast, which details the paper “Bad Snakes: Understanding and Improving Python Package Index Malware Scanning”. We have discussed software repositories, malware analysis, and malware-related operations within PyPi. We've also explored the requirements that would make a new open-source Python malware scanner suitable for the PyPi administrators and evaluated how the current solutions perform.

If you come across any interesting topics that you believe should be discussed, please email us at [email protected].

Over to you, Alex!