Supply chain attacks in software have been the source of widespread cyberattacks on individuals and organizations worldwide. The SolarWinds Orion and Microsoft Exchange hacks are two prime examples whose effects we’ll continue to feel for years to come.

An attack on a software supply chain typically involves the attacker inserting malicious code into official production software. When users install or update the software, the malicious code comes with it, giving hackers a foothold to plant malware, steal data, hijack systems, or carry out any number of other attacks.

So how are supply chain attacks actually carried out? A rogue software developer could sabotage their own work, or an attacker could break into the developer’s network or device. A nation-state threat actor could intentionally place backdoors into software to be used in attacks against targets in other countries. But these attacks all require the attacker to first gain access to the developer’s network or machine.

Comparitech researchers have been studying another type of supply chain attack that’s much easier to pull off: the unclaimed package name dependency confusion attack.

Primer: repositories, packages, and dependencies

Most software developers don’t write all of their own code from scratch. That would require far too much time and effort. Instead, most developers build on the work of others by assembling pre-written, open-source code from public repositories.

This code comes in the form of packages, which are usually downloaded from public package registries such as PyPI, npm, or RubyGems, while their source code is often hosted on sites like GitHub. A single project can depend on code or resources from any number of third-party packages, which are maintained by other developers. Both the third-party packages and the developer’s own code often require other dependencies as well.
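To get a sense of how quickly dependencies add up, the Python sketch below walks a dependency graph and collects every package a project ends up needing, directly or indirectly. The graph and every package name in it are hypothetical; a real package manager reads each package’s dependencies from its published metadata instead.

```python
"""Collect every package a project pulls in, directly or transitively.

Assumption: the dependency graph is a hypothetical, hard-coded dict mapping a
package name to the names it depends on. A real package manager reads this
information from each package's metadata.
"""

from typing import Dict, List, Set


def all_dependencies(root: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every package reachable from `root`, excluding `root` itself."""
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited, e.g. a shared or circular dependency
        seen.add(name)
        stack.extend(graph.get(name, []))
    seen.discard(root)  # a circular dependency could lead back to the root
    return seen


# Hypothetical project: two direct dependencies, several indirect ones.
graph = {
    "my-app": ["web-framework", "internal-utils"],
    "web-framework": ["http-client", "template-engine"],
    "http-client": ["tls-helpers"],
    "internal-utils": ["http-client"],
}

print(sorted(all_dependencies("my-app", graph)))
```

With only two direct dependencies, the hypothetical my-app still needs five packages, and each of those names has to be looked up somewhere when the project is installed.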
Keeping third-party dependencies up to date usually requires a package manager, which allows developers to download and install those dependencies with a single command, such as:

gem install drip

… or …

pip3 install requests

When you enter one of the commands above, the package manager searches the repository for the package named “drip” or “requests” and installs it.

In addition to pulling code from online repositories, packages can be stored in and installed from local, private repositories. A package stored in the private repository as “drip” or “requests” will be installed the same way.

When a developer stores internal packages in a private repository but allows dependencies to be retrieved from a public repository, this is called a “hybrid” configuration. When dependencies are installed or updated, the package manager checks both the private and public feeds for the best available version of each required package. And this is where supply chain substitution attacks come in…

Supply chain substitution attacks leverage package managers

Let’s say you’re developing software that pulls dependencies from a private repository. When you update your dependencies through a package manager, you probably expect it to check the local repository first. But in some cases, as Comparitech researchers found, package managers check the online public repository first and only fall back to the local repository if no matching package can be found. This creates opportunities for attackers.

Comparitech researchers found that the following package managers exhibit this behavior:

mvn (Maven)
pip (Python)
gem (Ruby)
npm (JavaScript, Node.js, React)

If an attacker publishes a package to the public registry with the same name as one in your local repository, the package manager will download the attacker’s package and ignore the local one. This allows an attacker to covertly insert malicious code into your software, resulting in a remote code execution (RCE) attack.
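The two resolution behaviors described above, public-first lookup and picking the best available version across both feeds, can be modeled in a few lines of Python. This is a toy sketch, not how any real package manager is implemented: feeds are assumed to be plain dictionaries of package names to version strings, versions are assumed to be purely numeric, and the internal package name acme-billing is hypothetical.

```python
"""Toy model of how a hybrid package configuration can be confused.

Assumption: feeds are plain dicts mapping package name -> list of version
strings. Real package managers are far more complex; this only mirrors the
two behaviors described in the text.
"""

from typing import Dict, List, Optional, Tuple

Feed = Dict[str, List[str]]


def version_key(version: str) -> Tuple[int, ...]:
    # Naive numeric ordering so that "1.10.0" > "1.9.0"; ignores pre-release tags.
    return tuple(int(part) for part in version.split("."))


def resolve_public_first(name: str, public: Feed, private: Feed) -> Optional[Tuple[str, str]]:
    """Check the public feed first; fall back to the private feed only if absent."""
    for label, feed in (("public", public), ("private", private)):
        if name in feed:
            return label, max(feed[name], key=version_key)
    return None


def resolve_highest_version(name: str, public: Feed, private: Feed) -> Optional[Tuple[str, str]]:
    """Merge both feeds and pick the highest version, wherever it comes from."""
    candidates = [("public", v) for v in public.get(name, [])]
    candidates += [("private", v) for v in private.get(name, [])]
    if not candidates:
        return None
    return max(candidates, key=lambda c: version_key(c[1]))


private_feed: Feed = {"acme-billing": ["1.2.0", "1.3.0"]}  # hypothetical internal package
public_feed: Feed = {"requests": ["2.31.0"]}

print(resolve_public_first("acme-billing", public_feed, private_feed))
# An attacker now claims the unclaimed name on the public registry.
public_feed["acme-billing"] = ["99.0.0"]
print(resolve_public_first("acme-billing", public_feed, private_feed))
print(resolve_highest_version("acme-billing", public_feed, private_feed))
```

The script first prints ('private', '1.3.0'), then ('public', '99.0.0') twice. In the public-first model any published version wins; in the highest-version model, an inflated version number such as 99.0.0 is enough.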
If the attacker’s malicious code is included in the next production software update, the attack will affect all app users who install it.

Finding targets

How does an attacker know what package names to target? Most developers store and access repositories on GitHub, a free source-code hosting and version control service. Most of the code on GitHub is open-source, meaning anyone can access it.

Comparitech researchers chose the GitHub repositories of 10 popular applications to analyze for unclaimed dependencies. A total of 1,644 files were analyzed. Within those files, 141 unclaimed names were found, meaning attackers could publish their own packages under those names to trick package managers into downloading and installing them.

A supply chain attacker will gather source code and parse files that declare dependencies, looking for package names they could use to launch the attack explained above. Common dependency files include:

requirements.txt (Python)
package.json (npm: JavaScript, Node.js, React)
composer.json (PHP)
pom.xml (Maven, Java)

Researchers noted that GitHub doesn’t impose a rate limit, meaning hackers could parse files from a much larger number of apps and their repositories. Even if the original developers take proper precautions (see below), so that the attack doesn’t threaten the apps whose source code is shared, a third party who reuses the code could inadvertently put themselves at risk.

Prevention and mitigation

Microsoft, which owns GitHub, gives three key recommendations (PDF) for preventing supply chain attacks through unclaimed package names:

Reference one private feed, not multiple.

Python
Use the index-url option in pip’s configuration file or command line to specify the feed, overriding the default. Avoid the extra-index-url option, which is additive and may lead to having multiple indexes.

NuGetGallery
Ensure the packageSources section of your nuget.config starts with a <clear /> entry to remove any inherited configuration. Use a single entry for your private feed.
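The single-feed recommendation for pip and NuGet lends itself to a quick automated check. The Python sketch below uses only the standard library. It assumes pip settings live in the [global] section of an INI-style pip configuration file and that nuget.config uses the usual packageSources layout; it ignores the other places these tools read settings from (the [install] section, environment variables, command-line flags, user- or machine-level files). The feed URLs are hypothetical placeholders.

```python
"""Minimal audit for the "reference one private feed" recommendation.

Assumptions: pip settings are in the [global] section of an INI-style config,
and nuget.config follows the usual <packageSources> layout. Only these two
tools are checked; this is a sketch, not a complete linter.
"""

import configparser
import xml.etree.ElementTree as ET
from typing import List


def audit_pip_config(text: str) -> List[str]:
    """Flag pip settings that could mix public and private indexes."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    if not parser.has_option("global", "index-url"):
        problems.append("pip: no index-url set, so the public default index is used")
    if parser.has_option("global", "extra-index-url"):
        problems.append("pip: extra-index-url adds a second index")
    return problems


def audit_nuget_config(text: str) -> List[str]:
    """Flag nuget.config files that inherit or list multiple package sources."""
    root = ET.fromstring(text)
    sources = root.find("packageSources")
    if sources is None:
        return ["nuget: no <packageSources> section"]
    children = list(sources)
    problems = []
    if not children or children[0].tag != "clear":
        problems.append("nuget: <packageSources> does not start with <clear />")
    if len([c for c in children if c.tag == "add"]) != 1:
        problems.append("nuget: expected exactly one <add> source")
    return problems


# Hypothetical private feed URLs, for illustration only.
pip_single = "[global]\nindex-url = https://pkgs.example.internal/simple\n"
pip_mixed = pip_single + "extra-index-url = https://pypi.org/simple\n"
nuget_single = (
    "<configuration><packageSources>"
    "<clear />"
    '<add key="private" value="https://pkgs.example.internal/nuget/v3/index.json" />'
    "</packageSources></configuration>"
)

print(audit_pip_config(pip_single))      # []
print(audit_pip_config(pip_mixed))       # ['pip: extra-index-url adds a second index']
print(audit_nuget_config(nuget_single))  # []
```

An empty list means no problems were found under these simplified rules; it does not prove a machine is configured safely.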
Maven
Configure a single mirror with mirrorOf set to * to direct all requests through a single repository. Alternatively, the default public repository (Maven Central) can be overridden to disable it.

Gradle
There are no default repositories, making it easy to specify only your private feed.

Protect your packages using controlled scopes.

npm
Using a scope prefix in combination with registry configuration allows you to specify the source for each package. Because only a single registry will be searched, this protects against substitution attacks through the public registry. These options can be configured for each project or for an entire machine using an .npmrc file. Similar options exist for Yarn through the .yarnrc.yml file.

NuGetGallery
Publishers can register an ID prefix to restrict uploads to the public gallery. Packages under a registered prefix can only be uploaded by approved accounts, which also protects against public substitution attacks. This reservation can be made whether or not you intend to publish your packages to NuGet.org. Using a registered ID prefix for private packages helps ensure that an attacker cannot claim any of your names.

Utilize client-side verification.

npm
Installing from a package.json file automatically updates the package-lock.json file with the versions and file hashes of packages. When package-lock.json is included with your project, running the “npm ci” command will replicate the install using version pinning and integrity checking.

NuGetGallery
A packages.lock.json file can be enabled for your project, which will be automatically created on “nuget restore.” When the file exists and is included with your project, “nuget restore --locked-mode” uses it to validate, through version pinning and integrity checking, that the packages have not changed.

Python
Pip’s hash-checking mode ensures that each downloaded file matches a known SHA256 hash stored in your project. Any attempted package substitution attack must then compromise both the server and the client.
Generating the hashes currently requires an additional tool such as pip-compile.

Maven
Dependencies are checked for modifications since the original upload but cannot be automatically verified against prior installs.

Gradle
Gradle dependency verification for packages downloaded from Maven Central can be enabled by following Gradle’s dependency verification documentation.
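Both pip’s hash-checking mode and npm’s integrity checking come down to the same idea: recompute a digest of the downloaded file and compare it with a value pinned in the project. The Python sketch below illustrates only that comparison. It assumes the pinned values are already available as strings, either in pip’s sha256:&lt;hex&gt; form (as used with --hash options in requirements.txt) or in the sha512-&lt;base64&gt; Subresource Integrity form commonly found in package-lock.json; it does not parse either file, and the sample byte strings are made up.

```python
"""Sketch of client-side hash verification in the style of pip and npm.

Assumption: pinned values are supplied by the caller as strings. In a real
project they would come from requirements.txt (--hash=sha256:<hex>) or from
the "integrity" field in package-lock.json (sha512-<base64>).
"""

import base64
import hashlib


def pip_style_hash(data: bytes) -> str:
    """Format a digest the way pip's --hash option expects: sha256:<hex>."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def npm_style_integrity(data: bytes) -> str:
    """Format a digest as a Subresource Integrity string: sha512-<base64>."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def verify(data: bytes, pinned: str) -> bool:
    """Return True only if `data` hashes to the pinned value."""
    if pinned.startswith("sha256:"):
        actual = pip_style_hash(data)
    elif pinned.startswith("sha512-"):
        actual = npm_style_integrity(data)
    else:
        raise ValueError(f"unsupported hash format: {pinned!r}")
    return actual == pinned


# Made-up archive contents, for illustration only.
original = b"contents of the package archive the project pinned"
tampered = b"attacker's substituted package archive"

pinned_pip = pip_style_hash(original)
pinned_npm = npm_style_integrity(original)

print(verify(original, pinned_pip), verify(tampered, pinned_pip))  # True False
print(verify(original, pinned_npm), verify(tampered, pinned_npm))  # True False
```

In practice pip enables hash-checking mode automatically when any requirement carries a --hash option, and “npm ci” checks the integrity values recorded in package-lock.json, so a substituted package with a different digest is rejected.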