Python tarfile directory traversal


Currently, there’s a lot of hype around the behavior of Python’s tarfile module when extracting archives. In short: tarfile does not sanitize the member names in an archive to prevent directory traversal attacks. For example, adding a file to an archive under a name with a leading ../ will make the extract* methods create that file in the directory above the extraction directory. This way (or by using an absolute path starting with /), a file can be written to an arbitrary location, provided that the user executing the code has the necessary write privileges.

In 2007 this behavior was filed under CVE-2007-4559. It didn’t lead to a patch of the library, but instead the documentation was updated to include:

Warning: Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of path, e.g. members that have absolute filenames starting with "/" or filenames with two dots "..".

After Trellix published Exploiting the World With a 15-Year-Old Vulnerability and Trellix Launches Advanced Research Center, Finds Estimated 350K Open-Source Projects at Risk to Supply Chain Vulnerability, this behavior is on everybody’s radar again.

How does it work?

The behavior is fairly easy to demonstrate.

First, create a demo file (test.txt). Then, create an archive with that file but specify an adjusted name to be used in the archive (../test.txt):

import tarfile

with open('test.txt', 'w') as f:
    f.write('Hello')

with tarfile.open('my_archive.tar', 'w:xz') as tar:
    tar.add('test.txt', arcname='../test.txt')

Next, give that file to a Python application that extracts tar files, for example:

import tarfile

with tarfile.open("my_archive.tar") as tar:
    tar.extractall()

Running the application will extract test.txt into the directory above the current directory. By adjusting the path in arcname you can essentially place the file anywhere you want, given you are allowed to write there. Overwriting critical system files, writing executables, etc. is, of course, all potentially possible.
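To see why the file ends up one level up, it can help to look at the member name stored in the archive and at the path it resolves to. The following is only a minimal sketch: the helper names build_demo_archive and resolved_targets and the destination /srv/app/uploads are made up for illustration, and posixpath is used just to keep the result identical on every platform. Nothing is extracted here.

```python
import io
import posixpath
import tarfile


def build_demo_archive() -> bytes:
    """Create an in-memory tar with one member named '../test.txt'."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"Hello"
        info = tarfile.TarInfo(name="../test.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def resolved_targets(archive: bytes, dest: str) -> dict:
    """Map each member name to the path a naive join with dest yields."""
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        return {
            name: posixpath.normpath(posixpath.join(dest, name))
            for name in tar.getnames()
        }


# Hypothetical destination directory, chosen only for the demonstration.
targets = resolved_targets(build_demo_archive(), "/srv/app/uploads")
print(targets)
```

The member name is stored verbatim, so joining it with the destination directory walks out of that directory.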

Is it a vulnerability and is every extract use immediately vulnerable?

The library basically does what the documentation says it does and works according to specifications. While secure defaults are generally preferred, I’m not sure I would classify this as a vulnerability in the library per se, but more so in the projects using the library’s extract* methods without additional member checks.
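What such an additional member check could look like is sketched below. This is one possible approach and not an official recipe: UnsafeMemberError, is_within_directory, safe_extractall and make_archive are hypothetical names, refusing all link members is a deliberately conservative assumption, and the containment check assumes a POSIX-like file system.

```python
import io
import os
import tarfile
import tempfile


class UnsafeMemberError(Exception):
    """Raised by this sketch when a member would land outside the target."""


def is_within_directory(base: str, target: str) -> bool:
    """Return True if target resolves to base itself or a path below it."""
    base = os.path.realpath(base)
    target = os.path.realpath(target)
    return os.path.commonpath([base, target]) == base


def safe_extractall(tar: tarfile.TarFile, dest: str) -> None:
    """Check every member first, then extract only if all of them pass."""
    for member in tar.getmembers():
        # os.path.join discards dest when member.name is absolute, so
        # absolute names are caught by the same containment check.
        target = os.path.join(dest, member.name)
        if not is_within_directory(dest, target):
            raise UnsafeMemberError(f"blocked path: {member.name!r}")
        # Links can redirect later writes; this sketch simply refuses them.
        if member.issym() or member.islnk():
            raise UnsafeMemberError(f"blocked link: {member.name!r}")
    tar.extractall(dest)


def make_archive(names):
    """Build an in-memory tar whose members carry the given names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 5
            tar.addfile(info, io.BytesIO(b"Hello"))
    buf.seek(0)
    return buf


with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "out")
    os.mkdir(dest)

    # A harmless archive is extracted as usual.
    with tarfile.open(fileobj=make_archive(["ok.txt"])) as tar:
        safe_extractall(tar, dest)
    extracted_ok = os.path.exists(os.path.join(dest, "ok.txt"))

    # The archive from the demonstration is refused before anything is written.
    try:
        with tarfile.open(fileobj=make_archive(["../test.txt"])) as tar:
            safe_extractall(tar, dest)
        blocked = False
    except UnsafeMemberError:
        blocked = True
    escaped = os.path.exists(os.path.join(tmp, "test.txt"))
```

Checking all members before extracting anything means a malicious archive does not leave a half-extracted directory behind.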

Still, an explicit option in Python’s tarfile to allow absolute paths (or paths containing ..), like BSD/GNU tar’s -P option, would certainly be desirable: the protections would then be active by default, and developers would have to switch them off deliberately. After all, not everybody reads the docs :-)
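Newer Python versions do offer something along these lines: as far as I know, Python 3.12 added a filter argument to extract and extractall (PEP 706), and the 'data' filter rejects members that would end up outside the destination. The sketch below assumes such a version and guards the call with a hasattr check, so on older interpreters it simply skips the extraction.

```python
import io
import os
import tarfile
import tempfile

# Build the archive from the demonstration in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="../test.txt")
    info.size = 5
    tar.addfile(info, io.BytesIO(b"Hello"))
buf.seek(0)

# Extraction filters only exist in newer Python versions.
has_filters = hasattr(tarfile, "data_filter")
rejected = None

with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "out")
    os.mkdir(dest)
    if has_filters:
        with tarfile.open(fileobj=buf) as tar:
            try:
                tar.extractall(dest, filter="data")
                rejected = False
            except tarfile.FilterError:
                # Raised for members the filter refuses to extract.
                rejected = True
    escaped = os.path.exists(os.path.join(tmp, "test.txt"))

print(has_filters, rejected, escaped)
```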

The above-mentioned tar implementations either reject or rewrite such paths, and you get a warning like tar: Removing leading '../../../../../../../../../' from member names.
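The "rewrite" variant can be approximated in a few lines of Python. This is a rough sketch and not a reimplementation of what GNU or BSD tar do internally; strip_unsafe_prefix is a hypothetical helper that drops leading slashes and any .. components that would climb out of the extraction directory.

```python
import posixpath


def strip_unsafe_prefix(name: str) -> str:
    """Return a member name that can no longer point outside the target."""
    # Tar member names use "/" as separator; drop leading slashes first.
    relative = name.lstrip("/")
    # After normalization, ".." can only remain at the start of the path.
    parts = posixpath.normpath(relative).split("/")
    while parts and parts[0] in ("..", "."):
        parts.pop(0)
    return "/".join(parts)


print(strip_unsafe_prefix("../test.txt"))
print(strip_unsafe_prefix("/etc/passwd"))
```

A name that consists only of traversal components comes back as an empty string, which a caller would probably want to skip.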

While secure defaults are generally better, and it’s a good idea to review your existing code for tarfile usage and check whether any externally controllable files can reach the extract methods, I’m always a bit skeptical about sentences like “a vulnerability estimated to be present in over 350,000 open-source projects and prevalent in closed-source projects”. News headlines then quickly turn this into something like Unpatched 15-year old Python bug allows code execution in 350k projects.

There are surely a number of affected projects where the described behavior can cause a lot of trouble. But, generally, not every use of the extract methods will be with files that can be controlled by an attacker and thus not every call with an unsanitized archive can be exploited. And of those that can be exploited, not every case will lead to immediate remote code execution.

Funny side note: I have seen this issue multiple times deliberately implemented in CTF challenges in the past.
