UnicodeError when running Python script via macOS LaunchAgent

If you get a UnicodeError when reading or manipulating files in a Python script launched by a LaunchAgent or crontab, the cause may be the “current locale encoding”.

Sample script

Let’s assume you have the following code in a script set up to be launched by a LaunchAgent (also see my article on LaunchAgents):

# some_path_to_file is a placeholder for the path to your text file
with open(some_path_to_file) as f:
    content = f.read()

Let’s also assume that some_path_to_file points to a UTF-8 encoded text file containing some emojis (hey, why not? 😎) or other non-ASCII characters.

Running this script from your terminal session (python3 my_snippet.py) will probably not cause any issues, because your shell most likely sets a locale that uses UTF-8 as the default encoding. Cool!

However, when the script is launched by a LaunchAgent, you may get an error.

UnicodeError: can’t decode byte

The error could look something like this:

Traceback (most recent call last):
  [...]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 13: ordinal not in range(128)

Why is this happening?

All about the context, or: what’s my locale?

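A script run from your terminal session and the same script run via a LaunchAgent do not necessarily see the same environment, so Python may pick up different locale and encoding settings in each case. You can check this with the return value of getpreferredencoding from the locale module, which will be one of the following depending on the context: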

>>> import locale
>>> locale.getpreferredencoding(False)
'UTF-8'
or
'US-ASCII'

From the Python help:

Return the encoding used for text data, according to user preferences. User preferences are expressed differently on different systems, and might not be available programmatically on some systems, so this function only returns a guess.
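To see which values Python actually picks up in each context, a small diagnostic like the following can be run once from the terminal and once via the LaunchAgent. This is only a sketch: describe_encoding_context is a hypothetical helper name, and which environment variables are set in the LaunchAgent context depends on your configuration.

```python
import locale
import os
import sys


def describe_encoding_context():
    """Collect settings that influence Python's default text encoding."""
    return {
        # The encoding open() uses in text mode when none is given
        "preferred_encoding": locale.getpreferredencoding(False),
        # The encoding used for file names, shown for comparison
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Locale-related environment variables; None means "not set"
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


for name, value in describe_encoding_context().items():
    print(f"{name}: {value!r}")
```

If the two runs report different values for preferred_encoding, the locale is the likely culprit.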

Now, let’s look at what happens when we don’t explicitly specify the encoding argument for the open function:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.

If the current locale cannot be determined, Python falls back to the C locale, whose encoding is US-ASCII. In this case, locale.getlocale() will likely also just return (None, None).
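You can reproduce the failure in your terminal without a LaunchAgent by forcing the encoding that the fallback implies. The sketch below assumes that passing encoding='ascii' is a fair stand-in for the US-ASCII fallback; the sample text and the temporary file are made up for illustration.

```python
import os
import tempfile

# Write a small UTF-8 file containing an emoji to a throwaway temp file.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("hey, why not?😎")
    sample_path = tmp.name

error = None
try:
    # Decode the file as ASCII, as Python would under the C locale fallback.
    with open(sample_path, encoding="ascii") as f:
        f.read()
except UnicodeDecodeError as exc:
    error = exc
    print(f"{type(exc).__name__}: {exc}")
finally:
    # Clean up the temp file regardless of the outcome.
    os.remove(sample_path)
```

The first byte of the UTF-8 encoded emoji is outside the ASCII range, so the ASCII codec gives up as soon as it reaches it.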

The solution: always specify the encoding!

If you just want to make sure that you can read your Unicode file properly, be explicit about the encoding, i.e.:

with open(some_path_to_file, encoding='utf-8') as f:
    content = f.read()

… and the problem should go away.
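The same applies when writing: a file written with the locale's default encoding can fail in the same way. As a minimal sketch (the file name and sample text are placeholders), being explicit on both sides makes the round trip independent of the locale:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # note.txt is a placeholder file inside a throwaway directory
    note_path = Path(tmp_dir) / "note.txt"

    # Write with an explicit encoding ...
    note_path.write_text("hey, why not? 😎", encoding="utf-8")

    # ... and read it back with the same explicit encoding.
    with open(note_path, encoding="utf-8") as f:
        content = f.read()

print(content)
```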
