If you are getting UnicodeErrors when reading/manipulating files using a Python script launched by a LaunchAgent or crontab, the problem might lie in the “current locale encoding”.
Let’s assume you have the following code in a script set up to be launched by a LaunchAgent (also see my article on LaunchAgents):
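A minimal sketch of such a script, assuming it does nothing more than open the file in text mode and read its contents. The helper name read_text_file is hypothetical and only serves the illustration:

```python
def read_text_file(some_path_to_file):
    """Read a text file without naming an encoding (hypothetical helper)."""
    # No encoding argument is passed, so open() decodes the bytes
    # with whatever the current locale encoding happens to be.
    with open(some_path_to_file) as f:
        return f.read()
```

Calling read_text_file(some_path_to_file) is where the decoding happens, and therefore where an error would surface.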
Let’s also assume that some_path_to_file points to a txt file containing some emojis (hey, why not? 😎) or some other Unicode characters.
Running a file with the above code snippet in your terminal session will probably not cause any issues, because – most likely – everything is set up to use utf-8 as the default encoding:
python3 my_snippet.py. Cool!
However, when the script is launched by a LaunchAgent, you may get an error.
UnicodeError: can’t decode byte
The error could look something like this:
Traceback (most recent call last):
  [...]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 13: ordinal not in range(128)
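The same error can be reproduced without a LaunchAgent by decoding UTF-8 bytes as ASCII by hand. This is only a sketch with a made-up sample string; the byte 0xf0 is the first of the four bytes that encode the sunglasses emoji in UTF-8:

```python
# Sample text (an assumption for illustration): 13 ASCII characters
# followed by the sunglasses emoji U+1F60E.
text = "Hello, world \U0001F60E"
raw = text.encode("utf-8")

message = None
try:
    # This is roughly what happens when the locale encoding is ASCII.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

# The message itself is plain ASCII, so printing it is safe anywhere.
print(message)
```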
Why is this happening?
All about the context, or: what’s my locale?
Running the script in your terminal session and running it via a LaunchAgent might mean you/Python are reading different locale/encoding settings, as can be seen from the return value of getpreferredencoding from the locale module:
>>> locale.getpreferredencoding(False)
`UTF-8` or `US-ASCII`, depending on the context
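To compare the two contexts, it can help to run the same small diagnostic both from the terminal and from the LaunchAgent or crontab job. The following is a sketch; the function name describe_locale and the choice of environment variables to report are assumptions made for illustration:

```python
import locale
import os
import sys


def describe_locale():
    """Collect the settings that influence Python's default text encoding."""
    return {
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Environment variables that commonly determine the locale.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


for key, value in describe_locale().items():
    print(f"{key}: {value!r}")
```

If the values differ between the two runs, the locale is a likely cause of the error.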
From the Python help:
Return the encoding used for text data, according to user preferences. User preferences are expressed differently on different systems, and might not be available programmatically on some systems, so this function only returns a guess.
Now, let’s look at what happens when we don’t explicitly specify the encoding argument for the open() function:
In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
If the current locale cannot be determined, it will fall back to the C locale.
locale.getlocale() in this case will likely also just return (None, None).
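One way to check which encoding open() actually picks is to inspect the encoding attribute of the file object it returns. This is a sketch that uses a throwaway temporary file; the helper name default_open_encoding is hypothetical:

```python
import codecs
import locale
import os
import tempfile


def default_open_encoding():
    """Return the encoding open() chooses when none is specified."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path) as f:
            return f.encoding
    finally:
        os.remove(path)


used = default_open_encoding()
preferred = locale.getpreferredencoding(False)
print("open() default:", used)
print("locale preferred:", preferred)
# The spelling may differ (e.g. upper vs. lower case), so compare the
# normalized codec names; the two are expected to agree in most setups.
print("same codec:", codecs.lookup(used).name == codecs.lookup(preferred).name)
```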
The solution: always specify the encoding!
If you just want to make sure that you can read your Unicode file properly, be explicit about the encoding, i.e. pass encoding="utf-8" to open(), and the problem should go away.
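As a sketch of the fix, assuming the file really is UTF-8 encoded: the helper name read_utf8 and the temporary file standing in for some_path_to_file are illustrative only.

```python
import os
import tempfile


def read_utf8(path):
    """Read a text file, decoding it as UTF-8 regardless of the locale."""
    with open(path, encoding="utf-8") as f:
        return f.read()


sample = "Hello, world \U0001F60E"

# Temporary stand-in for some_path_to_file.
fd, some_path_to_file = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Be explicit when writing, too.
    with open(some_path_to_file, "w", encoding="utf-8") as f:
        f.write(sample)
    content = read_utf8(some_path_to_file)
finally:
    os.remove(some_path_to_file)

# Print the comparison rather than the text itself: stdout may still be
# ASCII-only in the LaunchAgent context, so printing the emoji could fail.
print(content == sample)
```

Note that the explicit encoding only covers reading and writing the file; printing non-ASCII text to stdout still depends on the locale of the process.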