Bypassing regular expression checks with a line feed

Regular expressions are often used to decide whether user input should be allowed for a specific action or rejected with an error because it might be malicious.

Let’s say we have the following regular expression that should guard the application from allowing any characters that could be used to execute code as part of a template injection:

/^[0-9a-z]+$/

At first sight this looks OK: if the string contains only digits and/or the lowercase letters a-z, it matches and we can continue. If it contains any other character, it does not match the pattern and we can error out. Injecting something like abc<%=7*7%> or any other template injection pattern won’t work. Or will it? It depends…

Implementations matter

Let’s compare the behavior in two environments: Ruby and Python.

Ruby:

my_input = "abc<%= 7*7 %>"

if my_input.match(/^[0-9a-z]+$/)
  puts "Matches pattern, let's continue..."
else
  puts "Does not match. Error out..."
end

Python:

import re

my_input = 'abc<%= 7*7 %>'

if re.match(r'^[0-9a-z]+$', my_input):
    print("Matches pattern, let's continue...")
else:
    print('Does not match. Error out...')

Running either sample will lead to the error case.

However, once we introduce a multi-line string, the two examples behave differently. Let’s change my_input to abc\n<%= 7*7 %> in both snippets and run them again:

$ ruby test.rb
Matches pattern, let's continue...
$ python3 test.py
Does not match. Error out...

If an attacker is able to control my_input in the above Ruby example, and this input is then actually used somewhere important (like in a template), it can lead to remote code execution and/or information disclosure.

Why is it different?

In Ruby (and in some other regex engines), ^ and $ match at the start and end of each line, not of the whole string. So if any (!) single line of the input matches, the whole check succeeds. What we want here instead is to anchor at the beginning and end of the string, which Ruby supports with \A and \z.
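The same distinction between line anchors and string anchors exists in Python, with slightly different spelling. The following is a minimal sketch, not taken from the examples above: the names PATTERN_DOLLAR, PATTERN_STRICT and is_allowed are made up for illustration. Python writes the strict end-of-string anchor as \Z (the counterpart of Ruby’s \z), and re.fullmatch requires the whole string to match without any anchors. One detail worth knowing: even without re.MULTILINE, Python’s $ also matches just before a single trailing line feed.

```python
import re

# Same character class as in the article, anchored in two different ways.
PATTERN_DOLLAR = re.compile(r'^[0-9a-z]+$')    # $ tolerates one trailing "\n"
PATTERN_STRICT = re.compile(r'\A[0-9a-z]+\Z')  # \Z matches only at the very end


def is_allowed(value: str) -> bool:
    """Hypothetical validator: accept only if the ENTIRE string matches."""
    return re.fullmatch(r'[0-9a-z]+', value) is not None


if __name__ == '__main__':
    for candidate in ['abc', 'abc\n', 'abc\n<%= 7*7 %>']:
        print(
            repr(candidate),
            bool(PATTERN_DOLLAR.match(candidate)),
            bool(PATTERN_STRICT.match(candidate)),
            is_allowed(candidate),
        )
```

With these three inputs, only the plain string abc passes the strict variants, while the $-anchored pattern additionally lets abc followed by a trailing line feed through.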

In Python, on the other hand, this multi-line behavior has to be enabled explicitly with re.MULTILINE. Applied to the example from above, the (probably unwanted) result would look like this:

if re.match(r'^[0-9a-z]+$', my_input, re.MULTILINE):
    print('Matches...')

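Note, though, that Python’s re.match only ever matches at the beginning of the string, not at the beginning of each line, even with re.MULTILINE set (see re.search for scanning the full string). The check above still passes for abc\n<%= 7*7 %> because $ now also matches at the end of the first line.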
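To make the difference between re.match and re.search concrete, here is a small sketch. The variable names and the second input, with the payload on the first line, are assumptions added for illustration; the pattern is the one used throughout this post.

```python
import re

PATTERN = r'^[0-9a-z]+$'

payload_last = 'abc\n<%= 7*7 %>'   # harmless line first, payload second
payload_first = '<%= 7*7 %>\nabc'  # payload first, harmless line second

# re.match always starts at position 0 of the string, so with re.MULTILINE
# it only succeeds if the FIRST line satisfies the pattern.
match_last = re.match(PATTERN, payload_last, re.MULTILINE)
match_first = re.match(PATTERN, payload_first, re.MULTILINE)

# re.search scans the whole string, so with re.MULTILINE any matching
# line is enough, which mirrors the Ruby behavior described above.
search_first = re.search(PATTERN, payload_first, re.MULTILINE)

# Without re.MULTILINE, ^ only matches at the start of the string,
# so re.search rejects the same input.
search_first_default = re.search(PATTERN, payload_first)

if __name__ == '__main__':
    print(bool(match_last), bool(match_first))
    print(bool(search_first), bool(search_first_default))
```

In other words, a check built on re.search plus re.MULTILINE can be bypassed with the payload on either side of the line feed, while re.match plus re.MULTILINE is only bypassed when the harmless line comes first.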

Be mindful of the implementation

When testing the security of a specific restriction, be mindful of which environment you are dealing with. Sometimes, it only takes a line feed (\n) to bypass a check and cause serious trouble.
