Regular expressions are often used to check whether user input should be allowed for a specific action or rejected with an error because it might be malicious.
Let’s say we have the following regular expression, which is meant to guard the application against any characters that could be used to execute code as part of a template injection:
/^[0-9a-z]+$/
At first sight this looks OK: if the string contains only digits and/or the characters a-z, it matches and we can continue. If it contains other characters and therefore does not match the pattern, we can error out. Injecting something like abc<%=7*7%> or any other template injection pattern won’t work. Or will it? It depends…
Let’s compare the behavior in two environments: Ruby and Python.
my_input = "abc<%= 7*7 %>"
if my_input.match(/^[0-9a-z]+$/)
  puts "Matches pattern, let's continue..."
else
  puts "Does not match. Error out..."
end
import re
my_input = 'abc<%= 7*7 %>'
if re.match(r'^[0-9a-z]+$', my_input):
    print('Matches pattern, let\'s continue...')
else:
    print('Does not match. Error out...')
Running either sample will lead to the error case.
However, once we introduce a multi-line string, the two examples behave differently. Let’s change the value of my_input to abc\n<%= 7*7 %> in both snippets and run them again:
$ ruby test.rb
Matches pattern, let's continue...
$ python3 test.py
Does not match. Error out...
If an attacker is able to control
my_input in the above Ruby example, and this input is then actually used somewhere important (like in a template), it can lead to remote code execution and/or information disclosure.
Why is it different?
In Ruby (and not only there), ^ and $ match at the start and end of each line. So if any (!) single line matches, the whole match succeeds. What we want in this case instead is to match the beginning and end of the string, which is possible with \A and \z.
In Python, on the other hand, we would need to enable this multi-line behavior explicitly with re.MULTILINE. Taking the example from above, this (probably unwanted) behavior would look like:
if re.match(r'^[0-9a-z]+$', my_input, re.MULTILINE):
    print('Matches...')
Note, though, that Python’s re.match only ever matches at the beginning of the string, not at the beginning of each line, even with re.MULTILINE (see re.search for scanning the full string).
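To make the difference concrete, here is a small Python sketch that runs the same input through three variants of the check. The function names are made up for illustration. The strict variant uses \A and \Z, Python's anchors for the very start and end of the string (the counterparts of Ruby's \A and \z).

```python
import re

PAYLOAD = "abc\n<%= 7*7 %>"

def check_default(value):
    # No flags: ^ matches only at the start of the string, and $ only at
    # the end (or just before a single trailing newline).
    return re.match(r'^[0-9a-z]+$', value) is not None

def check_multiline(value):
    # re.MULTILINE: $ now also matches at the end of the first line,
    # so a harmless first line is enough to pass.
    return re.match(r'^[0-9a-z]+$', value, re.MULTILINE) is not None

def check_strict(value):
    # \A and \Z ignore re.MULTILINE and anchor to the whole string.
    return re.match(r'\A[0-9a-z]+\Z', value, re.MULTILINE) is not None

print(check_default(PAYLOAD))    # False
print(check_multiline(PAYLOAD))  # True
print(check_strict(PAYLOAD))     # False
```

One detail worth knowing: even without flags, Python's $ accepts a single trailing newline, so 'abc\n' passes check_default but not check_strict.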
Be mindful of the implementation
When testing the security of a specific restriction, be mindful of which environment you are dealing with. Sometimes, it only takes a linefeed (
\n) to bypass a check and cause serious trouble.
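When testing such a restriction, a quick probe with a few linefeed variants can show whether a check is line-based. The following is only a sketch: newline_bypasses, the two validators and the candidate list are hypothetical names and values chosen for illustration, and multiline_validator merely imitates the "any line matches" behavior described above by combining re.search with re.MULTILINE.

```python
import re

def newline_bypasses(validator, benign="abc", malicious="<%= 7*7 %>"):
    """Return the linefeed-based candidates that the validator accepts."""
    candidates = [
        benign + "\n" + malicious,  # harmless first line, payload after it
        malicious + "\n" + benign,  # payload first, harmless last line
        benign + "\n",              # harmless input with a trailing linefeed
    ]
    return [c for c in candidates if validator(c)]

def multiline_validator(value):
    # Line-based check: succeeds if any single line matches.
    return re.search(r'^[0-9a-z]+$', value, re.MULTILINE) is not None

def strict_validator(value):
    # re.fullmatch requires the whole string to match the pattern.
    return re.fullmatch(r'[0-9a-z]+', value) is not None

print(len(newline_bypasses(multiline_validator)))  # 3
print(newline_bypasses(strict_validator))          # []
```

Any non-empty result means the check can be passed by input that contains more than the allowed characters.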