Elsewhere > Regular Expressions in MySQL
Posted 2011-09-28 on viget.com
Did you know MySQL supports using regular
expressions in
SELECT
statements? I’m surprised at the number of developers who
don’t, despite using SQL and regexes on a daily basis. That’s not to say
that putting a regex into your SQL should be a daily occurrence. In
fact, it can cause more problems than it
solves, but
it’s a handy tool to have in your belt under certain circumstances.
Basic Usage
Regular expressions in MySQL are invoked with the
REGEXP
keyword,
aliased to RLIKE
. The most basic usage is a hardcoded regular
expression in the right hand side of a conditional clause, e.g.:
SELECT * FROM users WHERE email RLIKE '^[a-c].*[0-9]@';
This SQL would grab every user whose email address begins with ‘a’, ‘b’, or ‘c’ and has a number as the final character of its local portion.
Something More Advanced
The regex used with RLIKE does not need to be hardcoded into the SQL statement, and can in fact be a column in the table being queried. In a recent project, we were tasked with creating an interface for managing redirect rules à la mod_rewrite. We were able to do the entire match in the database, using SQL like this (albeit with a few more joins, groups and orders):
SELECT * FROM redirect_rules WHERE '/news' RLIKE pattern;
In this case, ‘/news’ is the incoming request path and pattern
is the
column that stores the regular expression. In our benchmarks, we found
this approach to be much faster than doing the regular expression
matching in Ruby, mostly because of the lack of ActiveRecord overhead.
Caveats
Using regular expressions in your SQL has the potential to be slow.
These queries can’t use indexes, so a full table scan is required. If
you can get away with using LIKE
, which has some regex-like
functionality, you should. As always: benchmark, benchmark, benchmark.
Additionally, MySQL supports POSIX regular expressions, not PCRE like Ruby. There are things (like negative lookaheads) that you simply can’t do, though you probably ought not to be doing them in your SQL anyway.
In PostgreSQL
Support for regular expressions in PostgreSQL is similar to that of
MySQL, though the syntax is different (e.g. email ~ '^a'
instead of
email RLIKE '^a'
). What’s more, Postgres contains some useful
functions for working with regular expressions, like substring
and
regexp_replace
. See the
documentation
for more information.
Conclusion
In certain circumstances, regular expressions in SQL are a handy
technique that can lead to faster, cleaner code. Don’t use RLIKE
when
LIKE
will suffice and be sure to benchmark your queries with datasets
similar to the ones you’ll be facing in production.