David Eisinger


Elsewhere > Regular Expressions in MySQL

Posted 2011-09-28 on viget.com

Did you know MySQL supports using regular expressions in SELECT statements? I’m surprised at the number of developers who don’t, despite using SQL and regexes on a daily basis. That’s not to say that putting a regex into your SQL should be a daily occurrence. In fact, it can cause more problems than it solves, but it’s a handy tool to have in your belt under certain circumstances.

Basic Usage

Regular expressions in MySQL are invoked with the REGEXP keyword, aliased to RLIKE. The most basic usage is a hardcoded regular expression in the right hand side of a conditional clause, e.g.:

SELECT * FROM users WHERE email RLIKE '^[a-c].*[0-9]@'; 

This SQL would grab every user whose email address begins with ‘a’, ‘b’, or ‘c’ and has a number as the final character of its local portion.

Something More Advanced

The regex used with RLIKE does not need to be hardcoded into the SQL statement, and can in fact be a column in the table being queried. In a recent project, we were tasked with creating an interface for managing redirect rules à la mod_rewrite. We were able to do the entire match in the database, using SQL like this (albeit with a few more joins, groups and orders):

SELECT * FROM redirect_rules WHERE '/news' RLIKE pattern; 

In this case, ‘/news’ is the incoming request path and pattern is the column that stores the regular expression. In our benchmarks, we found this approach to be much faster than doing the regular expression matching in Ruby, mostly because of the lack of ActiveRecord overhead.

Caveats

Using regular expressions in your SQL has the potential to be slow. These queries can’t use indexes, so a full table scan is required. If you can get away with using LIKE, which has some regex-like functionality, you should. As always: benchmark, benchmark, benchmark.

Additionally, MySQL supports POSIX regular expressions, not PCRE like Ruby. There are things (like negative lookaheads) that you simply can’t do, though you probably ought not to be doing them in your SQL anyway.

In PostgreSQL

Support for regular expressions in PostgreSQL is similar to that of MySQL, though the syntax is different (e.g. email ~ '^a' instead of email RLIKE '^a'). What’s more, Postgres contains some useful functions for working with regular expressions, like substring and regexp_replace. See the documentation for more information.

Conclusion

In certain circumstances, regular expressions in SQL are a handy technique that can lead to faster, cleaner code. Don’t use RLIKE when LIKE will suffice and be sure to benchmark your queries with datasets similar to the ones you’ll be facing in production.