MySQL Regular Expressions
A regular expression is a special string that describes a search pattern. It’s a powerful tool to give a concise and flexible way for identifying text strings such as characters and words based on patterns. It uses its own syntax that can be interpreted by a regular expression processor. A regular expression is widely used in almost all platforms, from programming languages to databases, including MySQL.
A regular expression uses the backslash as an escape character that should be considered in the pattern match if double backslashes have used. The regular expressions are not case sensitive. It is abbreviated as REGEX or REGEXP in MySQL.
The advantage of using regular expression is that we are not limited to search for a string based on a fixed pattern with the percent (%) sign and underscore (_) in the LIKE operator. The regular expression has more meta-characters that allow more flexibility and control while performing pattern matching.
We have previously learned about wildcards, which allows us to get a similar result as regular expressions. So we may ask why we learn regular expressions if we will get the same result as the wildcards. It is because regular expressions allow us to search data matching even more complex ways compared to wildcards.
Syntax
MySQL adapts the regular expression implemented by Henry Spencer. MySQL allows us to match patterns right in the SQL statements by using the REGEXP operator. The following is the basic syntax that illustrates the use of regular expressions in MySQL:
In this syntax, the column_list indicates the column name returns in the result set. The table_name is the name of the table that data will be retrieved using the pattern. The WHERE field_name represents the column name on which the regular expression is performed. The REGEXP is the regular expression operator, and the pattern is the search condition to be matched by REGEXP. We can also use the RLIKE operator, which is the synonym for REGEXP that gives the same results as REGEXP. We can avoid the confusion to use this statement with the LIKE operator by using the REGEXP instead of LIKE.
This statement returns true if a value in the WHERE field_name matches the pattern. Otherwise, it returns false. If either field_name or pattern is NULL, the result is always NULL. The negation form of the REGEXP operator is NOT REGEXP.
Regular Expression Meta-Characters
The following table shows the most commonly used meta-characters and constructs in a regular expression:
Meta-Character | Descriptions |
---|---|
^ | The caret (^) character is used to start matches at the beginning of a searched string. |
$ | The dollar ($) character is used to start matches at the end of a searched string. |
. | The dot (.) character matches any single character except for a new line. |
[abc] | It is used to match any characters enclosed in the square brackets. |
[^abc] | It is used to match any characters not specified in the square brackets. |
* | The asterisk (*) character matches zero (0) or more instances of the preceding strings. |
+ | The plus (+) character matches one or more instances of preceding strings. |
{n} | It is used to match n instances of the preceding element. |
{m, n} | It is used to match m to n instance of the preceding element. |
p1|p2 | It is used to isolate alternatives that match any of the patterns p1 or p2. |
? | The question mark (?) character matches zero (0) or one instance of preceding strings. |
[A-Z] | It is used to match any upper case character. |
[a-z] | It is used to match any lower case character. |
[0-9] | It is used to match numeric digits from 0 to 9. |
[[:<:]] | It matches the beginning of words. |
[[:>:]] | It matches the end of words. |
[:class:] | It is used to match a character class, i.e. [:alpha:] matches letters, [:space:] matches white space, [:punct:] matches punctuations and [:upper:] for upper-class letters. |
Let us understand the regular expressions using practical examples given below:
Suppose we have a table named student_info that contains the following data. We will demonstrate various examples based on this table data.
If we want to search for students whose name start with “A or B”, we can use a regular expression together with the meta-characters as follows:
Executing the statement, we will get the desired result. See the below output:
If we want to get the student information whose name ends with k, we can use ‘k$’ meta-character to match the end of a string as follows:
Executing the statement, we will get the desired result. See the below output:
If we want to get the student information whose name contains exactly six characters, we can do this using ‘^’ and ‘$ meta-characters. These characters match the beginning and end of the student name and repeat {6} times of any character ‘.‘ in-between as shown in the following statement:
Executing the statement, we will get the desired result. See the below output:
If we want to get the student info whose subjects contains ‘i’ characters, we can do this by using the below query:
Executing the statement, we will get the desired result. See the below output:
Regular Expression Functions and Operators
The following are the list of regular functions and operators in MySQL:
Name | Descriptions |
---|---|
NOT_REGEXP | It is the negation of a REGEXP operator. |
REGEXP | This operator represents whether the string matches regular expression or not. |
RLIKE | This operator represents whether the string matches regular expression or not. |
REGEXP_INSTR() | It is a function that gives a result when the starting index of substring matches a regular expression. |
REGEXP_LIKE() | This function represents whether the string matches regular expression or not. |
REGEXP_REPLACE() | It gives results by replacing substrings that match the regular expression. |
REGEXP_SUBSTRING() | This function return substring that matches a regular expression. |
Let’s see all of them in detail.
REGEXP, RLIKE, & REGEXP_LIKE()
Although these functions and operators return the same result, REGEXP_LIKE() gives us more functionality with the optional parameters. We can use them as follows:
These statements give output whether string expression matches regular expression pattern or not. We will get 1 if an expression matches the pattern. Otherwise, they return 0. The below examples explain it more clearly.
In the below image, the first statement returns ‘1’ because ‘B’ is in the range A-Z. The second statement limited the range of the pattern to B-Z. So ‘A’ will not match any character within the range, and MySQL returns 0. Here we have used the alias match_ and not_match_ so that the returned column will be more understandable.
REGEXP_LIKE() Parameter
The following are the five possible parameters to modify the function output:
- c: It represents a case-sensitive matching.
- i: It represents a case-insensitive matching.
- m: It represents a multiple-line mode that allows line terminators within the string. By default, this function matches line terminators at the start and end of the string.
- n: It is used to modify the . (dot) character to match line terminators.
- u: It represents Unix-only line endings.
Example
In this example, we have added the ‘c’ and ‘i’ as an optional parameter, which invokes case-sensitive and case-insensitive matching. The first query gives the output 0 because ‘a’ is in the range ‘a-z’, but not in the range of capital letters A-Z. The second query gives the output 1 because of case-insensitive features.
NOT REGEXP & NOT RLIKE
They are regular expression operators that compare the specified pattern and return the result, which does not match the patterns. These operators return 1 if no match is found. Otherwise, they return 0. We can use these functions as follows:
Example
The below statement returns 0 because ‘a’ is found in the given range.
Here is the output:
REGEXP_INSTR()
It is a function that gives a result when the starting index of substring expression matches the pattern. It returns 0 if there is no match found. If either expression or pattern is NULL, it returns NULL. Here indexing starts at 1.
This function uses various optional parameters that are pos, occurrence, return_option, match_type, etc.
Example
Suppose we want to get the index position of substring ‘a’ within expr (a b c d e f a). The first query returns 1 because we have not set any optional parameters, which is the string’s first index. The second query returns 13 because we have modified the query with optional parameter occurrence.
REGEXP_REPLACE()
This function replaces the specified string character by matching characters and then returns the resulting string. If any expression, pattern, or replaceable string is not found, it will return NULL. This function can be used as follows:
The replace character uses the optional parameters such as pos, occurrence, and match_type.
Example
This statement replaces the ‘tutorials’ pattern with the ‘javat’ pattern.
Here is the output:
REGEXP_SUBSTRING()
This function returns the substring of an expression that matches the specified pattern. If the expression or specified pattern or even no match is found, it returns NULL. This function can be used as follows:
The pattern uses the optional parameters such as pos, occurrence, and match_type.
Example
This statement returns the ‘point’ pattern, which is the third occurrence of the given range.
Here is the output: