Regular expressions are a powerful tool for validating text and for finding and replacing substrings within it. They let you define complex patterns, and processing them can be much faster than working with the String class's Replace, Substring, IndexOf, and other basic methods.
The following tables summarize the most frequently used regular expression syntax constructs. The first table shows how to express the characters that you want to match.
| Character escape | Description |
| --- | --- |
| Ordinary characters | Characters other than . $ ^ { [ ( ) * + ? \ and the vertical bar match themselves |
| \b | Matches a backspace when used inside a [ ] character class; elsewhere, \b denotes a word boundary |
| \t | Matches a tab |
| \r | Matches a carriage return |
| \v | Matches a vertical tab |
| \f | Matches a form feed |
| \n | Matches a newline |
| \ | If followed by a nonordinary character (one of those listed in the first row), matches that character; for example, \+ matches a + character |
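The escapes in the table can be tried out with a short sketch. It uses Python's `re` module rather than the .NET Regex class (an assumption made purely for illustration; the escapes shown behave the same way in both engines), and all variable names are hypothetical.

```python
import re

# "+" is a quantifier, so a literal plus sign must be written as "\+".
escaped_plus = re.search(r"1\+1", "1+1=2")    # matches the text "1+1"
unescaped_plus = re.search(r"1+1", "1+1=2")   # None: "one or more 1s, then a 1"

# "\t" in the pattern matches a tab character in the input.
tab_found = re.search(r"name\tvalue", "name\tvalue") is not None

# Inside a character class, "\b" is a backspace (U+0008);
# outside one, it is a word-boundary assertion.
backspace_found = re.search(r"[\b]", "a\x08b") is not None
whole_words = re.findall(r"\bcat\b", "cat concat cat")

# re.escape() escapes the special characters of a literal string,
# so the result matches that string character for character.
literal = "3.5+2"
escaped_literal_matches = re.fullmatch(re.escape(literal), literal) is not None

print(escaped_plus.group(), unescaped_plus, whole_words)
```

Raw string literals (the `r"..."` prefix) keep Python from interpreting the backslashes before the regular expression engine sees them.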
In addition to single characters, you can specify a class or a range of characters that can be matched in the expression. For example, you can allow any digit or any vowel in a given position and exclude all other characters. The character classes in the following table enable you to do this.
| Character class | Description |
| --- | --- |
| . | Matches any character except \n |
| [aeiou] | Matches any single character specified in the set |
| [^aeiou] | Matches any single character not specified in the set |
| [3-7a-dA-D] | Matches any single character in the specified ranges (in the example, the ranges are 3-7, a-d, and A-D) |
| \w | Matches any word character, that is, any alphanumeric character or the underscore (_) |
| \W | Matches any nonword character |
| \s | Matches any whitespace character (space, tab, form feed, newline, carriage return, or vertical tab) |
| \S | Matches any nonwhitespace character |
| \d | Matches any decimal digit |
| \D | Matches any character that is not a decimal digit |
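As a minimal sketch of the character classes above, the following snippet applies each one to a small sample string. It assumes Python's `re` module as a stand-in for the .NET Regex class; the sample strings and variable names are made up for illustration.

```python
import re

# Sets and negated sets
vowels = re.findall(r"[aeiou]", "education")       # ['e', 'u', 'a', 'i', 'o']
non_vowels = re.findall(r"[^aeiou]", "education")  # ['d', 'c', 't', 'n']

# Ranges: digits 3-7, lowercase a-d, uppercase A-D
in_ranges = re.findall(r"[3-7a-dA-D]", "a1B5x9d")  # ['a', 'B', '5', 'd']

# Shorthand classes
digits = re.findall(r"\d", "Room 42B")             # ['4', '2']
words = re.findall(r"\w+", "user_name = 42")       # ['user_name', '42']
fields = re.split(r"\s+", "a b\tc\nd")             # ['a', 'b', 'c', 'd']

# "." matches any character except a newline
not_newline = re.findall(r".", "a\nb")             # ['a', 'b']

print(vowels, non_vowels, in_ranges, digits, words, fields, not_newline)
```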
You can also specify that a certain character or class of characters must be present at least once, or between two and six times, and so on. Quantifiers are placed immediately after a character or a class of characters and specify how many times the preceding character or class must be matched, as the following table shows.
| Quantifier | Description |
| --- | --- |
| * | Zero or more matches |
| + | One or more matches |
| ? | Zero or one match |
| {N} | Exactly N matches |
| {N,} | N or more matches |
| {N,M} | Between N and M matches (inclusive) |
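The effect of each quantifier can be checked with a small sketch. It assumes Python's `re` module in place of the .NET Regex class, and the helper `matches` is a hypothetical name introduced here; it uses `re.fullmatch`, so the whole input must match the pattern.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

results = {
    "ab* on 'a'": matches(r"ab*", "a"),                      # zero b's allowed
    "ab+ on 'a'": matches(r"ab+", "a"),                      # at least one b required
    "colou?r on 'color'": matches(r"colou?r", "color"),      # optional u absent
    "colou?r on 'colour'": matches(r"colou?r", "colour"),    # optional u present
    r"\d{4} on '2024'": matches(r"\d{4}", "2024"),           # exactly four digits
    r"\d{4} on '202'": matches(r"\d{4}", "202"),             # only three digits
    r"\d{2,} on '7'": matches(r"\d{2,}", "7"),               # fewer than two digits
    r"\d{2,6} on '1234567'": matches(r"\d{2,6}", "1234567"), # seven digits, max is six
}

for label, ok in results.items():
    print(f"{label}: {ok}")
```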
As an example, consider the expression [aeiou]{2,4}\+[1-5]*. For a string to match this expression in its entirety, it must start with two to four vowels, continue with a + sign, and end with zero or more digits between 1 and 5.
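To see this expression at work, here is a minimal sketch that tests a few candidate strings. It assumes Python's `re` module instead of the .NET Regex class, and the function name `is_valid` is hypothetical. `re.fullmatch` requires the whole string to match, which corresponds to how the expression is described above.

```python
import re

# Two to four vowels, a literal plus sign, then zero or more digits 1-5.
PATTERN = r"[aeiou]{2,4}\+[1-5]*"

def is_valid(text: str) -> bool:
    """Return True when the entire text matches PATTERN."""
    return re.fullmatch(PATTERN, text) is not None

print(is_valid("ae+123"))    # True: two vowels, "+", digits in range
print(is_valid("ae+"))       # True: zero trailing digits is allowed
print(is_valid("a+12"))      # False: only one vowel
print(is_valid("aeiou+1"))   # False: five vowels exceed the maximum of four
print(is_valid("ae+16"))     # False: 6 is outside the range 1-5

# Without the whole-string requirement, the pattern can also match a
# substring of a longer text.
partial = re.search(PATTERN, "xxaeiou+12yy")
print(partial.group())       # eiou+12
```

The last call shows why validation normally requires the entire input to match: an unanchored search succeeds on text that merely contains a matching substring.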