Regular Expressions
Regular expressions are patterns that can be used to match strings. A regular expression can be thought of as a formula for matching strings that follow some pattern, or as a small language designed for searching and manipulating text.
Regular expressions may be used to find one or more occurrences of a pattern of characters within a string. You may then replace the matched text with other characters or perform some other task based on the results. These patterns can be simple or very complex. Regular expressions generally comprise two types of characters:
- Literal or normal characters, such as "abcd123"
- Special characters that have a special meaning, such as ".", "$", or "^"
These special characters make regular expressions a very powerful means of manipulating strings and text.
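
The find-and-replace idea described above can be sketched in a few lines. This is only an illustration using Python's standard `re` module; the sample sentence and the variable names are made up for the example and do not come from the original text.

```python
import re

text = "Order 66 shipped on day 12, order 7 is pending."

# Literal characters match themselves (and are case-sensitive by default).
literal_hits = re.findall("order", text)

# Special characters add meaning: [0-9]+ matches one or more digits.
numbers = re.findall("[0-9]+", text)

# Replace every run of digits with a placeholder character.
masked = re.sub("[0-9]+", "#", text)

print(literal_hits)  # ['order']
print(numbers)       # ['66', '12', '7']
print(masked)        # Order # shipped on day #, order # is pending.
```

Here `re.findall` returns every non-overlapping match, while `re.sub` returns a new string with each match replaced.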
Meta-characters and their Descriptions

.
Matches any single character. For example, the regular expression s.t matches the strings sat and sit, but not sight.

$
Matches the end of a line. For instance, the regular expression reason$ matches the end of the string "He has a reason" but not the string "He has his reasons".

^
Matches the beginning of a line. For instance, the regular expression ^Where matches the beginning of the string "Where is my cap" but does not match "Do you know Where it is".

*
Matches zero or more occurrences of the character immediately preceding. For example, the regular expression .* matches any number of any characters.

[ ]
Matches any one of the characters between the brackets. For example, the regular expression s[ia]t matches sat and sit, but not set.

[c1-c2]
Matches any one character in a range, which is specified with a hyphen. For example, the regular expression [0-9] matches any digit. Multiple ranges can be specified as well: the regular expression [A-Za-z] matches any upper or lower case letter.

[^c1-c2]
Matches any character except those in the range (the complement of the range). The caret must be the first character after the opening bracket. For example, the expression [^123a-z] matches any character except 1, 2, 3, and lower case letters.

|
Matches either of two alternatives ("or"). For example, (him|her) matches the line "it belongs to him" and the line "it belongs to her" but does not match the line "it belongs to them."

+
Matches one or more occurrences of the character or regular expression immediately preceding. For example, the regular expression 9+ matches 9, 99, and 999.

?
Matches 0 or 1 occurrence of the character or regular expression immediately preceding. For example, the regular expression colou?r matches both color and colour.

{i}
Matches exactly i occurrences of the preceding character or expression. For example, the expression A[0-9]{3} matches "A" followed by exactly 3 digits. That is, it matches all of A123, but only the first four characters of A1234.

{i,j}
Matches between i and j occurrences of the preceding character or expression. For example, the expression [0-9]{4,6} matches any sequence of 4, 5, or 6 digits.
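
The meta-characters described above can be tried out with a short script. This is a rough sketch using Python's standard `re` module, not the only way to test them. The helper `found` and sample strings such as "a1b22", "abbbc" and the pattern colou?r are illustrative assumptions; the remaining patterns come from the descriptions above.

```python
import re

def found(pattern, text):
    """Return True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# "." matches any single character.
dot = [w for w in ["sat", "sit", "sight"] if found("s.t", w)]

# "$" and "^" anchor the match to the end or the start of the line.
end_anchor = [found("reason$", s) for s in ["He has a reason", "He has his reasons"]]
start_anchor = [found("^Where", s) for s in ["Where is my cap", "Do you know Where it is"]]

# Brackets: a set of characters, a range, and a complemented set.
char_set = [w for w in ["sat", "sit", "set"] if found("s[ia]t", w)]
digits = re.findall("[0-9]", "a1b22")
not_listed = re.findall("[^123a-z]", "a1b4C")

# "|" chooses between alternatives.
either = [found("(him|her)", "it belongs to " + p) for p in ["him", "her", "them"]]

# Quantifiers: "*", "+", "?", "{i}" and "{i,j}".
star = re.search("ab*", "abbbc").group()
plus = re.findall("9+", "9 99 999")
optional = [found("^colou?r$", w) for w in ["color", "colour", "colouur"]]
exact = re.search("A[0-9]{3}", "A1234").group()
whole = re.fullmatch("A[0-9]{3}", "A1234")
ranged = re.findall("[0-9]{4,6}", "123 1234 1234567")

print(dot)           # ['sat', 'sit']
print(end_anchor)    # [True, False]
print(start_anchor)  # [True, False]
print(char_set)      # ['sat', 'sit']
print(digits)        # ['1', '2', '2']
print(not_listed)    # ['4', 'C']
print(either)        # [True, True, False]
print(star)          # abbb
print(plus)          # ['9', '99', '999']
print(optional)      # [True, True, False]
print(exact)         # A123
print(whole)         # None
print(ranged)        # ['1234', '123456']
```

Note that `re.search` looks for a match anywhere in the string, so A[0-9]{3} finds A123 inside A1234, whereas `re.fullmatch` requires the whole string to match and therefore rejects A1234.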
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\w Matches any word character, including underscore. Equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\b Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
\B Matches a non-word boundary. For example, "ea*r\B" matches the "ear" in "never early".