What is a regular expression? A regular expression is a set of pre-defined characters, and a combination of these characters, that forms a string string string that is used to express a filter logic for the string.
Regular expressions can be used to achieve the following:
给定的字符串是否符合正则表达式的过滤逻辑(称作“匹配”);
可以通过正则表达式,从字符串中获取我们想要的特定部分。
To make it easier for you to learn, I recommend Regextor, a software that verifies regular expressions.
Regular expressions can contain only plain text, representing a precise matching of that text; for example:
The formal expression is:song
This is a list of characters who have appeared in the anime series Xiao Songge.
Matching results: xiaosong
Ge, xiao songge
By default, the regular expression is capitalized, so the song will not match the
. is used to match an arbitrary character, such as:
The regular expression is: c.t.
Cat cet caaat dog is a game of cat-and-mouse.
Matching results:cat
cet
Caat dog
Analysis: c.t matches a string with a "c" beginning and ending with a "t" ending, with an arbitrary character in the middle.
同理,多个连续的.可以匹配多个连续的任意字符:
The regular expression: c...t
Cat cet caat dog
Matched results: cat cetcaat
dog
. contains special meaning in the regular expression, is a special character.. is also a special character, and can play a transitive role on special characters.. if you want to match is a real ".
The regular expression is: c.t.
Cat c.t. dog is the name of the game.
Matched results: catc.t
The dog
Note: Because \ is also a special character, two inverse slashes \ are required to match a real \ character:
The regular expression: c\t
Cat c \ t dog
Matched results: catc\t
dog
As mentioned above. I can match any character, but what if I want to match a few specific characters?
The regular expression is: c[ab]t
I'm waiting for the match text: cat cbt cet
Matching results:cat
cbt
That's it.
Analysis: [ab] matches "a
In the example above, what if I want to match cet and I add one more in [?], and if I want to match any lowercase letters, do I put dozens in?
The regular expression: c[a-z]t
Cat cbt czt c2t
Matching results:cat
cbt
czt
c2t
Analysis: c[a-z]t is represented by "c
There are also similar areas:
[0-9] is the same function as [0123456789]. Matches all numbers. [A-F] Matches A to F capital characters. [A-Z] Matches all capital letters from A to Z. [a-z] matches all lowercase characters from a to z. [A-z] matches all characters in ASCII from A to ASCII z ((not only matches all characters, but also matches characters in the ASCII table from A to z, such as [ and ^ etc.)). [A-Za-z0-9] Matches all uppercase letters and numbers.
Character sets are generally used to specify a set of characters that need to be matched. However, sometimes you want to exclude a set of characters that you don't want to match. This can be done by denying the character set.
The regular expression is: c[^a-z]t
Cat cbt czt c2t cAt
Matching results: cat cbt cztc2t
cAt
Analysis: This is the complete opposite of the previous example. │[a-z] matches all lowercase characters, while [^a-z] matches all non-lowercase characters.
Note that the ^ character is a de-match of all characters in the character set.
Metatarsals have special meanings in regular expressions, and we have already mentioned several metatarsals, such as.、[and]。 These characters cannot directly express their meaning, for example, they cannot directly use [to match " [
All meta-characters can be preceded by an inverse-square transformation, after which the transformation will match the character itself rather than its special meaning. For example, "[will match" [:]
Regular expression: a[b]
Text to match: a[b ab a[[b
Matching results:a[b
Ab a [[b]]
Note: \ is used to convert a metric character, which also means \ is also a metric character. So if you need to match a real \, you can use \:
The regular expressions are: a\b
Text to match: a\b a\b a[[b
Matching results: a\ba\b
a[[b
Sometimes you may need to match blank characters that cannot be printed in the text. For example, you want to be able to find all tab characters, or all swap characters. You can use the special sub-characters in the table below:
For example, \r\n will match a back-and-forth switching combination, which in Windows represents a file switching line. In Linux and Unix systems, only \n can be used.
There are some special metatarsals that can be used to match commonly used character sets. These metatarsals are called matching character classes. You will find them very convenient to use.
Matching numbers or non-digits As mentioned above,[0-9] can match all numbers. If you don't want to match any numbers, you can use [^0-9]。 The following table lists numeric and non-numeric class elements:
The regular expression is: c\dt
Cat c2t czt c9t
Matching results: catc2t
czt c9t
The regular expression is: c\Dt
Cat c2t czt c9t
Matching results:cat
c2t czt
c9t
Matching alphabetic and non-alphabetic characters
Another common class metadata is \w and \W:
The regular expression is: c\wt
Text to match: cat c2t czt c-t c\t
Matching results:cat
c2t
c-t c\t
The regular expression is: c\Wt
Text to match: cat c2t c-t c\t
Matching results: cat c2tc-t
c\t
Matching blank and non-blank
The last matching class you will encounter is the blank class:
The regular expression: c\st
Text to match: cat c t c2t c\t
Matching results: catc t
c2t c\t
The regular expression is: c\St
Text to match: cat c t c2t c\t
Matching results:cat
c t c2t
c\t
+ metadata indicates matching one or more characters. For example, a will match "a", and a+ will match one or more "a".
The regular expression is:cat
I'm waiting for matching text: ct cat caat caaat
Matched results: ctcat
caat caaat
The regular expression: ca+t
I'm waiting for the matching text:ct cat caat caaat
Matched results: ctcat
caat
caaat
When using + in a character set, the + symbol needs to be placed outside the set:
The regular expression is: c[0-9]+t
Text to match: ct c0t cat c123t
Matched results: ctc0t
cat c123t
Analysis: c[0-9]+t denotes a string with one or more digits in the middle, beginning with "c" and ending with "t".
Of course, [0-9+] is also a valid regular expression, but it represents a set of characters that contains the symbols "0" - "9" and "+".
Generally, meta-characters such as. and + are used as literal meanings when used in a character set, so there is no need to translate them. However, it is okay to translate them, so the functions of [0-9+] and [0-9+] are the same.
Note: + is a meta-character, and matching "+" requires the use of the transitive +。
If you want to match zero or more characters, you can use the * meta character.
The regular expression is: ca*t
We are waiting for matching text: ct cat caat cbt
Matching results:ct
cat
caat
cbt
Note:* The symbols are metadata characters. To match "*", a translation is required.
? matches a zero or a character. ▲ So,? is very suitable for matching an optional character in text. ▲
The regular expression is: ca?t
We are waiting for matching text: ct cat caat cbt
Matching results:ct
cat
Caat cbt
Note:? symbols are meta-characters.? to be able to match "? ", a transliteration is required.?
A regular expression allows the number of times a match can be specified. The number can be specified between the {\\displaystyle {\\displaystyle {\\displaystyle {\\displaystyle {\\displaystyle {\\vec {\\vec {}} } } Note: {and} is also a meta-character, which needs to be transliterated when using a literal meaning.
Exact number of matches To specify the number of matches, you can enter a number between {and}. For example, {3} will match the characters or sets that appear 3 times before:
The regular expression is: ca{3}t
We are waiting for matching text: ct cat caaat cbt
Matching results: ct catcaaat
cbt
At least the number of matches
We can also specify only the minimum value for matching. For example, {2,} means matching 2 or more times:
The regular expression is: ca{1,}t
We are waiting for matching text: ct cat caaat cbt
Matched results: ctcat
caaat
cbt
Matching between fractions
We can also use the minimum and maximum values to determine the number of matches. For example, {2,3} means a minimum of 2 matches and a maximum of 3 matches.
The normal expression is ca{2,3}t
We are waiting for matching text: ct cat caaat cbt
Matching results: ct catcaaat
cbt
So? and {0,1} have the same function + and {1,} have the same function.
Let's look at the following examples:
The regular expression is: s.*g I'm waiting for the match text: xiao song xiao song Matching results: xiao song xiao song Analysis: s.*g does not match the two "songs" as expected, but matches all the text between the first "s" and the last "g".
This is because * and + are greedy matches. That is, the regular expression always looks for the largest match, not the smallest, which is deliberately designed.
But if you don't want greedy matching, use these measure words for non-greedy matching (matching as few characters as possible):
Yes, you are.This is a non-greedy version of the app, so you can use *? to modify the example above:
正则表达式:s.What?
I'm waiting for the match text: xiao song xiao song
Matched results: xiaosong
xiao song
分析:可以看到s.?g matches two "songs"?.
The meta-character matching the string boundary is ^ and $, which are used at the beginning and end of the string, respectively.
^ Use the following terms:
The formal expression is:^xiao
This is the first time I've seen this video.
Matching results:xiao
song
Axiosong is a song written and performed by the Japanese singer-songwriter Axiosong. The results of the matches: axiaosong Analysis: ^xiao matches a string that begins with "xiao".
$ is used as follows:
The official expression is song$.
This is the first time I've seen this video.
Matched results: xiaosong
This is the first time I've seen this video. The results of the matches: Xiaosonga Analysis: song$ matches a string ending in "song"
Shared use:
The normal expression is:^[0-9a-zA-Z]{4,}$
Text to be matched: a1b234ABC
Matching results:a1b234ABC
Text to match: + a1b23 = 4ABC The result after matching: + a1b23 = 4ABC Analysis: ^[0-9a-zA-Z]{4,}$ matched with a number or letter, and the number of digits is greater than or equal to the four-digit string.
Note: ^ is negative if it is at the beginning of the set; if it is outside of the set, the starting position of the string will be matched.
However, a multi-line mode can be enabled. In multi-line mode, the regular expression engine will use a comma as the separator of the string, ^ will match the beginning of the text or the beginning of a line, and $ will match the end of the text or the end of a line.
Modify the previous example:
The formula is: ((?m) ^ [0-9a-zA-Z]{4,}$
Text to be matched: a1b234ABC + a1b23 is 4ABC ABC123456 This is the last one.
Matching results:a1b234ABC
+a1b23=4ABC
ABC123456
Analysis: ((?m) ^[0-9a-zA-Z]{4,}$ will match the number or letter composition of each line, and the number of digits is greater than the four-digit string.
Note: If a multi-line pattern is used, the m must be placed at the beginning of the regular expression. (?m) is not supported in most regular expression implementations. Some regular expression implementations also support the use of \A to match the beginning of the string and \Z to match the end of the string. If supported, these metadata have the same function as ^, $.
Translated from the booklet by iOS_Chosunoko