Regular Expression 30-Minute Beginner's Tutorial

The goal of this article

To help you understand, in 30 minutes, what regular expressions are and gain enough basic knowledge to use them in your own programs or web pages.


How to use this tutorial

Don't be intimidated by the complex expressions below. Just follow me step by step, and you'll find that regular expressions are not as difficult as you think. Of course, if after finishing this tutorial you find that you understand a lot but remember almost nothing, that is perfectly normal: I believe the chance that someone new to regular expressions will remember more than 80% of the syntax mentioned here after one reading is zero. This tutorial only gives you the basic principles; you will need more practice and use to master regular expressions.

Most importantly, please give me 30 minutes. If you have no experience with regular expressions, don't try to get started in 30 seconds, unless you're Superman :)

Besides being a beginner's tutorial, this article also tries to be a regular expression syntax reference that you can use in daily work. Judging from my own experience, it meets this goal fairly well: you see, I haven't managed to memorize everything myself either, have I?


This article includes some side notes, mainly intended to provide related information or to explain basic concepts to readers without a programming background; they can usually be skipped.

What is a regular expression?

When writing programs or web pages that process strings, you often need to find strings that match certain complex rules. A regular expression is a tool for describing such rules. In other words, a regular expression is code that records a text rule.

You have probably used the file search tool in Windows/DOS with wildcards, namely * and ?. To find all Word documents in a directory, you would search for *.doc, where * is interpreted as any string. Like wildcards, regular expressions are tools for text matching, but unlike wildcards they can describe your requirements more precisely, at the cost of more complexity. For example, you can write a regular expression that finds all strings that start with 0, followed by 2 or 3 digits, then a hyphen "-", and finally 7 or 8 digits (such as 010-12345678 or 0376-7654321).
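To make the contrast concrete, here is a minimal sketch in Python (an illustrative choice; this tutorial itself describes the .NET engine, and some details differ between engines). It filters file names with a wildcard via the standard fnmatch module, then finds phone numbers with a regular expression for the rule just described. The file names and sample text are made up.

```python
import fnmatch
import re

# Wildcard matching, as in a file-search dialog: * stands for any string.
files = ["report.doc", "notes.txt", "summary.doc"]
word_docs = [name for name in files if fnmatch.fnmatch(name, "*.doc")]

# The phone-number rule from the text: a 0, then 2 or 3 digits,
# a hyphen, and finally 7 or 8 digits.
phone = re.compile(r"0\d{2,3}-\d{7,8}")
text = "Call 010-12345678 or 0376-7654321, but not 12-345."
numbers = phone.findall(text)

print(word_docs)  # ['report.doc', 'summary.doc']
print(numbers)    # ['010-12345678', '0376-7654321']
```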

A character is the basic unit of text processing in computer software; it may be a letter, digit, punctuation mark, space, line break, Chinese character, and so on. A string is a sequence of zero or more characters. Text simply means written words, that is, a string. Saying that a string matches a regular expression usually means that some part (or several parts) of the string satisfies the conditions given by the expression.

Getting started

The best way to learn regular expressions is to start with examples, understand them, and then modify and experiment with them. Below are many simple examples, with detailed explanations.

Suppose you are searching an English novel for hi. You can use the regular expression hi.

This is almost the simplest possible regular expression. It matches exactly this string: two characters, the first an h and the second an i. Tools that process regular expressions usually provide an option to ignore case; if it is selected, the expression matches any of hi, HI, Hi, and hI.

Unfortunately, many words contain the consecutive characters hi, such as him, history, and high. If you search with hi, the hi inside those words will be found as well. To search precisely for the word hi, use \bhi\b.

\b is a special code defined by regular expressions (well, some people call it a metacharacter) that stands for the beginning or end of a word, that is, a word boundary. Although English words are usually separated by spaces, punctuation, or line breaks, \b does not match any of these separator characters; it matches only a position.

If you are looking for hi followed, not far away, by Lucy, you should use \bhi\b.*\bLucy\b.

More precisely, \b matches a position where the preceding and following characters are not both \w (one is \w and the other is not, or does not exist).

Here, . is another metacharacter, which matches any character except a newline. * is also a metacharacter, but it stands for neither a character nor a position; it is a quantity: it specifies that the content before * may be repeated any number of times so that the whole expression can match. Therefore .* together means any number of characters other than newlines. Now the meaning of \bhi\b.*\bLucy\b is clear: first the word hi, then any number of arbitrary characters (except line breaks), and finally the word Lucy.
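The following Python sketch (Python's re module stands in for a regex tool here; the sample sentence is invented) shows the difference between hi and \bhi\b, the effect of the ignore-case option, and how .* bridges the gap between hi and Lucy. Raw strings (r"...") are used so that backslashes reach the regex engine unchanged.

```python
import re

text = "He said hi to him in high spirits. Hi there, Lucy!"

# Plain "hi" also matches inside "him" and "high".
plain = re.findall(r"hi", text)
# \b pins the match to word boundaries, so only the word "hi" is found.
word_only = re.findall(r"\bhi\b", text)
# The ignore-case option also accepts "Hi".
any_case = re.findall(r"\bhi\b", text, re.IGNORECASE)
# .* spans any run of non-newline characters between the two words.
hi_then_lucy = re.search(r"\bhi\b.*\bLucy\b", text).group()

print(plain)         # ['hi', 'hi', 'hi']
print(word_only)     # ['hi']
print(any_case)      # ['hi', 'Hi']
print(hi_then_lucy)  # hi to him in high spirits. Hi there, Lucy
```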

If we use other metacharacters simultaneously, we can construct more powerful regular expressions. For example, the following example:

0\d\d-\d\d\d\d\d\d\d\d matches a string like this: it starts with 0, followed by two digits, then a hyphen "-", and finally eight digits (that is, a Chinese phone number; of course, this example only matches numbers with a 3-digit area code).

The newline character is '\n', the character with ASCII code 10 (hexadecimal 0x0A).

Here \d is a new metacharacter that matches a single digit (0, or 1, or 2, or ...). - is not a metacharacter; it matches only itself, the hyphen (or minus sign, or dash, whatever you call it).

To avoid so many tedious repetitions, we can also write 0\d{2}-\d{8}. Here the {2} ({8}) after \d means that the preceding \d must match 2 (8) times in a row.
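A quick Python check (illustrative only) that the long form and the {n} form accept the same strings. Python's fullmatch requires the whole string to match; the regex way to say the same thing, ^ and $, is introduced a little later.

```python
import re

long_form = re.compile(r"0\d\d-\d\d\d\d\d\d\d\d")
short_form = re.compile(r"0\d{2}-\d{8}")

samples = ["010-12345678", "0376-7654321", "010-1234567"]
# fullmatch succeeds only if the whole string fits the pattern.
long_results = [long_form.fullmatch(s) is not None for s in samples]
short_results = [short_form.fullmatch(s) is not None for s in samples]

print(long_results)   # [True, False, False]
print(short_results)  # [True, False, False]
```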

Testing regular expressions

If you don't find regular expressions hard to read and write, either you are a genius or you are not from Earth. Their syntax is quite confusing, even for people who use them regularly. Because they are hard to read and easy to get wrong, it is highly recommended that you use a tool to test them.

Other available testing tools:

RegexBuddy

JavaScript Regular Expression Online Testing Tool

Some details of regular expressions vary between environments. This tutorial describes the behavior of regular expressions in Microsoft .NET Framework 4.0, so I recommend the tool I wrote for .NET, Regular Expression Tester. See the instructions on its page for installing and running it.

Below is a screenshot of the Regex Tester in operation:

[Figure: screenshot of Regex Tester in operation]

Metacharacters

You already know several very useful metacharacters, such as \b, ., *, and \d. Regular expressions have many more. For example, \s matches any whitespace character, including spaces, tabs, line breaks, and full-width Chinese spaces, and \w matches a letter, digit, underscore, Chinese character, and so on.

Let's take a look at more examples below:

\ba\w*\b matches words that begin with the letter a: first the beginning of a word (\b), then the letter a, then any number of letters or digits (\w*), and finally the end of the word (\b).

Special handling of Chinese characters is supported by the .NET regular expression engine; for the details in other environments, consult the relevant documentation.

All right, now let's talk about what a word means in regular expressions: it is a run of one or more consecutive \w characters. Indeed, it has little to do with the thousands of words you had to memorize when learning English :)

\d+ matches one or more consecutive digits. Here + is a metacharacter similar to *, with one difference: * matches any number of repetitions (including zero), while + matches one or more repetitions.

\b\w{6}\b matches a word of exactly six characters.
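A small Python sketch of the three examples above on an invented sentence. Like .NET, Python 3's \w matches Unicode letters (including Chinese characters) by default for str patterns.

```python
import re

text = "an apple and a banana cost 15 yuan, 120 in total"

a_words = re.findall(r"\ba\w*\b", text)     # words beginning with a
numbers = re.findall(r"\d+", text)          # one or more consecutive digits
six_chars = re.findall(r"\b\w{6}\b", text)  # words of exactly six characters

print(a_words)    # ['an', 'apple', 'and', 'a']
print(numbers)    # ['15', '120']
print(six_chars)  # ['banana']
```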

Code	Description
.	Match any character except a newline
\w	Match a letter, digit, underscore, or Chinese character
\s	Match any whitespace character
\d	Match a digit
\b	Match the beginning or end of a word
^	Match the beginning of the string
$	Match the end of the string

The metacharacters ^ (the symbol on the same key as the digit 6) and $ both match a position, somewhat like \b. ^ matches the beginning of the string being searched, and $ matches its end. These codes are very useful for validating input. For example, if a website requires that the QQ number you enter be 5 to 12 digits long, you can use ^\d{5,12}$.

Here {5,12} is similar to the {2} introduced earlier, except that {2} means the preceding item must repeat exactly twice, while {5,12} means it must repeat at least 5 and at most 12 times; otherwise there is no match.

Regular expression engines typically provide a method to test whether a specified string matches a regular expression, such as JavaScript's RegExp.test() method or .NET's Regex.IsMatch() method. Here, matching means that some part of the string conforms to the expression's rules. Without ^ and $, such a method applied to \d{5,12} could only guarantee that the string contains 5 to 12 consecutive digits, not that the entire string consists of 5 to 12 digits.

Because ^ and $ are used, the entire input string must match \d{5,12}; that is, the whole input must consist of 5 to 12 digits. So if an input QQ number matches this regular expression, it meets the requirement.

Similar to the ignore-case option, some regular expression tools also offer a multiline option. If it is selected, ^ and $ instead match the start and end of a line.
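A Python sketch of the QQ-number check and the multiline option. The function name is_valid_qq is hypothetical. One caveat worth knowing: Python's $ (without MULTILINE) also matches just before a newline at the very end of the string, so is_valid_qq accepts "12345\n"; in Python, \Z is the stricter end anchor.

```python
import re

def is_valid_qq(s):
    """Hypothetical helper implementing the ^\\d{5,12}$ check."""
    return re.search(r"^\d{5,12}$", s) is not None

# Without ^ and $, a run of 5 to 12 digits anywhere in the string is enough.
loose = re.search(r"\d{5,12}", "abc123456xyz") is not None
strict = is_valid_qq("abc123456xyz")

lines = "12345\nabc\n678901"
# By default ^ and $ refer to the whole string, so nothing matches here;
# with MULTILINE they match at the start and end of every line.
whole_string = re.findall(r"^\d+$", lines)
per_line = re.findall(r"^\d+$", lines, re.MULTILINE)

print(loose, strict)           # True False
print(whole_string, per_line)  # [] ['12345', '678901']
```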

Character escaping

If you want to search for a metacharacter itself, such as . or *, you run into a problem: you can't specify them directly, because they would be interpreted as something else. In such cases you use \ to cancel the special meaning of these characters, so you should write \. and \*. Of course, to search for the backslash \ itself, you also need to write \\.

For example, unibetter\.com matches unibetter.com, and C:\\Windows matches C:\Windows.
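In Python the same escapes look like this (a sketch; raw strings keep each backslash intact so the pattern reaches the engine exactly as written). re.escape is a standard-library helper that adds the backslashes for you when the literal text comes from elsewhere.

```python
import re

# \. matches only a literal dot; an unescaped . matches any character.
escaped = re.fullmatch(r"unibetter\.com", "unibetterXcom") is not None
unescaped = re.fullmatch(r"unibetter.com", "unibetterXcom") is not None

# The pattern C:\\Windows matches the text C:\Windows (one backslash).
windows = re.fullmatch(r"C:\\Windows", "C:\\Windows") is not None

# re.escape adds the backslashes when the literal text comes from elsewhere.
literal = re.fullmatch(re.escape("unibetter.com"), "unibetter.com") is not None

print(escaped, unescaped, windows, literal)  # False True True True
```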

Repetition

You have already seen *, +, {2}, and {5,12}, several ways of matching repetitions. Below are all the quantifiers in regular expressions (codes that specify a quantity, such as *, {5,12}, and so on):

Code/Syntax	Description
*	Repeat zero or more times
+	Repeat one or more times
?	Repeat zero or one time
{n}	Repeat exactly n times
{n,}	Repeat n or more times
{n,m}	Repeat n to m times

Here are some examples of repetition:

Windows\d+ matches Windows followed by one or more digits.

^\w+ matches the first word of a line (or the first word of the entire string, depending on the option settings).
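The quantifiers from the table, tried out in Python on invented inputs (colou?r is an extra example of ? that is not from the original text). Note how ^\w+ finds one word, or one per line, depending on the MULTILINE option.

```python
import re

versions = re.findall(r"Windows\d+", "Windows95 Windows2000 Windows")
optional_u = re.findall(r"colou?r", "color colour colouur")
three_or_more = re.findall(r"\d{3,}", "7 42 123 98765")
first_word = re.findall(r"^\w+", "first line\nsecond line")
line_starts = re.findall(r"^\w+", "first line\nsecond line", re.MULTILINE)

print(versions)       # ['Windows95', 'Windows2000']
print(optional_u)     # ['color', 'colour']
print(three_or_more)  # ['123', '98765']
print(first_word)     # ['first']
print(line_starts)    # ['first', 'second']
```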

Character classes

Searching for digits, letters, or digits and letters is easy, because there are already metacharacters for these character sets. But what if you want to match a set of characters for which no metacharacter is predefined (for example, the vowels a, e, i, o, u)?

It's simple: just list them inside square brackets. [aeiou] matches any English vowel, and [.?!] matches one of the punctuation marks ., ?, or !.

We can also easily specify a character range. [0-9] means exactly the same as \d: a single digit. Similarly, [a-z0-9A-Z_] is equivalent to \w (if only English is considered).
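A Python sketch of character classes on invented text. The equality checks hold for ASCII text like this sample; for general Unicode input, \d and \w match more characters than the bracketed ranges.

```python
import re

vowels = re.findall(r"[aeiou]", "regular expression")
endings = re.findall(r"[.?!]", "Really? Yes. Wow!")  # . ? ! are literal inside []

# For ASCII text, [0-9] behaves like \d and [a-z0-9A-Z_] like \w.
ascii_text = "user_01 logged in at 9:45"
same_digits = re.findall(r"[0-9]", ascii_text) == re.findall(r"\d", ascii_text)
same_words = re.findall(r"[a-z0-9A-Z_]+", ascii_text) == re.findall(r"\w+", ascii_text)

print(vowels)                   # ['e', 'u', 'a', 'e', 'e', 'i', 'o']
print(endings)                  # ['?', '.', '!']
print(same_digits, same_words)  # True True
```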

Here is a more complex expression: \(?0\d{2}[) -]?\d{8}.

This expression matches phone numbers in several formats, such as (010)88886666, 022-22334455, or 02912345678. Let's analyze it: first comes an escaped left parenthesis \(, which can appear 0 or 1 times (?); then a 0, followed by two digits (\d{2}); then one of ), -, or a space, which appears once or not at all (?); and finally eight digits (\d{8}).

"(" and ")" are also metacharacters, as the Grouping section will explain, so they need to be escaped here.

Branching conditions

Unfortunately, that expression can also match "incorrect" formats such as 010)12345678 or (022-87654321. To solve this problem, we need branching conditions. A branching condition in a regular expression consists of several rules: if any one of them is satisfied, it counts as a match. The way to write it is to separate the rules with |. Don't understand? No problem, see the examples:

0\d{2}-\d{8}|0\d{3}-\d{7} matches two kinds of hyphen-separated phone numbers: one with a three-digit area code and an eight-digit local number (e.g., 010-12345678), and the other with a four-digit area code and a seven-digit local number (e.g., 0376-2233445).

\(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8} matches phone numbers with a 3-digit area code, where the area code may or may not be enclosed in parentheses, and the area code and local number may be separated by a hyphen, a space, or nothing. You can try using branching conditions to extend this expression so that it also supports 4-digit area codes.

\d{5}-\d{4}|\d{5} matches U.S. ZIP codes. The rule for U.S. ZIP codes is 5 digits, or 9 digits with a hyphen after the fifth. This example is included because it illustrates a point: when using branching conditions, pay attention to the order of the branches. If you change it to \d{5}|\d{5}-\d{4}, it will only match 5-digit ZIP codes (and the first 5 digits of 9-digit ZIP codes). This is because the branches are tried from left to right, and once one branch is satisfied, the remaining branches are not considered.
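The following Python sketch compares the single-pattern phone expression from the previous section with the branching version above, then shows why branch order matters for the ZIP code example. The sample strings are invented. The last line illustrates a related point: when the whole string must match (fullmatch here), the engine backtracks into the later branch, so order matters most when the match is not anchored.

```python
import re

# The expression from the Character classes section, and the branching version.
flawed = re.compile(r"\(?0\d{2}[) -]?\d{8}")
branching = re.compile(r"\(0\d{2}\)[- ]?\d{8}|0\d{2}[- ]?\d{8}")
samples = ["(010)88886666", "010 88886666", "010)12345678", "(022-87654321"]
flawed_ok = [flawed.fullmatch(s) is not None for s in samples]
branching_ok = [branching.fullmatch(s) is not None for s in samples]

# ZIP codes: branches are tried left to right and the first success wins.
text = "Mail to 12345-6789 or 98765."
good_order = re.findall(r"\d{5}-\d{4}|\d{5}", text)
bad_order = re.findall(r"\d{5}|\d{5}-\d{4}", text)
# When the whole string must match, the engine backtracks into the later branch.
anchored = re.fullmatch(r"\d{5}|\d{5}-\d{4}", "12345-6789") is not None

print(flawed_ok)              # [True, True, True, True]
print(branching_ok)           # [True, True, False, False]
print(good_order, bad_order)  # ['12345-6789', '98765'] ['12345', '98765']
print(anchored)               # True
```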

Grouping

We have covered how to repeat a single character (just add a quantifier after it). But what if you want to repeat several characters? You can use parentheses to specify a subexpression (also called a group); you can then specify how many times this subexpression repeats, and you can also perform other operations on it (described later).

(\d{1,3}\.){3}\d{1,3} is a simple IP address matching expression. To understand it, analyze it in the following order: \d{1,3} matches a number of 1 to 3 digits; (\d{1,3}\.){3} matches a 1- to 3-digit number followed by a period, with this whole group repeated 3 times; finally, another 1- to 3-digit number is added (\d{1,3}).

Unfortunately, it will also match 256.300.888.999, an IP address that cannot exist. If arithmetic comparison were available, this problem could perhaps be solved simply, but regular expressions provide no mathematical functions, so we can only describe a correct IP address with lengthy groups, branches, and character classes: ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?).

No number in an IP address can exceed 255; don't be misled by the screenwriters of season 3 of "24"...

The key to understanding this expression is 2[0-4]\d|25[0-5]|[01]?\d\d?. I won't go into detail here; you should be able to work out its meaning yourself.
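A Python sketch comparing the simple and the stricter IP expression. The strict pattern is assembled from the octet alternative so that its two copies cannot drift apart; the candidate addresses are made up.

```python
import re

simple_ip = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
octet = r"(2[0-4]\d|25[0-5]|[01]?\d\d?)"   # one number from 0 to 255
strict_ip = re.compile(r"(" + octet + r"\.){3}" + octet)

candidates = ["192.168.0.1", "256.300.888.999", "10.0.0.255"]
simple_ok = [simple_ip.fullmatch(c) is not None for c in candidates]
strict_ok = [strict_ip.fullmatch(c) is not None for c in candidates]

print(simple_ok)  # [True, True, True]
print(strict_ok)  # [True, False, True]
```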

Negation

Sometimes you need to find characters that do not belong to an easily defined character class, for example, any character other than a digit. In that case you need negation:

Code/Syntax	Description
\W	Match any character that is not a letter, digit, underscore, or Chinese character
\S	Match any character that is not whitespace
\D	Match any character that is not a digit
\B	Match a position that is not the beginning or end of a word
[^x]	Match any character except x
[^aeiou]	Match any character except the letters a, e, i, o, u

Examples: \S+ matches a string that contains no whitespace.

<a[^>]+> matches a string that begins with a and is enclosed in angle brackets.
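The negated classes in action, as a Python sketch with invented inputs. Note that [^aeiou] matches any character other than those five letters, including spaces and digits.

```python
import re

non_digit_runs = re.findall(r"\D+", "abc123def")
no_space_runs = re.findall(r"\S+", "split  on\twhitespace")
not_vowels = re.findall(r"[^aeiou]", "queue")
anchor_tags = re.findall(r"<a[^>]+>", 'See <a href="x.html">here</a>.')

print(non_digit_runs)  # ['abc', 'def']
print(no_space_runs)   # ['split', 'on', 'whitespace']
print(not_vowels)      # ['q']
print(anchor_tags)     # ['<a href="x.html">']
```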

Backreferences

After you specify a subexpression with parentheses, the text matched by this subexpression (that is, the content captured by the group) can be processed further in the expression or in other programs. By default, each group automatically gets a group number. The rule is: from left to right, counting by each group's left parenthesis, the first group is number 1, the second is number 2, and so on.

A backreference is used to repeat the text matched by an earlier group. For example, \1 stands for the text matched by group 1. Hard to understand? See the example:

\b(\w+)\b\s+\1\b can be used to match repeated words, such as go go or kitty kitty. The expression first matches a word, that is, one or more letters or digits from the beginning to the end of a word (\b(\w+)\b), and this word is captured into group 1; then comes one or more whitespace characters (\s+); and finally the content captured in group 1, the word matched earlier (\1).
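A Python sketch of the repeated-word expression on an invented sentence. With a single group, Python's findall returns what that group captured, so the result lists the repeated words rather than the full matches.

```python
import re

text = "Paris in the the spring, kitty kitty said go go."
pattern = r"\b(\w+)\b\s+\1\b"

# With one group, findall returns what the group captured: the repeated word.
doubled = re.findall(pattern, text)

# A match object exposes the whole match (group 0) and group 1 separately.
m = re.search(pattern, text)

print(doubled)     # ['the', 'kitty', 'go']
print(m.group(0))  # the the
print(m.group(1))  # the
```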

Um... actually, group numbers are not assigned quite as simply as I just said:

Group 0 corresponds to the entire regular expression.

In .NET, group numbers are assigned in two left-to-right passes: the first pass numbers only the unnamed groups, and the second pass numbers only the named groups, so every named group has a larger number than every unnamed group.

You can use the syntax (?:exp) to exclude a group from participating in the allocation of group numbers.

You can also give a subexpression a group name yourself. To do so, use the syntax (?<Word>\w+) (or replace the angle brackets with single quotes: (?'Word'\w+)), which names the group for \w+ Word. To backreference the content captured by this group, use \k<Word>. So the previous example can also be written as \b(?<Word>\w+)\b\s+\k<Word>\b.

Parentheses have many special uses. The most commonly used ones are listed below:


Category	Code/Syntax	Description
Capture	(exp)	Match exp and capture the text into an automatically numbered group
Capture	(?<name>exp)	Match exp and capture the text into the group named name; can also be written (?'name'exp)
Capture	(?:exp)	Match exp without capturing the text and without assigning a group number to this group
Zero-width assertion	(?=exp)	Match the position before exp
Zero-width assertion	(?<=exp)	Match the position after exp
Zero-width assertion	(?!exp)	Match a position that is not followed by exp
Zero-width assertion	(?<!exp)	Match a position that is not preceded by exp
Comment	(?#comment)	This kind of group has no effect on how the regular expression is processed; it is used to provide comments

We have already discussed the first two. The third, (?:exp), does not change how the regular expression is processed, but the content it matches is not captured into a group as with the first two, and it is not given a group number. "Why would I want to do that?" Good question; why do you think?
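A Python sketch of named and non-capturing groups. Note the syntax difference: Python's re spells a named group (?P<Word>...) and its backreference (?P=Word), whereas .NET uses (?<Word>...) and \k<Word>. The last part shows one practical answer to the question above, at least in Python: findall returns the captured groups when a pattern has any, so turning a group into (?:...) changes what you get back. The inputs are invented.

```python
import re

# Named group and named backreference, in Python's spelling.
named = re.search(r"\b(?P<Word>\w+)\b\s+(?P=Word)\b", "well go go now")
print(named.group("Word"))  # go

# (?:b) takes no number, so the groups are numbered 1 = (a) and 2 = (c).
m = re.match(r"(a)(?:b)(c)", "abc")
print(m.group(0), m.groups())  # abc ('a', 'c')

# findall returns captured groups when the pattern has any, so a capturing
# group changes the result; the non-capturing form returns the whole match.
ip_text = "server 10.0.0.1"
with_capture = re.findall(r"(\d{1,3}\.){3}\d{1,3}", ip_text)
without_capture = re.findall(r"(?:\d{1,3}\.){3}\d{1,3}", ip_text)
print(with_capture)     # ['0.']  (only the group's last repetition)
print(without_capture)  # ['10.0.0.1']
```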

Zero-width assertions

The next four are used to find things before or after certain content (but not including that content). In other words, like \b, ^, and $, they specify a position, and that position must satisfy a certain condition (an assertion); this is why they are also called zero-width assertions. Examples illustrate them best:

(?=exp), also called a zero-width positive lookahead assertion, asserts that what follows its position matches the expression exp. For example, \b\w+(?=ing\b) matches the front part of words ending in ing (everything except the ing). When searching I'm singing while you're dancing., it matches sing and danc.

(?<=exp), also called a zero-width positive lookbehind assertion, asserts that what precedes its position matches the expression exp. For example, (?<=\bre)\w+\b matches the latter part of words that begin with re (everything except the re); when searching reading a book, it matches ading.
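Both examples above, run through Python's re as a sketch. Python accepts the lookbehind (?<=\bre) because its width is fixed: \b has zero width and re is two characters.

```python
import re

# (?=ing\b): the word must continue with "ing", but "ing" is not consumed.
stems = re.findall(r"\b\w+(?=ing\b)", "I'm singing while you're dancing.")
# (?<=\bre): the match must be preceded by "re" at the start of a word.
after_re = re.findall(r"(?<=\bre)\w+\b", "reading a book")

print(stems)     # ['sing', 'danc']
print(after_re)  # ['ading']
```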

Earthlings, do you find these terms too complex and hard to remember? I feel the same way. Just know that such things exist; as for what they are called, let it go! A man without a name can devote himself to his swordsmanship; a thing without a name can be taken up or cast aside at will...

An assertion declares a fact that should hold. In a regular expression, matching continues only if the assertion is true.

If you want to add a comma every three digits in a long number (counting from the right, of course), you can find the part in front of and inside which commas must be inserted with ((?<=\d)\d{3})+\b. Applied to 1234567890, it finds 234567890.

The following example uses both kinds of assertion: (?<=\s)\d+(?=\s) matches numbers separated by whitespace (again, not including the whitespace itself).
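A Python sketch of the last two examples. The re.sub line is a related variation, not the tutorial's expression: instead of finding the digits that need commas, it inserts a comma at every position that is preceded by a digit and followed by a multiple of three digits up to the end of the number.

```python
import re

number = "1234567890"
# The tutorial's search expression: the part that needs commas in front of or inside it.
needs_commas = re.search(r"((?<=\d)\d{3})+\b", number).group()

# Variation: insert "," at each position with a digit before it and a
# multiple of three digits after it (the match itself is empty).
with_commas = re.sub(r"(?<=\d)(?=(\d{3})+\b)", ",", number)

# Numbers with whitespace on both sides; the whitespace is not part of the match.
spaced = re.findall(r"(?<=\s)\d+(?=\s)", "a 12 b 345 c6 78")

print(needs_commas)  # 234567890
print(with_commas)   # 1,234,567,890
print(spaced)        # ['12', '345']
```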

Negative zero-width assertions

Earlier we saw how to find a character that is not a particular character or does not belong to a particular character class (negation). But what if we only want to make sure a certain character does not appear, without matching it? For example, suppose we want to find words that contain the letter q where the q is not followed by u. We could try:

\b\w*q[^u]\w*\b matches words containing a q that is followed by something other than u. But if you test it more (or you are observant enough to spot it right away), you'll notice that when q appears at the end of a word, as in Iraq or Benq, the expression goes wrong. This is because [^u] always matches one character, so if q is the last character of a word, [^u] matches the word separator after q (a space, a period, or something else), and the following \w*\b matches the next word. As a result, \b\w*q[^u]\w*\b can match the whole of Iraq fighting. A negative zero-width assertion solves this problem because it matches only a position and does not consume any characters. Now we can solve the problem like this: \b\w*q(?!u)\w*\b.

The zero-width negative lookahead assertion (?!exp) asserts that what follows this position does not match the expression exp. For example, \d{3}(?!\d) matches three digits that are not followed by another digit, and \b((?!abc)\w)+\b matches words that do not contain the consecutive string abc.

Similarly, we can use (?<!exp), the zero-width negative lookbehind assertion, to assert that what precedes this position does not match the expression exp. For example, (?<![a-z])\d{7} matches seven digits that are not preceded by a lowercase letter.
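The negative assertions, tried in Python on invented text. The first two lines reproduce the q problem: [^u] consumes the character after q, so the naive pattern runs into the next word, while (?!u) only inspects it. In the abc example the group is written as (?:...) so that findall returns whole words.

```python
import re

text = "Iraq fighting quiet Benq qi"
naive = re.findall(r"\b\w*q[^u]\w*\b", text)   # [^u] consumes the character after q
fixed = re.findall(r"\b\w*q(?!u)\w*\b", text)  # (?!u) only looks at it

last_three = re.findall(r"\d{3}(?!\d)", "12345 678")
no_abc = re.findall(r"\b(?:(?!abc)\w)+\b", "abcde xyz zabc abxc")
seven = re.findall(r"(?<![a-z])\d{7}", "id1234567 tel 7654321")

print(naive)       # ['Iraq fighting', 'Benq qi']
print(fixed)       # ['Iraq', 'Benq', 'qi']
print(last_three)  # ['345', '678']
print(no_abc)      # ['xyz', 'abxc']
print(seven)       # ['7654321']
```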

A more complex example: (?<=<(\w+)>).*(?=<\/\1>) matches the content inside simple HTML tags that have no attributes. (?<=<(\w+)>) specifies a prefix: a word enclosed in angle brackets (for example, <b>); then comes .* (any string); and finally a suffix, (?=<\/\1>). Note the \/ in the suffix, which uses the character escaping described earlier; \1 is a backreference to the first captured group, the content matched by the earlier (\w+). So if the prefix is actually <b>, the suffix is </b>. The whole expression matches the content between <b> and </b> (again, not including the prefix and suffix themselves).

Please analyze the expression (?<=<(\w+)>).*(?=<\/\1>) carefully; it shows the real purpose of zero-width assertions better than any other example here.
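Python's re cannot run this expression as written: it only allows lookbehind of fixed width, and <(\w+)> can vary in length. The sketch below confirms that and swaps in a common workaround (a different technique from the tutorial's): match the opening tag with an ordinary group, capture the content in a second group, and require the closing tag with \1. The sample HTML is invented.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

try:
    re.compile(r"(?<=<(\w+)>).*(?=<\/\1>)")
    variable_lookbehind_ok = True
except re.error:
    # Python's re rejects lookbehind whose width can vary, such as <(\w+)>.
    variable_lookbehind_ok = False

# Workaround: consume the opening tag with an ordinary group, capture the
# body in a second group, and require the matching closing tag via \1.
tag_pair = re.compile(r"<(\w+)>(.*)</\1>")
contents = [m.group(2) for m in tag_pair.finditer(html)]

print(variable_lookbehind_ok)  # False
print(contents)                # ['bold', 'italic']
```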

Comments

Another use of parentheses is to include comments with the syntax (?#comment). For example: 2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199).

If you want to include comments, it's best to enable the "ignore whitespace in pattern" option. You can then add spaces, tabs, and line breaks freely when writing an expression, and they are ignored when the expression is used. With this option enabled, all text from # to the end of the line is treated as a comment and ignored. For example, the earlier expression can be written like this:


(?<=       # Assert the prefix of the text to be matched
<(\w+)>    # Find letters or digits enclosed in angle brackets (an HTML/XML tag)
)          # End of prefix
.*         # Match any text
(?=        # Assert the suffix of the text to be matched
<\/\1>     # Find content in angle brackets: a "/" followed by the previously captured tag
)          # End of suffix
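Python supports both comment styles, so here is a sketch using the octet expression from the start of this section: once with inline (?#...) comments, once with re.VERBOSE, Python's counterpart of ignoring pattern whitespace. (The lookbehind example itself cannot be compiled by Python's re, because its lookbehind has variable width.) The test values are invented.

```python
import re

# Inline (?#...) comments are ignored by the engine.
octet_inline = re.compile(r"2[0-4]\d(?#200-249)|25[0-5](?#250-255)|[01]?\d\d?(?#0-199)")

# re.VERBOSE ignores unescaped whitespace and treats # to end of line as a comment.
octet_verbose = re.compile(r"""
      2[0-4]\d     # 200-249
    | 25[0-5]      # 250-255
    | [01]?\d\d?   # 0-199
""", re.VERBOSE)

samples = ["7", "42", "199", "200", "249", "250", "255", "256"]
inline_ok = [octet_inline.fullmatch(s) is not None for s in samples]
verbose_ok = [octet_verbose.fullmatch(s) is not None for s in samples]

print(inline_ok)                # [True, True, True, True, True, True, True, False]
print(verbose_ok == inline_ok)  # True
```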

Greed and laziness

When a regular expression contains quantifiers that allow repetition, the usual behavior is to match as many characters as possible (as long as the whole expression can still match). Take the expression a.*b: it matches the longest string that starts with a and ends with b. Used to search aabab, it matches the whole string aabab. This is called greedy matching.

Sometimes we need lazy matching instead, that is, matching as few characters as possible. Any of the quantifiers above can be turned into a lazy quantifier by appending a question mark ?. So .*? means: match any number of repetitions, but use as few as possible while still letting the whole match succeed. Now look at the lazy version of the example:

a.*?b matches the shortest string that starts with a and ends with b. Applied to aabab, it matches aab (the first to third characters) and ab (the fourth and fifth characters).

Code/Syntax	Description
*?	Repeat any number of times, but as few times as possible
+?	Repeat one or more times, but as few times as possible
??	Repeat zero or one time, but as few times as possible
{n,m}?	Repeat n to m times, but as few times as possible
{n,}?	Repeat n or more times, but as few times as possible

Why is the first match aab (first to third characters) rather than ab (second to third characters)? Simply put, because regular expressions have another rule that takes priority over the lazy/greedy rule: the match that begins earliest wins.
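Greedy and lazy matching side by side in Python, as a sketch. The span of the first lazy match starts at index 0, which is the earliest-start rule at work. The <b> tags are an extra invented example.

```python
import re

greedy = re.findall(r"a.*b", "aabab")
lazy = re.findall(r"a.*?b", "aabab")
# The first lazy match starts at index 0: the earliest start wins.
first_lazy_span = re.search(r"a.*?b", "aabab").span()

html = "<b>x</b> <b>y</b>"
greedy_tags = re.findall(r"<b>.*</b>", html)   # runs to the last </b>
lazy_tags = re.findall(r"<b>.*?</b>", html)    # stops at the first </b>

print(greedy, lazy, first_lazy_span)  # ['aabab'] ['aab', 'ab'] (0, 3)
print(greedy_tags)                    # ['<b>x</b> <b>y</b>']
print(lazy_tags)                      # ['<b>x</b>', '<b>y</b>']
```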

Processing options

Several options, such as ignoring case and multiline processing, were introduced above; they change how regular expressions are processed. Below are the commonly used regular expression options in .NET:

Name	Description
`IgnoreCase` (ignore case)	Perform case-insensitive matching
`Multiline` (multiline mode)	Change the meaning of ^ and $ so that they match at the beginning and end of any line, not just at the beginning and end of the entire string. (In this mode, the precise meaning of $ is: match the position before \n or at the end of the string.)
`Singleline` (single-line mode)	Change the meaning of . so that it matches every character, including the newline \n
`IgnorePatternWhitespace` (ignore pattern whitespace)	Ignore unescaped whitespace in the expression and enable comments marked with #
`ExplicitCapture` (explicit capture)	Capture only explicitly named groups

A common question is: can only one of multiline mode and single-line mode be used at a time? The answer is no. The two options have nothing to do with each other; only their names are confusingly similar.

In C#, you can use the Regex(String, RegexOptions) constructor to set the processing options of a regular expression, for example: Regex regex = new Regex(@"\ba\w{6}\b", RegexOptions.IgnoreCase);
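For comparison, a Python sketch of the same options. As far as the standard re module goes, the mapping is: IgnoreCase to re.IGNORECASE, Multiline to re.MULTILINE, Singleline to re.DOTALL, and IgnorePatternWhitespace to re.VERBOSE; Python has no ExplicitCapture flag. The sample strings are invented.

```python
import re

# The C# example above, with re.IGNORECASE standing in for RegexOptions.IgnoreCase.
seven_letter = re.findall(r"\ba\w{6}\b", "Abalone and apricot avocado", re.IGNORECASE)

text = "Apple\nbanana\navocado"
a_words = re.findall(r"\ba\w*", text, re.IGNORECASE)
line_starts = re.findall(r"^\w", text)                     # ^ only at the very start
every_line = re.findall(r"^\w", text, re.MULTILINE)        # ^ at the start of each line
dot_stops = re.search(r"Apple.banana", text)               # . does not match \n ...
dot_crosses = re.search(r"Apple.banana", text, re.DOTALL)  # ... unless DOTALL is set
# MULTILINE and DOTALL are independent, so they can be combined.
both = re.search(r"^b.*", text, re.MULTILINE | re.DOTALL).group()

print(seven_letter)             # ['Abalone', 'apricot', 'avocado']
print(a_words)                  # ['Apple', 'avocado']
print(line_starts, every_line)  # ['A'] ['A', 'b', 'a']
print(dot_stops is None, dot_crosses is not None)  # True True
print(repr(both))               # 'banana\navocado'
```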

Balanced Group/Recursive Matching

Sometimes we need to match nested, hierarchical structures like ( 100 * ( 50 + 15 ) ). Simply using \(.+\) will only match the content between the leftmost left parenthesis and the rightmost right parenthesis (we are discussing greedy mode here; lazy mode has the problem described below as well). If the numbers of left and right parentheses in the original string are unequal, for example ( 5 / ( 3 + 2 ) ) ), the numbers of the two in our match will not be equal either. Is there a way to match the longest content between properly paired parentheses in such a string?

The balanced group syntax introduced here is supported by .Net Framework; other languages/libraries may not support this feature, or may support it with a different syntax.

To keep ( and \( from thoroughly confusing you, let's use angle brackets instead of parentheses. Our problem now becomes: how do we capture the content within the longest properly paired angle brackets in a string like xx <aa <bbb> <bbb> aa> yy?

The following syntax constructs are needed here:

(?'group') Name the captured content group and push it onto a stack.
(?'-group') Pop the most recently pushed capture named group off the stack; if the stack is already empty, this group fails to match.
(?(group)yes|no) If the stack holds a capture named group, continue by matching the yes part of the expression; otherwise continue with the no part.
(?!) Zero-width negative lookahead; since it contains no expression, the attempt to match always fails.

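What we need to do is push an "Open" every time we meet a left bracket and pop one every time we meet a right bracket; at the end, we check whether the stack is empty. If it is not empty, there are more left brackets than right brackets, and the match should fail. The regular expression engine will backtrack (giving up some characters at the beginning or end) to try to make the whole expression match.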

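If you're not a programmer (or you call yourself a programmer but don't know what a stack is), think of the three constructs above this way: the first writes "group" on a blackboard, the second erases one "group" from the blackboard, and the third checks whether any "group" is still written on the blackboard; if so, it continues by matching the yes part, otherwise it matches the no part.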


<                         # outermost left bracket
    [^<>]*                # non-bracket content after the outermost left bracket
    (
        (
            (?'Open'<)    # met a left bracket: write an "Open" on the blackboard
            [^<>]*        # match non-bracket content after the left bracket
        )+
        (
            (?'-Open'>)   # met a right bracket: erase one "Open"
            [^<>]*        # match non-bracket content after the right bracket
        )+
    )*
    (?(Open)(?!))         # before the outermost right bracket, check whether any "Open" is left on the blackboard; if so, the match fails
>                         # outermost right bracket

One of the most common applications of balanced groups is matching HTML. The following example matches nested <div> tags: <div[^>]*>[^<>]*(((?'Open'<div[^>]*>)[^<>]*)+((?'-Open'</div>)[^<>]*)+)*(?(Open)(?!))</div>
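Python's standard re module has no balancing groups, so the expressions above cannot be ported directly. To illustrate the same push/pop bookkeeping, here is a hand-written scan (not a regular expression) that uses a depth counter as the "Open" stack and, like the regex engine, retries from the next opening bracket when a span never closes. The function name outermost_balanced is hypothetical, and the quadratic worst case is acceptable only for a sketch.

```python
def outermost_balanced(text: str, open_ch: str = "<", close_ch: str = ">") -> list:
    """Hypothetical helper: return the outermost balanced open_ch ... close_ch spans."""
    results = []
    pos = 0
    while True:
        start = text.find(open_ch, pos)
        if start == -1:
            break
        depth = 0                      # size of the imaginary "Open" stack
        end = -1
        for i in range(start, len(text)):
            if text[i] == open_ch:
                depth += 1             # push an "Open"
            elif text[i] == close_ch:
                depth -= 1             # pop an "Open"
                if depth == 0:         # stack empty again: a balanced span
                    end = i
                    break
        if end == -1:
            pos = start + 1            # never closed: retry from the next opener
        else:
            results.append(text[start:end + 1])
            pos = end + 1
    return results


print(outermost_balanced("xx <aa <bbb> <bbb> aa> yy"))       # ['<aa <bbb> <bbb> aa>']
print(outermost_balanced("( 5 / ( 3 + 2 ) ) )", "(", ")"))  # ['( 5 / ( 3 + 2 ) )']
print(outermost_balanced("<a <b>"))                          # ['<b>']
```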

What else has not been mentioned?

The elements for building regular expressions have been described above, but many things have not been mentioned. Below is a list of some of them, with their syntax and brief explanations. You can find more detailed references online when you need them. If you have installed the MSDN Library, you can also find detailed documentation on .NET regular expressions there.

Code/Syntax	Description
\a	Alarm (bell) character; printing it makes the computer beep
\b	Usually a word boundary, but inside a character class it means backspace
\t	Tab
\r	Carriage return
\v	Vertical tab
\f	Form feed
\n	Newline
\e	Escape
\0nn	ASCII character with octal code nn
\xnn	ASCII character with hexadecimal code nn
\unnnn	Unicode character with hexadecimal code nnnn
\cN	ASCII control character; for example, \cC represents Ctrl+C
\A	Beginning of the string (similar to ^, but not affected by the multiline option)
\Z	End of the string, or the position before a newline at the end of the string (not affected by the multiline option)
\z	End of the string (similar to $, but not affected by the multiline option)
\G	Position where the current search started
\p{name}	Character in the Unicode class named name, e.g., \p{IsGreek}
(?>exp)	Greedy (atomic) subexpression
(?<x-y>exp)	Balancing group
(?im-nsx:exp)	Change processing options within subexpression exp
(?im-nsx)	Change processing options for the rest of the expression
(?(exp)yes|no)	Treat exp as a zero-width positive lookahead: if it matches at this position, use yes as the expression for this group; otherwise use no
(?(exp)yes)	Same as above, with an empty expression as no
(?(name)yes|no)	If the group named name has captured content, use yes as the expression; otherwise use no
(?(name)yes)	Same as above, with an empty expression as no
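A few of these constructs exist in Python's re as well, sometimes with a different meaning. In particular, Python's \Z matches only at the very end of the string, like .NET's \z, while Python's $ (like .NET's) also matches before a final newline. This sketch uses invented inputs; Python's scoped-flag group (?i:...) corresponds to the (?im-nsx:exp) form above.

```python
import re

text = "first\nsecond\n"

# \A matches only at the very start of the string, even with MULTILINE.
starts = re.findall(r"\A\w+", text, re.MULTILINE)
# $ also matches just before a final newline ...
dollar_before_final_newline = re.search(r"second$", text) is not None
# ... but Python's \Z matches only at the absolute end of the string.
strict_end = re.search(r"second\Z", text) is not None
ends_at_newline = re.search(r"second\n\Z", text) is not None
# \xnn and \unnnn name characters by their code.
hex_match = re.fullmatch(r"\x41\u4e2d", "A\u4e2d") is not None
# (?i:...) switches on ignore-case for this subexpression only.
scoped = re.findall(r"(?i:hello) World", "HELLO World hello world")

print(starts)                                   # ['first']
print(dollar_before_final_newline, strict_end)  # True False
print(ends_at_newline, hex_match)               # True True
print(scoped)                                   # ['HELLO World']
```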

Contact the author

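All right, I admit it: I lied to you. You have surely spent more than 30 minutes reading this far. Believe me, that is my fault, not because you are slow. I said "30 minutes" to give you the confidence and patience to keep going. Since you have read this far, my scheme has succeeded. Doesn't it feel great to have been fooled?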

If you want to complain about me, think I could have fooled you more cleverly, or have any other questions, feel free to let me know on my blog.

Online resources and references in this article
