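I'm studying regular expressions and have run into a problem:
how do Match, Group, and Capture differ from one another?
Take the following program as an example: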
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string input = "Yes. This dog is very friendly.";
        string pattern = @"((\w+)[\s.])+";
        foreach (Match match in Regex.Matches(input, pattern))
        {
            Console.WriteLine("Match: {0}", match.Value);
            // Groups[0] is the whole match; Groups[1] and Groups[2] are the parenthesized groups.
            for (int groupCtr = 0; groupCtr < match.Groups.Count; groupCtr++)
            {
                Group group = match.Groups[groupCtr];
                Console.WriteLine("   Group {0}: {1}", groupCtr, group.Value);
                // Each group keeps every substring it captured, in order.
                for (int captureCtr = 0; captureCtr < group.Captures.Count; captureCtr++)
                    Console.WriteLine("      Capture {0}: {1}", captureCtr, group.Captures[captureCtr].Value);
            }
        }
    }
}
The output is:
Match: Yes.
   Group 0: Yes.
      Capture 0: Yes.
   Group 1: Yes.
      Capture 0: Yes.
   Group 2: Yes
      Capture 0: Yes
Match: This dog is very friendly.
   Group 0: This dog is very friendly.
      Capture 0: This dog is very friendly.
   Group 1: friendly.
      Capture 0: This
      Capture 1: dog
      Capture 2: is
      Capture 3: very
      Capture 4: friendly.
   Group 2: friendly
      Capture 0: This
      Capture 1: dog
      Capture 2: is
      Capture 3: very
      Capture 4: friendly
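From reading around online, I understand that a Match is essentially the same as Group 0,
and that parentheses () determine which Group a subexpression belongs to.
What I still don't understand is how Captures are distinguished.
In the match "This dog is very friendly.", Group 1 has five Captures at once.
How does the Capture mechanism work?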
# ((\w+)[\s.])+
Group 1: friendly.
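# Group.Value only holds the last substring matched by (\w+)[\s.]
Capture 0: This
# "This " (including the trailing space) matches (\w+)[\s.]
Capture 1: dog
# "dog " also matches (\w+)[\s.]
Capture 2: is
# "is " also matches (\w+)[\s.]
Capture 3: very
# "very " also matches (\w+)[\s.]
Capture 4: friendly.
# "friendly." also matches (\w+)[\s.]
# So each capture of Group 1 is one or more word characters (\w: letters, digits, or underscore) followed by a whitespace character or "."
# The Captures collection records every one of these matches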
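To check the point that a group's value is just its last capture, a minimal sketch like the following may help. The class name LastCaptureDemo and the helper ValueIsLastCapture are made up for illustration; only Regex.Match, Group.Success, Group.Value, and Group.Captures come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastCaptureDemo
{
    // Hypothetical helper: true when the group's Value equals its final capture.
    public static bool ValueIsLastCapture(Group group)
    {
        if (!group.Success) return false; // the group captured nothing
        return group.Value == group.Captures[group.Captures.Count - 1].Value;
    }

    public static void Main()
    {
        Match match = Regex.Match("This dog is very friendly.", @"((\w+)[\s.])+");
        Console.WriteLine(match.Groups[1].Captures.Count);      // 5
        Console.WriteLine(ValueIsLastCapture(match.Groups[1])); // True
        Console.WriteLine(ValueIsLastCapture(match.Groups[2])); // True
    }
}
```

In other words, Group.Value is a shortcut to the most recent capture, while Captures keeps the full history.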
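# How it works
# Input: This dog is very friendly.
# Because of the + quantifier, the group is matched repeatedly until nothing more matches
# "This " matches (\w+)[\s.]; the group keeps going
# A capture is recorded, Captures.Count++
# "dog " matches (\w+)[\s.]; the group keeps going
# A capture is recorded, Captures.Count++
# "is " matches (\w+)[\s.]; the group keeps going
# A capture is recorded, Captures.Count++
# "very " matches (\w+)[\s.]; the group keeps going
# A capture is recorded, Captures.Count++
# "friendly." matches (\w+)[\s.]; the group would keep going, but this is the end of the string
# So Group 1 ends up as "friendly." and the final capture is recorded, Captures.Count++
# Group 2 works the same way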
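The step-by-step walk above can be traced in code by printing where each capture starts and how long it is. This is only a sketch; the class name CaptureTrace is an assumption, while Capture.Index, Capture.Length, and Capture.Value are standard .NET properties.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureTrace
{
    public static void Main()
    {
        string input = "This dog is very friendly.";
        Match match = Regex.Match(input, @"((\w+)[\s.])+");

        // Each pass through the + loop adds one Capture to Group 1.
        foreach (Capture capture in match.Groups[1].Captures)
        {
            Console.WriteLine("index {0}, length {1}: \"{2}\"",
                capture.Index, capture.Length, capture.Value);
        }
    }
}
```

For this input it should print five lines, starting with index 0, length 5: "This " and ending with index 17, length 9: "friendly.", which shows that the trailing space or period is part of every Group 1 capture.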
Group 2: friendly
# (\w+)
Capture 0: This
# (\w+)
Capture 1: dog
# (\w+)
Capture 2: is
# (\w+)
Capture 3: very
# (\w+)
Capture 4: friendly
# (\w+)
# The Captures collection again holds every match, but here each capture is only the word itself
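One practical use of this, sketched under the same pattern: reading Group 2's captures gives you every word in the sentence in one pass. WordExtractor and Words are hypothetical names for this example, not part of .NET.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // Hypothetical helper: returns every word captured by Group 2.
    public static string[] Words(string input)
    {
        Match match = Regex.Match(input, @"((\w+)[\s.])+");
        // Cast<Capture>() keeps this working on older frameworks where
        // CaptureCollection only implements the non-generic IEnumerable.
        return match.Groups[2].Captures.Cast<Capture>()
                    .Select(c => c.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Prints: This, dog, is, very, friendly
        Console.WriteLine(string.Join(", ", Words("This dog is very friendly.")));
    }
}
```

If the pattern does not match at all, the Captures collection is empty and Words returns an empty array.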