Regex matching and naming groups in C#



Let’s say you have a string and want to match individual groups of characters within that string. How would you do that? That’s the question I asked myself.
The following code is the outcome of some research on how to get this done. It shows you how to capture/match and name groups of characters using a regex pattern.

class RegexGroupsNames
{
    public static void Main()
    {
        // String to be parsed
        string str = "777L777_333_4444_55555_22_20090926_1727_666666_999999999_1010101010";

        // Regex pattern
        // Here we define the groups that form our string according to our need.
        // Each group has a name so that it's easier to get the individual values.
        string pattern =
               @"(?<group1>\d{3}[A-Z]\d{3})_(?<group2>\d{3})_(?<group3>\d{4})_(?<group4>\d{5})_(?<group5>\d{2})_(?<group6>\d{8})_(?<group7>\d{4})_(?<group8>\d{6})_(?<group9>\d{9})_(?<group10>\d{10})";

        // Creating the Regex used to parse the string with the pattern defined above
        Regex regex = new Regex(pattern);

        // String is a match or not ?
        Console.WriteLine("{0} {1} a valid string.", str, Regex.IsMatch(str, pattern, RegexOptions.IgnoreCase) ? "is" : "is not");

        // Matching the string and getting the groups named above
        Match match = regex.Match(str);

        // Writing the values of each group
        Console.WriteLine("group1  = {0}", match.Groups["group1"].Value);
        Console.WriteLine("group2  = {0}", match.Groups["group2"].Value);
        Console.WriteLine("group3  = {0}", match.Groups["group3"].Value);
        Console.WriteLine("group4  = {0}", match.Groups["group4"].Value);
        Console.WriteLine("group5  = {0}", match.Groups["group5"].Value);
        Console.WriteLine("group6  = {0}", match.Groups["group6"].Value);
        Console.WriteLine("group7  = {0}", match.Groups["group7"].Value);
        Console.WriteLine("group8  = {0}", match.Groups["group8"].Value);
        Console.WriteLine("group9  = {0}", match.Groups["group9"].Value);
        Console.WriteLine("group10 = {0}", match.Groups["group10"].Value);
// Defining the Culture to show the DateTime IFormatProvider culture = new CultureInfo("en-US", true); // Creating a DateTime variable with the data contained within groups 6 and 7 DateTime dt = DateTime.ParseExact(match.Groups["group6"].Value + match.Groups["group7"].Value, "yyyyMMddHHmm", culture); Console.WriteLine(dt); } }

This is the string to be parsed:

"777L777_333_4444_55555_22_20090926_1727_666666_999999999_1010101010"

It has 10 parts “groups” separated by an underscore character ( _ ).

What we want to do is to extract each individual group so that we can manipulate it anyway we want.

To accomplish that we define a regex that has the following pattern:

@"(?<group1>\d{3}[A-Z]\d{3})_(?<group2>\d{3})_(?<group3>\d{4})_(?<group4>\d{5})_(?<group5>\d{2})_(?<group6>\d{8})_(?<group7>\d{4})_(?<group8>\d{6})_(?<group9>\d{9})_(?<group10>\d{10})";
Let’s dissect the regex…
The 1st group denoted by the first pair of round brackets ( ) will look for 3 digits \d{3} followed by an uppercase letter ranging from A through Z [A-Z] followed by another 3 digits \d{3}.
The 2nd group denoted by the second pair of parenthesis will look for 3 digits, the 3rd group will look for 4 digits, the 4th group will look for 5 digits and so on… I think you got the point! :- )
While playing with this I found an interesting thing, that is, you can give a name to each group. In this case, the name goes inside the angle brackets < > preceded by a question mark. I’ve given the name <group1> to the first group and just incremented the final number in the others.
Naming your groups is great because you can refer to them while manipulating the matched groups without having to remember the exact position inside the string. Instead of doing this:
// Writing the values of each group
Console.WriteLine(match.Groups[0].Value);
Console.WriteLine(match.Groups[1].Value);
Console.WriteLine(match.Groups[2].Value);
Console.WriteLine(match.Groups[3].Value);
Console.WriteLine(match.Groups[4].Value);
Console.WriteLine(match.Groups[5].Value);
Console.WriteLine(match.Groups[6].Value);
Console.WriteLine(match.Groups[7].Value);
Console.WriteLine(match.Groups[8].Value);
Console.WriteLine(match.Groups[9].Value);

We can do this:

// Writing the values of each group
Console.WriteLine(match.Groups["group1"].Value);
Console.WriteLine(match.Groups["group2"].Value);
Console.WriteLine(match.Groups["group3"].Value);
Console.WriteLine(match.Groups["group4"].Value);
Console.WriteLine(match.Groups["group5"].Value);
Console.WriteLine(match.Groups["group6"].Value);
Console.WriteLine(match.Groups["group7"].Value);
Console.WriteLine(match.Groups["group8"].Value);
Console.WriteLine(match.Groups["group9"].Value);
Console.WriteLine(match.Groups["group10"].Value);

Isn’t it cool!?

With this you can create your regex pattern and match the groups of characters that interest you the most.

Grouping enables you to work with separate sets of data. Naming each group enables you to refer to each one of them easily.

This is the output of the code:

Regular Expression Grouping and Naming

Hope this helps.

References
RegExLib.com - Regular Expression Library
http://regexlib.com/

Silverlight Regular Expression Tester
http://regexlib.com/RESilverlight.aspx