Calculating prime numbers with LINQ in C#

If you want to see how to use PLINQ (Parallel LINQ) to get performance gains when performing this calculation, take a look at the post titled
Parallel LINQ (PLINQ) with Visual Studio 2010.

To serve as a processing test in a follow up post I’m posting how to calculate prime numbers using a LINQ query expression.

In mathematics, a prime number (or a prime) is a natural number which has exactly two distinct natural number divisors: 1 and itself. The first twenty-five prime numbers are:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97

The following code[1] shows how to calculate the prime numbers that are less equal ( <= ) a given number “max”:

Func<int, IEnumerable<int>> primeNumbers = max =>
     from i in Enumerable.Range(2, max - 1)
     where Enumerable.Range(2, i - 2).All(j => i % j != 0)
     select i;

IEnumerable<int> result = primeNumbers(10);
foreach(int i in result)
{
  Console.WriteLine(i);
}

In my final computer engineering assignment I presented LINQ - Language Integrated Query and its constructs. With such constructs it’s possible to achieve a high degree of method chaining.

LINQ is declarative, not imperative. It allows us to simply state what we want to do without worrying about how it is done.

In the code above we declare a Func<(Of <(T, TResult>)>) generic delegate which has an int as input parameter and an IEnumerable<int> as the result.
Func delegates are very useful for encapsulating user-defined expressions that are applied to each element in a set of source data.

Using a lambda expression ( => ) we assign a query expression to the delegate.
A lambda expression is an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types.

The query expression states that from each value i in the Enumerable.Range(2, max - 1) where all elements of the range Enumerable.Range(2, i – 2) satisfy the condition All(j => i % j != 0), we select i.

max is the delegate input parameter.

The % symbol is the modulus operator in C#. It computes the remainder after dividing its first operand by its second.

For example: considering max = 10 we’d have the following…

The range in the from clause is: { 2, 3, 4, 5, 6, 7, 8, 9, 10 }

Taking the 1st value out of the from range, we have i = 2.

The range in the where clause is Range (2, 2 - 2) = Range (2 , 0) = {  }.

Since i = 2 is by default included in the result, what must be evaluated lies where the variable max is used. So, taking the 2nd value and assigning it to i we have i = 3.

The range in the where clause is Range (2, 3 - 2) = Range (2 , 1) = { 2 }.

Evaluating j => i % j we have 3 % 2 = 1, and so, 3 is a prime number.

Taking the 3rd value and assigning it to i we have i = 4.

The range in the where clause is Range (2,  4 - 2) = Range (2, 2) = { 2, 3 }.

Evaluating j => i % j we have 4 % 2 = 0 and 4 % 3 = 1, and so 4 is not a prime number because all elements of the where range must yield a result != 0 for the expression i % j != 0.

Now we have i = 5.

The range in the where clause is Range (2, 5 - 2) = Range (2 , 3) = { 2, 3, 4 }.

Evaluating j => i % j we have 5 % 2 = 1, 5 % 3 = 2 and 5 % 4 = 1. From this we have that 5 is a prime number.

Now we have i = 6.

The range in the where clause is Range (2, 6 - 2) = Range (2 , 4) = { 2, 3, 4, 5 }.

Evaluating j => i % j we have 6 % 2 = 0, 6 % 3 = 0, 6 % 4 = 2 and 6 % 5 = 1. From this we have that 6 is not prime number.

Now we have i = 7.

The range in the where clause is Range (2, 7 - 2) = Range (2 , 5) = { 2, 3, 4, 5, 6 }.

Evaluating j => i % j we have 7 % 2 = 1, 7 % 3 = 1, 7 % 4 = 3, 7 % 5 = 2 and 7 % 6 = 1. From this we have that 7 is prime number.

DebuggingPrimeNumbersLINQQueryExpression

Now we have i = 8.

The range in the where clause is Range (2, 8 - 2) = Range (2 , 6) = { 2, 3, 4, 5, 6, 7 }.

Evaluating j => i % j we have 8 % 2 = 0, 8 % 3 = 2, 8 % 4 = 0, 8 % 5 = 3, 8 % 6 = 2 and 8 % 7 = 1. 8 isn’t a prime number.

Now we have i = 9.

The range in the where clause is Range (2, 9 - 2) = Range (2 , 7) = { 2, 3, 4, 5, 6, 7, 8 }.

Evaluating j => i % j we have 9 % 2 = 1, 9 % 3 = 0, 9 % 4 = 1, 9 % 5 = 4, 9 % 6 = 3, 9 % 7 = 2 and 9 % 8 = 1. 9 isn’t a prime number.

Now we have i = 10.

The range in the where clause is Range (2, 10 - 2) = Range (2 , 8) = { 2, 3, 4, 5, 6, 7, 8, 9 }.

Evaluating j => i % j we have 10 % 2 = 0, 10 % 3 = 1, 10 % 4 = 2, 10 % 5 = 0, 10 % 6 = 4, 10 % 7 = 3, 10 % 8 = 2 and 10 % 9 = 1. 10 isn’t a prime number.

Finally we have 4 prime numbers <= 10; they are: { 2, 3, 5, 7 }.

The following table illustrates the result:

i where Range(2, i - 2) All(j => i % j != 0)
2 { }  
3 { 2 } 3 % 2 = 1
4 { 2, 3 } 4 % 2 = 0
4 % 3 = 1
5 { 2, 3, 4 } 5 % 2 = 1
5 % 3 = 2
5 % 4 = 1
6 { 2, 3, 4, 5 } 6 % 2 = 0
6 % 3 = 0
6 % 4 = 2
6 % 5 = 1
7 { 2, 3, 4, 5, 6 } 7 % 2 = 1
7 % 3 = 1
7 % 4 = 3
7 % 5 = 2
7 % 6 = 1
8 { 2, 3, 4, 5, 6, 7 } 8 % 2 = 0
8 % 3 = 2
8 % 4 = 0
8 % 5 = 3
8 % 6 = 2
8 % 7 = 1
9 { 2, 3, 4, 5, 6, 7, 8 } 9 % 2 = 1
9 % 3 = 0
9 % 4 = 1
9 % 5 = 4
9 % 6 = 3
9 % 7 = 2
9 % 8 = 1
10 { 2, 3, 4, 5, 6, 7, 8, 9 } 10 % 2 = 0
10 % 3 = 1
10 % 4 = 2
10 % 5 = 0
10 % 6 = 4
10 % 7 = 3
10 % 8 = 2
10 % 9 = 1

I’ll use the primeNumbers delegate in a future post to evaluate PLINQ (Parallel LINQ) and measure the performance gains when the same calculation is done in parallel instead of in sequence.

References
[1] Perfetti, Michel. LINQ Quiz: The list of prime numbers in 3 clauses? 2007. Available at <http://blogs.developpeur.org/raptorxp/archive/2007/11/26/quizz-linq-la-liste-des-nombres-premiers-en-3-clauses.aspx>. Accessed on May 31, 2009.

Logging NHibernate SQL with log4net in ASP.NET

Have you ever wondered how to log the SQL generated by NHibernate?

This post tries to exemplify just that.

NHibernate uses HQL to leverage its expressiveness to the developer, but behind the scenes there is an engine that transforms the HQL into pure SQL that is executed against the database. This SQL can be logged so that you can see its structure and get a snapshot of what the database engine receives.

log4net is a logging tool that helps the developer see what SQL NHibernate is generating under the covers.

This is a brief description of log4net taken from its homepage:

log4net is a tool to help the programmer output log statements to a variety of output targets.

First and foremost you need to enable NHibernate logging in its configuration file. The property that sets this is hibernate.show_sql.

<add key="hibernate.show_sql" value="true" />

The following piece of code shows how to configure an appender and a logger that makes use of the appender. This code is kept inside the Web.config file in the log4net configuration section:

<appender name="NHibernateRollingFileAppender" type="log4net.Appender.RollingFileAppender">
    <file value="LogNHibernate.txt"/>
    <appendToFile value="true"/>
    <rollingStyle value="Size"/>
    <datePattern value="yyyyMMdd"/>
    <maxSizeRollBackups value="10"/>
    <maximumFileSize value="10MB"/>
    <layout type="log4net.Layout.PatternLayout">
        <conversionPattern value="%date - %message%newline"/>
    </layout>
</appender>

<logger name="NHibernateLogger" additivity="false">
    <level value="DEBUG"/> <!-- ALL, DEBUG, INFO, WARN, ERROR, FATAL or OFF -->
<appender-ref ref="NHibernateRollingFileAppender"/> </logger>

I’ll describe each part of the above code.

<appender name="NHibernateRollingFileAppender" type="log4net.Appender.RollingFileAppender">

Appender is an output destination. In this case its a RollingFileAppender. It writes logging events to a file in the file system.

<file value="LogNHibernate.txt"/>

The file property specifies the name of the file that’ll store the logs.

<appendToFile value="true"/>

The appendToFile property is set to true so that the appender will overwrite existing files.

<rollingStyle value="Size"/>

The rollingStyle property set how to roll log files. In this case the appender will roll log files based on the file size.

<datePattern value="yyyyMMdd"/>

To change the rolling period we need to set the datePattern property. It would be used if we adopted a rollingStyle based on Date instead of Size.

<maxSizeRollBackups value="10"/>
<
maximumFileSize value="10MB"
/>

Up to 10 old files of 10MB each will be kept. These rolled files will be named: LogNHibernate.txt.1, LogNHibernate.txt.2, LogNHibernate.txt.3, etc...

<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%date - %message%newline"
/>
</
layout
>

A layout enables us to customize the output format. This is accomplished by associating a layout with an appender.

The PatternLayout, lets the user specify the output format according to conversion patterns similar to the C language printf function.

<logger name="NHibernateLogger" additivity="false">

This is the logger and its additivity property controls appender accumulation, that is, how the logs are printed in the hierarchy of loggers.

For example, the output of a log statement of logger NHibernateLogger will go to all the appenders in NHibernateLogger and its ancestors. This is the meaning of the term "appender additivity".

<level value="DEBUG"/>

The level property controls the amount of information you want to be written to the log.

<appender-ref ref="NHibernateRollingFileAppender"/>

The property appender-ref specifies what appenders this logger uses.

That’s it! :-)

With this basic configuration you can start you search for points where NHibernate is generating to much queries where it shouldn’t.

I’m currently working on a performance branch where I’m learning how to deal with NHibernate lazy configuration.

This process of logging the SQL generated by NHibernate plays a great role when one is solving the bottlenecks involved in performance implications.

Just one note: keep in mind that the process of logging is by itself an onerous one. The amount of data that gets written by NHibernate is expressive and depending on the level of information you set inside the logger, the file size will grow very fast.

Hope this helps the fellow developers.

References

log4net homepage
http://logging.apache.org/log4net

log4net introduction
http://logging.apache.org/log4net/release/manual/introduction.html

log4net configuration examples
http://logging.apache.org/log4net/release/config-examples.html

Regex engine in C# - matching strings

To close this series of posts, today I’m going to match some input strings using the regex (l|e)*n?(i|e)el* that we’ve been using since the beginning.

To match the strings I’ll make use of the DFA we constructed in the last post titled Regex engine in C# - the DFA.

These are the 20 strings I’ll be matching:

eee, eeeil, eel, ennil, ie, leie, lele, leleel, lelel, lelenil, leliel, leniel, llnel, ln, lnel, lniel, nelll, niel, nil and nll.

In the DFA class we have a method called Simulate which I show bellow:

public string Simulate(string @in)
{
  state currentState = start;

  CharEnumerator i = @in.GetEnumerator();

  while(i.MoveNext())
  {
    KeyValuePair<state, input> transition = new KeyValuePair<state, input>(currentState, i.Current);

    if(!transTable.ContainsKey(transition))
      return "Rejected";

    currentState = transTable[transition];
  }

  if(final.Contains(currentState))
    return "Accepted";
  else
    return "Rejected";
}

The Simulate method takes as input a string to be matched and returns a string that signalizes success or failing when matching such a string.

To test it I’ll match the string “leniel” which by the way is my own name. :-)

So, what the Simulate method do?

It starts by assigning the start state 0 to the variable currentState.

Next we get a charEnumerator that is used in the while block to move letter by letter (input symbol by input symbol) till we get to the end of the string.

We declare a KeyValuePair<state, input> designated transition that has as the key the currentState and as the value the current input symbol we’re simulating.

We check to see if the DFA’s transition table contains such a transition, that is, if there’s a valid path from that state with that input symbol to another state. If it doesn’t, we reject the string we’re matching, otherwise we make the currentState variable receive the next state, that is, the state appointed by the transition we’ve just checked.

The process inside the while block goes on until we reach the last input symbol taken from the string we’re matching, in this case, the last “l” letter.

After getting out of the while block we make a final check to see if the state we achieved is part of the DFA’s set of final states. If it is, we accept the string, otherwise, we reject it.

This is an x-ray from the variables’ value when in the first iteration of the while block:

Regex Parser Matching the string "leniel"

As you can see in the transition table, from start state (currentState) “0” with the fist input symbol “l” we can go to state “3”.

The following table shows the result obtained while testing the strings mentioned above:


Accepted Rejected

eee

eeeil

eel

ennil

ie

lele

leie

lelel

leleel

lelenil

leliel

llnel

leniel

ln

lniel

lnel

niel

nelll

nil

nll

To make sure it’s correct, debug each one of these strings visually looking at the DFA’s graph representation shown below. Starting at state 0 we must end in one of the final states {7, 8, 9, 10}.

DFA for the regex (l|e)*n?(i|e)el*

The Regex Engine executable
The Regex Engine presented in this series of posts is a C# Console Application. As such it was written in a way that its command line arguments must be entered in the following form:

RegularExpressionEngine "(l|e)*n?(i|e)el*" leniel

Where:

RegularExpressionEngine ->  the name of the executable .exe file (the program itself)

"(l|e)*n?(i|e)el*" –> between the double quotes we pass the regex

leniel –> the string we want to match in the regex

Using Microsoft Visual Studio C# 2008 we can set the command line arguments using the Debug properties page of the project like the following picture:

Regex Expression Engine Debug Properties Page

To get to the Debug page, right click in the solution inside Solution Explorer as shown in the following picture:

Regex Expression Engine Solution Properties

After setting the command line arguments, hit F5 and you’re ready to go.

The Regex Engine source code
You can get the complete code (Microsoft Visual C# 2008 Console application) and executable at:

http://leniel.googlepages.com/RegularExpressionEngine.rar

To try out the code you can use the free Microsoft Visual C# 2008 Express Edition that you can get at: http://www.microsoft.com/express/vcsharp

Updated on 5/12/2009 10:06:00 PM

As I finished writing the posts, here goes the list that points to them:

Regular Expression Engine in C# (the Story)
Regex engine in C# - the Regex Parser
Regex engine in C# - the NFA
Regex engine in C# - the DFA

Source code: https://github.com/leniel/RegexEngine

References
The following links can help you when dealing with regexes:

Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regex Patterns
http://www.regular-expressions.info

Regular Expression Library (great site with lots of regexes and an excellent regex tester)
http://regexlib.com/Default.aspx
http://regexlib.com/RETester.aspx

Testing Windows 7 beta and RC versions

Windows is the piece of software that underlies everything on my computer so that I thought it would be a great opportunity to write about it since we’re on the verge of its new version, namely, Windows 7.

I’ve been testing Windows 7 for the last two months. It’s been my operating system since then.

Firstly I got the 32-bit Beta release on March, 15th and today I’m downloading the 64-bit RC (release candidate) version. I plan to install it this weekend and as soon as I have it installed and configured I’ll update this post.

What I have to write about this operating system? Numbered from 1 to 7 in an unordered relevance fashion…

1 - It is fast.
2 - I risk to say it is faster than Windows XP.
3 - It is beautiful! Take a look at the new taskbar.
4 - It simplifies a lot of tasks.
5 - It has more native programs.
6 - Previous Windows’s native programs got a refresh and were optimized.
7 - It adds more security points to your day to day tasks.

My Windows 7 Beta taskbar:

Windows 7 Beta Taskbar

I moved directly from Windows XP to Windows 7. I didn’t use Windows Vista because it was too bloated for my computer in the beginning of 2006 when I also tested it in the beta period. At that time I had an AMD Athlon XP 2400+ with 512 MB RAM memory and an onboard video card which didn’t allow me to get the so famous Aero interface. I didn’t have a plan to upgrade my hardware. That was the big impediment. I think that Windows Vista arrived in a time in which the majority of computers didn’t have a proper hardware configuration yet.

Now the landscape is different. Windows 7 appears in a time that it’s much cheaper to buy a 2-core computer with GBs of RAM memory; Computer prices went down during the past years even here in Brazil where hardware prices double if compared with US $. Today $ 1 = R$ 2.07. This price is still attractive. Believe it or not! :)

The following picture shows my current computer configuration:

 Windows 7 System configuration

This is the Windows Experience Index score I got - a low score given the fact that I still use an onboard video card and that the score is determined by the lowest subscore:

 Windows 7 Windows Experience Index

Even with a 3.2 score I have the windows Aero enabled. Take a look at the Windows Experience Index: An In-Depth Look post that describes the score levels and what system’s features are enabled or disabled based on them.

From the Engineering the Windows 7 “Windows Experience Index” post we have the following:

The Vista-era general guidelines for systems in the 1.0, 2.0, 3.0, 4.0 and 5.0 ranges still apply to Windows 7. But, Windows 7 has added levels 6.0 and 7.0; meaning 7.9 is the maximum score possible.

While using Windows 7 beta I installed all the software I work with as the Microsoft Visual Studio 2008 Professional, Oracle Database, PL/SQL Developer, TortoiseSVN, etc - a typical developer box.

Windows 7 didn’t crash and was a well behaved operating system during this beta period in which I did everything I used to do in Windows XP.

The only hardware problem I had was with a Creative Audigy sound card (model number SB0090). There was no driver available for Windows 7. So I had to install a driver that wasn’t full compatible. The sound didn’t sound as good as it should be. I experienced a lot of noise.
Creative Labs has a Windows 7 - Driver Availability Chart in case you’re interested.

All in all you’ll get a great experience with such a robust operating system. After all Windows is ubiquitous and the more you can do with it the better you get at work.

Updated on 6/6/2009 05:48:00 PM

I’ve installed Windows 7 64-bit version but I didn’t use it because for the sake of my work I thought Win 7 wouldn’t play as expected.

Firstly we should consider that Windows 7 64-bit has a default Program Files (x86) folder where it puts all applications that are made to run on a 32-bit operating system. This particularly broke my way because the software that I develop rely on an Oracle database connection. There is a known bug with Oracle that prevents an application hosted in a folder that has parenthesis in its name to access an Oracle database.

I didn’t notice any performance gains while running the 64-bit OS.

Today I finally installed the Windows 7 RC 32-bit. I upgraded from the 32-bit Beta version to the RC version. The upgrade is only possible from 32-bit to 32-bit versions and not from a 64-bit Beta to 32-bit RC and vice versa.

The upgrade took 1 hour and 40 minutes to complete and ran flawlessly. You may ask why this took so long? This is because in a upgrade it is necessary to copy a lot of files from the old OS to the new one.

I downloaded the ISO file and extracted its content to a temp folder. I changed the file cversion.ini as described in the post Delivering a quality upgrade experience from the Engineering Windows 7 blog so that I could go through the upgrade. I then clicked setup.exe while running the Windows 7 beta version and the installation started.

I got the following report:

Upgrading Windows will affect the following devices and/or programs:

          These programs might not work properly after the upgrade. We recommend uninstalling these programs before upgrading. Cancel the upgrade, open Control Panel, and search for "uninstall a program". (Note: Programs marked as * can be safely reinstalled after the upgrade.)
          • Microsoft SQL Server 2008
          • Microsoft SQL Server 2005

        I then decided to ignore the report and proceeded with the installation.

        Windows 7 - Upgrading from 32-bit Beta to RC

        The above image shows the early stages of the upgrade. In fact there were more than 400000 files to be gathered. I already had an extensive list of apps installed on the beta OS as for example Microsoft Office 2007 that in itself is a big suite of apps.

        Windows 7 then restarted sometimes and installed the new OS bits that were refined from Beta to RC. The last step was to transfer files, settings and programs to the new OS. It transferred a total of 524490 files.

        My Windows 7 RC taskbar:

        Windows 7 RC Taskbar

        From this what I have to say now that I’m using the RC to write this blog post using Windows Live Writer is that the upgrade was successful. I didn’t need to reinstall anything (including Windows Live Writer). Everything is the way they were before installing the RC version.

        I had a pleasant experience while upgrading from Beta to RC.

        Congratulations to the Microsoft Windows 7 Team for providing this must have feature, that is, making upgrades possible! :-)

        A last note: I described above the problem I had while configuring my Creative Audigy sound card in Windows 7 Beta. With the RC version the Creative Audigy soundcard is functioning as expected. One of the first things Windows 7 did was to install the sound card driver!

        Updated on 2/22/2010

        On February, 17 I made the change to Windows 7 RTM version. After almost a year of constant use the RC version proved to be really stable.

        Now let’s use Windows 7 till Windows 8 comes.

        References
        Engineering Windows 7 blog
        http://blogs.msdn.com/e7

        Windows 7 Team blog
        http://windowsteamblog.com/blogs/windows7

        Microsoft official Windows 7 site
        http://www.microsoft.com/windows/windows-7/default.aspx

        Windows 7 @ Paul Thurrott's SuperSite for Windows
        http://www.winsupersite.com/win7

        Windows 7 article at Wikipedia
        http://en.wikipedia.org/wiki/Windows_7

        List of features new to Windows 7
        http://en.wikipedia.org/wiki/Features_new_to_Windows_7

        Regex engine in C# - the DFA

        This time, we’ll take a look at the DFA’s class and its helper class called SubsetMachine.

        To understand what’s a DFA, refer to the first post in this series called Regex engine in C# - the Regex Parser.

        In the Regex engine in C# - the NFA post we ended with an NFA.

        Now we’re going to build a DFA based on such NFA.

        Remember that the main difference between a DFA and an NFA is that a DFA doesn’t have epsilon (ε) transitions that represent "nothing" or "no input" between states.

        As described in the section DFA versus NFA in the introduction of this series of posts, it may be shown that a DFA is equivalent to an NFA, in that, for any given NFA, one may construct an equivalent DFA, and vice-versa: this is the powerset construction or subset construction.

        So, let’s get our hands dirty with some code.

        Below I present the DFA class:

        //
        //  Regular Expression Engine C# Sample Application
        //  2006, by Leniel Braz de Oliveira Macaferi & Wellington Magalhães Leite.
        //
        //  UBM's Computer Engineering - 7th term [http://www.ubm.br/]
        //  
        //  This program sample was developed and turned in as a term paper for Lab. of
        //  Compilers Construction. It was based on the source code provided by Eli Bendersky
        //  [http://eli.thegreenplace.net/] and is provided "as is" without warranty.
        //
        
        using System;
        using SCG = System.Collections.Generic;
        using C5;
        
        using state = System.Int32;
        using input = System.Char;
        
        namespace RegularExpressionEngine
        {
          /// <summary>
          /// Implements a deterministic finite automata (DFA)
          /// </summary>
          class DFA
          {
            // Start state
            public state start;
            // Set of final states
            public Set<state> final;
            // Transition table
            public SCG.SortedList<KeyValuePair<state, input>, state> transTable;
        
            public DFA()
            {
              final = new Set<state>();
        
              transTable = new SCG.SortedList<KeyValuePair<state, input>, state>(new Comparer());
            }
        
            public string Simulate(string @in)
            {
              state curState = start;
        
              CharEnumerator i = @in.GetEnumerator();
        
              while(i.MoveNext())
              {
                KeyValuePair<state, input> transition = new KeyValuePair<state, input>(curState, i.Current);
        
                if(!transTable.ContainsKey(transition))
                  return "Rejected";
        
                curState = transTable[transition];
              }
        
              if(final.Contains(curState))
                return "Accepted";
              else
                return "Rejected";
            }
        
            public void Show()
            {
              Console.Write("DFA start state: {0}\n", start);
              Console.Write("DFA final state(s): ");
        
              SCG.IEnumerator<state> iE = final.GetEnumerator();
        
              while(iE.MoveNext())
                Console.Write(iE.Current + " ");
        
              Console.Write("\n\n");
        
              foreach(SCG.KeyValuePair<KeyValuePair<state, input>, state> kvp in transTable)
                Console.Write("Trans[{0}, {1}] = {2}\n", kvp.Key.Key, kvp.Key.Value, kvp.Value);
            }
          }
        
          /// <summary>
          /// Implements a comparer that suits the transTable SordedList
          /// </summary>
          public class Comparer : SCG.IComparer<KeyValuePair<state, input>>
          {
            public int Compare(KeyValuePair<state, input> transition1, KeyValuePair<state, input> transition2)
            {
              if(transition1.Key == transition2.Key)
                return transition1.Value.CompareTo(transition2.Value);
              else
                return transition1.Key.CompareTo(transition2.Key);
            }
          }
        
        }

        As you see, a DFA has 3 variables: a start state, a set of final states and a transition table that maps transitions between states.

        Below I present the SubsetMachine class that is responsible for the hard work of extracting an equivalent DFA from a given NFA:

        //
        //  Regular Expression Engine C# Sample Application
        //  2006, by Leniel Braz de Oliveira Macaferi & Wellington Magalhães Leite.
        //
        //  UBM's Computer Engineering - 7th term [http://www.ubm.br/]
        //  
        //  This program sample was developed and turned in as a term paper for Lab. of
        //  Compilers Construction. It was based on the source code provided by Eli Bendersky
        //  [http://eli.thegreenplace.net/] and is provided "as is" without warranty.
        //
        
        using System;
        using SCG = System.Collections.Generic;
        using C5;
        
        using state = System.Int32;
        using input = System.Char;
        
        namespace RegularExpressionEngine
        {
          class SubsetMachine
          {
            private static int num = 0;
        
            /// <summary>
            /// Subset machine that employs the powerset construction or subset construction algorithm.
            /// It creates a DFA that recognizes the same language as the given NFA.
            /// </summary>
            public static DFA SubsetConstruct(NFA nfa)
            {
              DFA dfa = new DFA();
        
              // Sets of NFA states which is represented by some DFA state
              Set<Set<state>> markedStates = new Set<Set<state>>();
              Set<Set<state>> unmarkedStates = new Set<Set<state>>();
        
              // Gives a number to each state in the DFA
              HashDictionary<Set<state>, state> dfaStateNum = new HashDictionary<Set<state>, state>();
        
              Set<state> nfaInitial = new Set<state>();
              nfaInitial.Add(nfa.initial);
        
              // Initially, EpsilonClosure(nfa.initial) is the only state in the DFAs states and it's unmarked.
              Set<state> first = EpsilonClosure(nfa, nfaInitial);
              unmarkedStates.Add(first);
        
              // The initial dfa state
              state dfaInitial = GenNewState();
              dfaStateNum[first] = dfaInitial;
              dfa.start = dfaInitial;
        
              while(unmarkedStates.Count != 0)
              {
                // Takes out one unmarked state and posteriorly mark it.
                Set<state> aState = unmarkedStates.Choose();
        
                // Removes from the unmarked set.
                unmarkedStates.Remove(aState);
        
                // Inserts into the marked set.
                markedStates.Add(aState);
        
                // If this state contains the NFA's final state, add it to the DFA's set of
                // final states.
                if(aState.Contains(nfa.final))
                  dfa.final.Add(dfaStateNum[aState]);
        
                SCG.IEnumerator<input> iE = nfa.inputs.GetEnumerator();
        
                // For each input symbol the nfa knows...
                while(iE.MoveNext())
                {
                  // Next state
                  Set<state> next = EpsilonClosure(nfa, nfa.Move(aState, iE.Current));
        
                  // If we haven't examined this state before, add it to the unmarkedStates and make up a new number for it.
                  if(!unmarkedStates.Contains(next) && !markedStates.Contains(next))
                  {
                    unmarkedStates.Add(next);
                    dfaStateNum.Add(next, GenNewState());
                  }
        
                  KeyValuePair<state, input> transition = new KeyValuePair<state, input>();
        transition.Key = dfaStateNum[aState]; transition.Value = iE.Current; dfa.transTable[transition] = dfaStateNum[next]; } } return dfa; } /// <summary> /// Builds the Epsilon closure of states for the given NFA /// </summary> /// <param name="nfa"></param> /// <param name="states"></param> /// <returns></returns> static Set<state> EpsilonClosure(NFA nfa, Set<state> states) { // Push all states onto a stack SCG.Stack<state> uncheckedStack = new SCG.Stack<state>(states); // Initialize EpsilonClosure(states) to states Set<state> epsilonClosure = states; while(uncheckedStack.Count != 0) { // Pop state t, the top element, off the stack state t = uncheckedStack.Pop(); int i = 0; // For each state u with an edge from t to u labeled Epsilon foreach(input input in nfa.transTable[t]) { if(input == (char)NFA.Constants.Epsilon) { state u = Array.IndexOf(nfa.transTable[t], input, i); // If u is not already in epsilonClosure, add it and push it onto stack if(!epsilonClosure.Contains(u)) { epsilonClosure.Add(u); uncheckedStack.Push(u); } } i = i + 1; } } return epsilonClosure; } /// <summary> /// Creates unique state numbers for DFA states /// </summary> /// <returns></returns> private static state GenNewState() { return num++; } } }

        In the first post of this series we see the following line of code:

        DFA dfa = SubsetMachine.SubsetConstruct(nfa);

        The SubsetConstruct method from the SubsetMachine class receives as input an NFA and returns a DFA.

        Inside the SubsetConstruct method we firstly instantiate a new DFA object and then we create two variables markedStates and unmarkedStates that are sets of NFA states which represent a DFA state.

        // Sets of NFA states which is represented by some DFA state
        Set<Set<state>> markedStates = new Set<Set<state>>();
        Set<Set<state>> unmarkedStates = new Set<Set<state>>();

        From this we see that a DFA state can represent a set of NFA states. Take a look at the introductory post and see Figure 2. It shows two DFA states that represent sets of NFA states, in this particular case the DFA final states represent the NFA states {s2, s3} and {s5, s6}.

        The HashDictionary helps us to give a name (to number) each DFA state.

        // Gives a number to each state in the DFA
        HashDictionary<Set<state>, state> dfaStateNum = new HashDictionary<Set<state>, state>();
        We declare a variable called nfaInitial that is a set of states. It receives the initial NFA state:
        Set<state> nfaInitial = new Set<state>();
        nfaInitial.Add(nfa.initial);
        We’ll start using the EpsilonClosure function. 
        // Initially, EpsilonClosure(nfa.initial) is the only state in the DFAs states and it's unmarked.
        Set<state> first = EpsilonClosure(nfa, nfaInitial);
        The EpsilonClosure function receives as parameters the NFA and its initial state and returns a set of states. Take a look at the method signature:
        static Set<state> EpsilonClosure(NFA nfa, Set<state> states)
        So, what does it do? You may ask. To answer this question let’s debug this first method call:
        From the NFA transition table presented in Figure 2 and from the transition graph presented in Figure 3 in the second post of this series we can see how many transitions are represented by eps transitions.
        The first time we enter into this function we’ll get as a return value a set of states that contains all the states that are reachable with an eps transition from the start state 0.
        EpsilonClosureFunction
        Figure 1 - States reachable by an eps transition from start state 0.

        For the sake of comparison I’ll show the NFA’s graph representation for the regex (l|e)*n?(i|e)el* that we’re studying since the beginning of this series.

        NFA for the Regex (l|e)*n?(i|e)el*

        Figure 2 - NFA’s graph representation for the regex (l|e)*n?(i|e)el*

        If you pay close attention you’ll see that the order the regex parser found the states is the order we visually debug the code looking at the graph above.

        With such states found we move next adding this DFA state into the variable unmarkedStates.

        We then use a function called GetNewState that is responsible for generating a number that uniquely identifies each state of the DFA:

        // The initial dfa state
        state dfaInitial = GenNewState();

        When we pass to the next line of code we add to the dfaStateNum dictionary a key that is the set of states returned by the EpsilonClosure function and a value that is the name of the initial state of the DFA.

        dfaStateNum[first] = dfaInitial;
        We make the initial state of the DFA be the dfaInitial value we just got. 
        dfa.start = dfaInitial;

        Next we enter in the first while keyword. In this while we basically extract one of the unmarkedStates and add the same to the markedStates set. This has the meaning of telling that we already checked such state.

        // Takes out one unmarked state and posteriorly mark it.
        Set<state> aState = unmarkedStates.Choose();
        
        // Removes from the unmarked set.
        unmarkedStates.Remove(aState);
        
        // Inserts into the marked set.
        markedStates.Add(aState);

        In the next line of code (one of the most interesting parts of the whole code) we check to see if this current DFA state (remember that it is a set of states) we’re on contains the NFA final state, if it holds true, we add it to the DFA’s set of final states:

        // If this state contains the NFA's final state, add it to the DFA's set of final states.
        if(aState.Contains(nfa.final))
          dfa.final.Add(dfaStateNum[aState]);

        Now it’s time to check against the NFA’s input symbols. To accomplish this we declare an enumerator of type state that does the job of moving through each of the input symbols in the next while code block:

        SCG.IEnumerator<input> iE = nfa.inputs.GetEnumerator();
        
        // For each input symbol the nfa knows...
        while(iE.MoveNext())
        { . . .
        Now it’s time to create the next DFA state. We do this by declaring a new set of states and we call the EpsilonClosure function again to fill this state, but this time we pass the EpsilonClosure function a different second parameter.
        // Next state
        Set<state> next = EpsilonClosure(nfa, nfa.Move(aState, iE.Current));

        Let’s go deeper to take a look at this second parameter.

        As you see we call the function Move that is part of the NFA class. This function receives as parameters a set of states and an input symbol to be checked against. It returns a set of states.

        What the move function does is: foreach state in the set of states passed as the first parameter we check each transition present in the NFA’s transition table from this state to another state with the input symbol passed as the second parameter.

        So, the first time we pass we get the following output from the Move function:

        NFAMoveFunction

        Figure 3 - Result from the NFA’s Move function the 1st time it’s called

        If we look at Figure 2  we can assert that from the states present in the first state of the DFA (see Figure 1) we can move to states {5, 16} with the first NFA input that is equal to ‘e’.

        With the above result taken from the Move function we’re ready to go the EpsilonClosure function for the second time to create the 2nd DFA state in the SubsetMachine class. This second time we get the following result from EpsilonClosure function:

        EpsilonClosureFunction2

        Figure 4 - Result from the EpsilonClosure function the 2nd time it’s called

        Now, if you pay close attention, we can assert that starting at the states {5, 16} we can move with an eps-transition to the states shown above. Remember that the states we pass to the EpsilonClosure function are themselves included in the result returned by the function.

        Now that we have created the 2nd DFA state we check to see if it wasn’t examined yet and if it holds true we add it to the unmarkedStates variable and give a new name to this state numbering it with the GenNewState function.

        // If we haven't examined this state before, add it to the unmarkedStates and make up a new number for it.
        if(!unmarkedStates.Contains(next) && !markedStates.Contains(next))
        {
          unmarkedStates.Add(next);
          dfaStateNum.Add(next, GenNewState());
        }

        Now the best part of it. :)

        We create a new transition that has as key the number of the DFA state we’re checking and as the value the current input symbol we’re after.

        KeyValuePair<state, input> transition = new KeyValuePair<state, input>();
        transition.Key = dfaStateNum[aState]; transition.Value = iE.Current;

        We then add this transition to the DFA’s transition table:

        DFATransitionTable

        Figure 5 - DFA’s transition table

        This has the following meaning: from state 0 with input ‘e’ go to state 1!

        These are the subsequent values we get for the first unmarkedState we’re checking:

        With input ‘i’ we can go to state { 14 } from which with an eps transition we can go to state { 17 }.

        With input ‘l’ we can go to state { 3 } from which with an eps transition we can go to states { 4, 13, 8, 3, 12, 7, 2, 11, 6, 1, 15, 10 }.

        With input ‘n’ we can go to state { 9 } from which with an eps transition we can go to states { 12, 9, 13, 15 }.

        A point that deserves consideration is that each time you run the regex parser it’s not guaranteed that the numbers that identify the DFA states will remain the same.

        I won’t continue debugging because it would consume a lot of space in this blog post.

        I think that with the above explanation it’s easy to get the point.

        In short we’ll repeat the above steps for each unmarked state that hasn’t been checked yet working with it against each input symbol.

        For the regex (l|e)*n?(i|e)el* in one of the times I ran the code, I got the following DFA’s transition table:

        DFA start state: 0
        DFA final state(s): 7 8 9 10

        Trans[0, e] = 1
        Trans[0, i] = 2
        Trans[0, l] = 3
        Trans[0, n] = 4
        Trans[1, e] = 7
        Trans[1, i] = 2
        Trans[1, l] = 3
        Trans[1, n] = 4
        Trans[2, e] = 8
        Trans[2, i] = 6
        Trans[2, l] = 6
        Trans[2, n] = 6
        Trans[3, e] = 1
        Trans[3, i] = 2
        Trans[3, l] = 3
        Trans[3, n] = 4
        Trans[4, e] = 5
        Trans[4, i] = 2
        Trans[4, l] = 6
        Trans[4, n] = 6
        Trans[5, e] = 8
        Trans[5, i] = 6
        Trans[5, l] = 6
        Trans[5, n] = 6
        Trans[6, e] = 6
        Trans[6, i] = 6
        Trans[6, l] = 6
        Trans[6, n] = 6
        Trans[7, e] = 7
        Trans[7, i] = 2
        Trans[7, l] = 10
        Trans[7, n] = 4
        Trans[8, e] = 6
        Trans[8, i] = 6
        Trans[8, l] = 9
        Trans[8, n] = 6
        Trans[9, e] = 6
        Trans[9, i] = 6
        Trans[9, l] = 9
        Trans[9, n] = 6
        Trans[10, e] = 1
        Trans[10, i] = 2
        Trans[10, l] = 10
        Trans[10, n] = 4

        Figure 6 - DFA’s transition table for the regex (l|e)*n?(i|e)el*

        Below is the DFA’s graph representation:

        DFA for the Regex (l|e)*n?(i|e)el*

        Figure 7 - DFA’s graph representation for the regex (l|e)*n?(i|e)el*

        In the next post I’ll simulate some input strings against this DFA to assert its validity.

        See you there!

        Updated on 5/12/2009 09:57:00 PM

        As I finished writing the posts, here goes the list that points to them:

        Regular Expression Engine in C# (the Story)
        Regex engine in C# - the Regex Parser
        Regex engine in C# - the NFA
        Regex engine in C# - matching strings

        Source code: https://github.com/leniel/RegexEngine