A “long time ago”, more precisely 3 years ago, I was studying Automata and Formal Languages which was a Computer Engineering discipline in the 6th semester out of 10 at UBM.
At that time I was amazed by the new things I was learning such as NFA - Nondetermistic Finite Automaton, DFA - Deterministic Finite Automaton, Finite State Machine and Regular Expressions. For more on this, read my last post: Fortran Numerical Constants Recognizer.
For the sake of my development I started searching for programming related material that could put me informed about such amazing computer science constructs.
I then came across a series of articles at GameDev.net called Algorithmic Forays. It’s been written by Eli Bendersky. This guy did a great job putting together the base for a Regular Expression Engine that exercise the concepts of NFA and DFA. His code was presented in C++ and I took the time to learn a bit more about this powerful language too.
After reading Algorithmic Forays from part 1 through part 6 I started thinking about translating the code from C++ to C#. That’d be a great opportunity to grasp new C# constructs and at the same time get in touch with material related to the discipline afore mentioned.
Setting that in my mind I put my hands at work and after struggling here and there I got the code translated. I also thought about writing a Portuguese version of the article to be publicized at my university’s scientific magazine. The idea was to write an adapted version of the article presenting both code (C++ and C#) showing how to achieve the same results using different programming languages data structures.
I’ve sent a message to Eli asking him permission to write a Portuguese version of the article. You can read the e-mail I’ve sent Eli and his reply below:
     
Leniel Macaferi <le@…> 
Algorithmic Forays for a Regex Parser
le@... <le@...> Wed, Jan 3, 2007 at 5:43 AM
To: eli…@…
Cc: Wellington Magalhães Leite <wml@…>
Dear Eli,
My name is Leniel and I'm a student of Computer Engineering here in    
Brazil. I'm currently in the 9th term out of 10.     
You can see my basic profile at     
http://www.codeproject.com/script/profile/whos_who.asp?id=1224713
Today I'm on my vacation and I'm going to come back studying on February.
I'm passionate for anything related to technology and specially    
programming which is my hobby just like yours.
Firstly I'd like to congratulate you by your advanced knowledge of    
such great topics about Computer Science and Computer Engineering like     
the one you discussed in the article Algorithmic Forays - Part 1 to 6.
Well, I've read your biography at http://eli.thegreenplace.net/about/     
and I'm currently seeing your photo album. You're a great example for     
us to follow due your characteristics as liking to study and liking to     
travel the whole world. It's just what I want to do too as a     
professional and as a traveler! You really do very well on this.
Your site is all of good! Think about the entire internet filled with    
sites just like yours. With a complete bunch of interesting     
information! My God it's a dream.
Coming to the real reason I'm writing to you - it's about a project    
that has been proposed to me and a friend of mine called Wellington.     
When we were in the 7th term at the university during our class of     
Laboratory of Compiler Construction studying Lex and Yacc, our teacher     
asked us to prepare an article about the process of creation of an     
engine to parse Regular Expressions. Since that moment we started     
searching the Internet for something that we could use as a base for     
our article. For our lucky, we found your article Algorithmic Forays     
at www.gamedev.net that is amazing for everyone to understand. It's     
simple and direct with implementation examples.     
During the 6th term at the university we had a discipline called     
Automatons and Formal Languages. Was at this period that we learned     
topics regarding Finite State Machines (FSM), Deterministic Finite     
Automatons (DFA), Nondeterministic Finite Automatons (NFA) and Regular     
Expressions. So, from there we know that a DFA + NFA = FSM! We've also     
implemented some code for covering and getting a better understanding     
of how such topics work in reality.
In true, we started our project preparing an article on April, May of    
2006, when I particularly started converting all your Algorithmic     
Foray's C++ code to the C# language that is the one I most like to use     
for writing any kind of code. But due our other tasks, in other words,     
other disciplines, we stopped working on this project. It's important     
to mention that I could finalize translating your C++ code to C# on     
the middle of May 2006.
I've learned such great things working with the code as mastering my    
knowledge with the Generics in C#. I even got one third party DLL     
called C5.dll from http://www.itu.dk/research/c5/ which includes lots     
of new classes to work with sets of data as is necessary when dealing     
with regular expressions.
Yesterday, 1/2, when I woke up I just thought why not to continue    
working on that project about a regex parser. So, I started again with     
new motivations.
I started translating your whole article from English to Portuguese    
and I'd like to ask you if we could use some text and diagrams of your     
article as the base of our article and we would present it with both     
versions of code, C++ and C#.
Our aim is to forward it to be issued in our university's scientific    
magazine and propagate to the ones interested – see     
http://www.ubm.br/ubm/paginas/copep/inicial.php?pagina=apresentacao.php     
(it's in Portuguese - you're learning Spanish and it's a little bit     
alike. I've already publicized one article in this magazine with the     
title: Development and Numeric Simulation of Algorithms for     
Computational Resolution of Ordinary Differential Equations. It was a     
project I developed while I was in the 4th term. The code was     
developed using C++. Further, this article will be available in the     
Internet through an indexing project that's been prepared by a     
university here in Brazil.
We'll certainly cite the article's fonts as you can see in the    
preliminary code I'm going to send you in this e-mail. Your great     
initiative of publicizing the article must be preserved by any mean.     
Haven't you used any other bibliography? Was it developed by just your     
thoughts?
Remember that this code I'm sending you needs a review because I have    
to pass through it to get it all in mind again and see if everything     
is Ok. To run the code you just need to recompile it using a C#     
compiler. I use the Microsoft Visual Studio 2005 Express Edition as     
you mentioned in your home page here     
http://eli.thegreenplace.net/2006/12/03/a-complete-c-development-environment-from-microsoft-free/     
The initial parameter I'm using is set in the command line argument     
inside the project's properties in Debug and is as follow:     
RegularExpressionEngine "(a|b)*abb" aaababb     
It's seems to be working Ok.     
Try it for yourself!
From now on we're just waiting for your reply.
By the way, if you and your wife want to come to visit Brazil you can    
certainly come to my house. I'd like to have you as my guests. Who     
knows, one day I can go to Israel and meet you personally. That would     
be great too. Israel is the apple of God's eye!
I simply think we have the same liking!
It was a pleasure to write to you.
My best regards,    
                         Leniel Braz de Oliveira Macaferi     
                         Volta Redonda, Rio de Janeiro, Brazil
     
Leniel Macaferi <le@…> 
Algorithmic Forays for a Regex Parser
Eli Bendersky <eli@…> Sun, Jan 21, 2007 at 2:03 AM
To: "le@…" <le@…>
Hello Leniel,
First of all, thanks for all the compliments - I'm flattered.    
You can certainly use any part of the articles and the diagrams for your translation. As for my sources, I certainly didn't invent anything - I took all the material from the "Dragon Book" ("Compilers" by Aho, Sethi and Ullman), perhaps with the help of some googling, but as far as I remember, the book was the main source.
Keep being enthusiastic about your studies. That's a good thing :-)
Regards,
Eli
As you can see from the above e-mail, Eli agreed and so it was OK to write a Portuguese version of his articles.
As the semesters passed and new commitments queued I stopped translating the articles and I have just decided this past week that now it’s a good time to come back to it and finish the planned objective of 2 years ago.
While I write the next blog posts I’ll translate the remaining articles to Portuguese. I expect to publish it online here on this blog so that it can help people out there to learn this fascinating subject of Computation Theory.
I’ll follow a different path to explain how the regular expression engine work, that is, from part 1 to part 6 I’ll be showing each division of the engine by means of its representing class in code. I’ll do so, so that I can present each part of the translated code (C++ to C#) in such a way that it gets easier to understand. I’ll use a top down approach, that is, I’ll begin showing the higher level class and then I’ll go deep to its dependencies.
This is the story behind the Regular Expression Engine in C# I’ll be presenting in the next posts, starting with this one.
Hope that you like it as much as I do. This was for sure one of or even the best discipline I’ve had in the Computer Engineering course and one of the more exciting things to write about.
Updated on 5/12/2009 09:41:00 PM
As I finished writing the series of posts, here goes the list that points to them:
Regex engine in C# - the Regex Parser     
Regex engine in C# - the NFA     
Regex engine in C# - the DFA     
Regex engine in C# - matching strings
Source code: https://github.com/leniel/RegexEngine