Source code Indenter-Capitalizer

Following the coursework related to the Compilers Construction discipline I attended during the Computer Engineering course, I was asked to indent and capitalize the reserved words (keywords) of a source code file. More specifically, to do this work I should use the Syntactic Analyzer built with Flex and YACC that was created in a previous coursework task.

The source code file in question is the one being analyzed by the syntactic analyzer. This way at the same time it analyses the syntactic structure of the file it also indents and capitalizes the keywords.

The following is the content of the syntactic analyzer file named sinan.yacc:

%{
   #include <stdio.h>
   #include <stdlib.h>

   int c;
%}

%token PROGRAM
%token ID
%token SEMIC
%token DOT
%token VAR
%token COLON
%token INTEGER
%token REAL
%token RCONS
%token BOOL
%token OCBRA
%token CCBRA
%token IF
%token THEN
%token ELSE
%token WHILE
%token DO
%token READ
%token OPAR
%token CPAR
%token WRITE
%token COMMA
%token STRING
%token ATRIB
%token RELOP
%token ADDOP
%token MULTOP
%token NEGOP
%token CONS
%token TRUE
%token FALSE

%%

prog : PROGRAM {printf("PROGRAM "); c = 0;}

       ID {printf("%s", yytext);}

       SEMIC {printf(";\n\n");}

       decls

       compcmd

       DOT {printf(".");}
       {
         printf("\n Syntactical Analisys done without erros!\n");

         return 0;
       }
     ;

decls : VAR {printf("VAR ");} decl_list

      ;

decl_list : decl_list decl_type
            decl_type
          ;

decl_type : id_list COLON {printf(":");} type SEMIC {printf(";\n");}
          ;

id_list : id_list COMMA {printf(", ");} ID {printf("%s", yytext);}
          ID {printf("%s", yytext);}
        ;

type : INTEGER {printf(" INTEGER");}
       REAL {printf(" REAL");}
       BOOL {printf(" BOOL");}
     ;

compcmd : OCBRA {int i; for(i = 0; i < c; i++)printf(" "); printf("{\n"); c = c + 2;} cmd_list CCBRA {printf("\n"); int i; for(i = 2; i < c; i++)printf(" "); printf("}"); c = c - 2;}
        ;

cmd_list : cmd_list SEMIC {printf(";\n\n");} cmd
           cmd
         ;

cmd : {int i; for(i = 0; i < c; i++)printf(" ");} If_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} While_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} Read_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} Write_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} Atrib_cmd
      compcmd
    ;

If_cmd : IF {printf("IF ");} exp THEN {printf(" THEN\n");} cmd Else_cmd
       ;

Else_cmd : ELSE {printf("\n"); int i; for(i = 0; i < c; i++)printf(" "); printf("ELSE\n"); c = c + 2;} cmd {c = c - 2;}

         ;

While_cmd : WHILE {printf("WHILE ");} exp DO {printf(" DO\n");} cmd
          ;

Read_cmd : READ {printf("READ");} OPAR {printf("(");} id_list CPAR {printf(")");}
         ;

Write_cmd : WRITE {printf("WRITE");} OPAR {printf("(");} w_list CPAR {printf(")");}
          ;

w_list : w_list COMMA {printf(", ");} w_elem
         w_elem
       ;
w_elem : exp
         STRING {printf("%s", yytext);}
       ;

Atrib_cmd : ID {printf("%s ", yytext);} ATRIB {printf("= ");} exp
          ;

exp : simple_exp
      simple_exp RELOP {printf(" %s ", yytext);} simple_exp
    ;

simple_exp : simple_exp ADDOP {printf(" %s ", yytext);} term
             term
           ;

term : term MULTOP {printf(" %s ", yytext);} fac
       fac
     ;

fac : fac NEGOP {printf(" %s", yytext);}
      CONS {printf(" %s", yytext);}
      RCONS {printf(" %s", yytext);}
      OPAR exp CPAR
      TRUE {printf("TRUE ");}
      FALSE {printf("FALSE ");}
      ID {printf("%s", yytext);}
    ;


%%

#include "lex.yy.c"

As you can see there is an integer variable named c right beneath the #include section. This variable is incremented and decremented according to the section of code being analyzed (parsed). Such variable is then used inside the for loops. According to its current value blank spaces are written on the screen so that the next token parsed is printed on the right position (indented).

Each keyword parsed is capitalized and printed on the screen through a simple printf command.

Let's run a simple test case with the following source code file named test.txt. The code is intentionally not indented and it doesn't do much. It's just for a testing purpose.

program factorial;

var n, fact, i: integer;
{
read(n);

fact = 1;

i = 1;

while i <= n do
{
fact = fact * i;

i = i + 1
};

if i >= fact then
{
i = fact;

fact = fact + 1
}
else
i = fact + 1;



if i < fact then
{
i = fact;

fact = fact + 1
};

write("The factorial of ", n, " is: ", fact)
}.

In blue are the keywords, so the output should present them in CAPITALIZED letters. The blocks of code should also be presented with indentation according to the logic specified above. I use 2 white spaces to indent the blocks of code. That's why I use c = c + 2 (to increment) and c = c - 2 (to decrement) the c variable is responsible for controlling the indentation.

To run the test case it's necessary to build the syntactic analyzer. I won't show here all the steps involved since it's already done in the paper described in the post Syntactic Analyzer built with Flex and YACC. So I'll just compile the sinan.yacc file since it's the only file that was modified to accomplish this task. The other necessary files to generate the executable file - such as the lexical analyzer ones - are included in the download package at the end of this post.

For a better understanding of the commands used in this post and to set up the development environment I recommend that you read at least section 3.3 of the paper Syntactic Analyzer built with Flex and YACC. That said, let's do the job. :-)

To run this test case, follow the steps listed bellow:

  1. Create a folder called IndentCapCode on the root directory C:\, which will contain all the files created henceforward.
  2. Open a command prompt and type: path=C:\MinGW\bin;%PATH%. I consider that the MinGW installation was done in the folder C:\MinGW. Change it accordingly. After completing this step the tools to build the syntactic analyzer will be available in the command prompt.
  3. Generate the file y.tab.c in the command prompt with following command: yacc sinan.yacc.
  4. Compile the files with GCC in the command prompt with the following command: gcc y.tab.c yyerror.c main.c -osinan -lfl.

The result file for the syntactic analyzer is sinan.exe. To use it just type sinan < test.txt. The file test.txt contains the source code to be analyzed by the syntactic analyzer.

You can see the commands listed above in action through the following screenshot:

In a next post related to compilers construction I'll show the implementation of a semantic analyzer. This implementation was another coursework task. I used the same principle shown here to indent the code and even to colorize the keywords. But this is for another day.

You can get the .ZIP package that contains the files used in this post: http://leniel.googlepages.com/CodeIndenterCapitalizerFlexYACC.zip

Note: if you want, you can use a batch file (.bat) to automate the steps listed above. A .bat file named ini.bat is included in the .ZIP package. For more information about batch files, read my previous post Programming-Constructing a Batch file.