Friday, February 17, 2006

Explaining Lex and Yacc

Note: Change {} after includes to pointy brackets.
1)

What exactly are you doing when you process something with lex?

Well, lex takes your lex file, say example2.l and generates a file in the C language. The name of this file is lex.yy.c.

In class, I spoke about Finite State Machines, and how they can recognize regular languages. lex.yy.c is an implementation of a Finite State Machine based on your lex file. The regular expressions are inputs, and the actions or printfs are the outputs.

Thus, let us say we have example2.l:

%{
#include {stdio.h}
%}

%%
[0123456789]+ printf("NUMBER\n");
[a-zA-Z][a-zA-Z0-9]* printf("WORD\n");
%%
The stuff between %{ and %} will be included in the C file, which is important since we are using printf. The stuff between %% and %% are regular expressions together with the action to take after successfully recognizing that expression.

At the command line, we type:
lex example2.l

This generates the file lex.yy.c.

Pseudocode to explain what this file contains (obtained from this website):
        state= 0; get next input character
while (not end of input) {
depending on current state and input character
match: /* input expected */
calculate new state; get next input character
accept: /* current pattern completely matched */
state= 0; perform action corresponding to pattern
error: /* input unexpected */
state= 0; echo 1st character input after last accept or error;
reset input to 2nd character input after last accept or error;
}
Thus, we start at the start state, read input. So long as we are matching, we calculate a new state based on current state and input character. We do this until we enter an accept state. If we do, we go to the start state once again, so we can match the next string (=token), . If we go to an error state along the way, we say so and try matching again.

We need not go into detail here how exactly lex.yy.c implements this Finite State Machine in C. You can try examining the code if you want.

Now, if we want to run this Finite State Machine in lex.yy.c, we cannot, because it is written in C, not in machine language. Thus, we must compile the file.

There are two ways to compile it. We can incorporate the lex.yy.c with other files in a C project, and call the various functions and access the variables it exposes. We would need to do this because lex.yy.c does not have its own main function. Alternatively, we could specify that libl provide us with an appropriate main(). We do the latter.

Thus, at the command line, we type:
cc lex.yy.c -o example2 -ll

This -ll gets the main() function from libl. -o states that the output file should be what we specify, namely example2.

We can then run the executable, example2.

2) What exactly does yacc do?
yacc
takes a yacc file which contains BNF and generates a C file which implements a Push Down Automaton (PDA) which accepts the context free language specified in the yacc file.

We use yacc together with lex.
Our lex file will recognize and inform us about tokens, while our yacc file will base itself upon those tokens and recognize a context free grammar.

Beforehand, we used the -ll option with the cc command, so that we were provided with a main() function.

Now, however, we want to compile more than just the lex.yy.c file. We want to incorporate it with C code generated by yacc. Thus, we will no longer use the -ll option. Instead, we will make sure that our yacc file has a main() function defined.

3)
Let us consider example4, which is the thermostat example.

Create a lex file, example4.l. The contents of this file should be:
%{
#include {stdio.h}
#include "y.tab.h"
%}
%%
[0-9]+ return NUMBER;
heat return TOKHEAT;
on|off return STATE;
target return TOKTARGET;
temperature return TOKTEMPERATURE;
\n /* ignore end of line */;
[ \t]+ /* ignore whitespace */;
%%
Most of this should be familiar. We are matching tabs, newlines, the string "on" or "off", etcetera.

One thing is different, though. Rather that just having #include {stdio.h}, we have an additional line that we are explicitly marking to be included in the generated C file, lex.yy.c. That line is:
#include "y.tab.h"
y.tab.h is a header file needed in association with the C file generated by yacc, namely y.tab.c.

We now need a yacc file. The tutorial does a bad job here by separating parts of the file, and not . Basically, we need a single file that does three things:
  1. Provide literal code that will include necessary .h files and provide needed functions such as main(). This part will be between %{ and %}
  2. Give a definition of the tokens to be used in our grammar. This line will start %token.
  3. Lay out the context free grammar, with rules, as well as actions to take (such as printf) after the rule has been satisfied. This should be embedded within %% and %% tags, but the tutorial omitted it, and thus you may have had a compile time error.
The contents of the file example4.y should be:
%{
#include
#include

void yyerror(const char *str)
{
fprintf(stderr,"error: %s\n",str);
}

int yywrap()
{
return 1;
}

main()
{
yyparse();
}

%}

%token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE

%%
commands: /* empty */
| commands command
;

command:
heat_switch
|
target_set
;

heat_switch:
TOKHEAT STATE
{
printf("\tHeat turned on or off\n");
}
;

target_set:
TOKTARGET TOKTEMPERATURE NUMBER
{
printf("\tTemperature set\n");
}
;
%%
Save this file as example4.y. Then run yacc on it by typing at the command line:

yacc example4.y

This will generate two files, y.tab.h and y.tab.c.

Now, we have our three files:
lex.yy.c
y.tab.c
y.tab.h

We need to compile them. Remember we are not using the -ll option since we are providing our own main() function.
cc lex.yy.c y.tab.c -o example4 
Then run example4.



1 comment:

Unknown said...

It is giving me a fatal error
oslab21@oslab21-OptiPlex-3020:~/Desktop/PK$ lex th.l
oslab21@oslab21-OptiPlex-3020:~/Desktop/PK$ yacc -d th.y
oslab21@oslab21-OptiPlex-3020:~/Desktop/PK$ gcc lex.yy.c y.tab.c -o th
th.l:2:20: fatal error: y.tab.h: No such file or directory
compilation terminated.

waht is the solution for this plz help me