Contents
ToggleGloushkov's algorithm
Glushkov's algorithm constructs a automaton non-deterministic accepting the set of words described by a regular expression. The number of automaton states is proportional to the size of the expression.
The main steps of Gloushkov's algorithm starting from a rational expression E are as follows:
- Transform the expression E into a new expression E' where each letter appears at most once in E'.
- Build an automaton A' accepting the set of words described by E'
- Transform A' into an automaton A accepting the set of words described by E by replacing the unique symbols of E' by those of origin
Step 1: separate letters
The first step in Gloushkov's algorithm is to make all letters distinct in the expression. Each occurrence of the same letter is numbered. The expression E = (ab + b)* becomes, for example, E' = (ab0 + b1)*.
Let's take a longer example: E = (a + b)*(abb + ε). After renaming we have the following expression: (x1 + x2) * (x3x4x5 + ε).
We must now define the Premier group containing the symbols that can start a word. Last the group containing the symbols that can end the word. And a following table defining the symbols that can be suffixed by another symbol.
In this example we have:
- Prime = {x1, x2, x3}
- Last = {x1, x2, x5}
- Table of the following:
Step 2: construction of the automaton
Let E' be a regular expression where all the letters are different. We build an automaton A' as follows.
- The set of states is Q = {i} ∪ { a : a appears in E'}
- The initial state is state i
- The final states are Last(E') to which we add i if ε ∈ E'
- The transitions are (i,a,a) for a ∈ First(E') and (a,b,b) for (a,b) ∈ Next(E')
In our expression (x1 + x2) * (x3x4x5 + ε), we construct the initial state i with transition x1, x2, x3 to the respective states x1, x2, x3.
Then we use the following table to create the type links (a, b, b). For example if we take the first line we have the following arcs: (x1, x1, x1), (x1, x2, x2), (x1, x3, x3).
To finish we add the terminal states {x1, x2, x5}. This finally gives the following automaton:
