Contents
ToggleGloushkov's algorithm
Glushkov's algorithm constructs a automaton non-deterministic accepting the set of words described by a regular expression. The number of automaton states is proportional to the size of the expression.
The main steps of Gloushkov's algorithm starting from a regular expression E are as follows:
- Transform the expression E into a new expression E 'where each letter occurs at most once in E'.
- Build an automaton A 'accepting the set of words described by E'
- Transform A 'into an automaton A accepting the set of words described by E by replacing the unique symbols of E' by the original ones
Step 1: separate letters
The first step in Glushkov's algorithm is to make all the letters in the expression distinct. Each occurrence of the same letter is numbered. The expression E = (ab + b)* becomes, for example, E '= (ab0 + b1)*.
Let's take a longer example: E = (a + b) * (abb + ε). After renaming we have the following expression: (x1 + x2) * (x3x4x5 + ε).
We must now define the First group containing the symbols that can start a word. Last the group containing the symbols that can end the word. And a following table defining the symbols that can be suffixed with another symbol.
In this example we have:
- Prime = {x1, x2, x3}
- Last = {x1, x2, x5}
- Table of the following:
Step 2: building the automaton
Let E 'be a regular expression where all the letters are different. We construct an automaton A 'as follows.
- The set of states is Q = {i} ∪ {a: a appears in E '}
- The initial state is state i
- The final states are Last (E ') to which we add i if ε ∈ E'
- The transitions are (i, a, a) for a ∈ Prime (E ') and (a, b, b) for (a, b) ∈ Next (E')
In our expression (x1 + x2) * (x3x4x5 + ε), we construct the initial state i with the transition x1, x2, x3 to the respective states x1, x2, x3.
Then we use the following table to create the type links (a, b, b). For example if we take the first line we have the following arcs: (x1, x1, x1), (x1, x2, x2), (x1, x3, x3).
To finish we add the terminal states {x1, x2, x5}. Which ultimately gives the following automaton: