Contenus

Toggle## Gloushkov's algorithm

Glushkov's algorithm constructs a automaton non-deterministic accepting the set of words described by a regular expression. The number of automaton states is proportional to the size of the expression.

The main steps of Gloushkov's algorithm starting from a regular expression E are as follows:

- Transform the expression E into a new expression E 'where each letter occurs at most once in E'.
- Build an automaton A 'accepting the set of words described by E'
- Transform A 'into an automaton A accepting the set of words described by E by replacing the unique symbols of E' by the original ones

## Step 1: separate letters

The first step in Glushkov's algorithm is to make all the letters in the expression distinct. Each occurrence of the same letter is numbered. The expression E = (ab + b)^{*} becomes, for example, E '= (ab_{0} + b_{1})^{*}.

Let's take a longer example: E = (a + b) * (abb + ε). After renaming we have the following expression: (x_{1} + x_{2}) * (x_{3}x_{4}x_{5} + ε).

We must now define the First group containing the symbols that can start a word. Last the group containing the symbols that can end the word. And a following table defining the symbols that can be suffixed with another symbol.

In this example we have:

- Prime = {x
_{1}, x_{2}, x_{3}} - Last = {x
_{1}, x_{2}, x_{5}} - Table of the following:

## Step 2: building the automaton

Let E 'be a regular expression where all the letters are different. We construct an automaton A 'as follows.

- The set of states is Q = {i} ∪ {a: a appears in E '}
- The initial state is state i
- The final states are Last (E ') to which we add i if ε ∈ E'
- The transitions are (i, a, a) for a ∈ Prime (E ') and (a, b, b) for (a, b) ∈ Next (E')

In our expression (x_{1} + x_{2}) * (x_{3}x_{4}x_{5} + ε), we construct the initial state i with the transition x_{1}, x_{2}, x_{3} to the respective states x_{1}, x_{2}, x_{3}.

Then we use the following table to create the type links (a, b, b). For example if we take the first line we have the following arcs: (x_{1}, x_{1}, x_{1}), (x_{1}, x_{2}, x_{2}), (x_{1}, x_{3}, x_{3}).

To finish we add the terminal states {x_{1}, x_{2}, x_{5}}. Which ultimately gives the following automaton: