Gloushkov algorithm - Complex systems and AI

Difficulty

Easy 25%

Contents

Gloushkov's algorithm

Glushkov's algorithm constructs a automaton non-deterministic accepting the set of words described by a regular expression. The number of automaton states is proportional to the size of the expression.

The main steps of Gloushkov's algorithm starting from a regular expression E are as follows:

Transform the expression E into a new expression E 'where each letter occurs at most once in E'.
Build an automaton A 'accepting the set of words described by E'
Transform A 'into an automaton A accepting the set of words described by E by replacing the unique symbols of E' by the original ones

Step 1: separate letters

The first step in Glushkov's algorithm is to make all the letters in the expression distinct. Each occurrence of the same letter is numbered. The expression E = (ab + b)^* becomes, for example, E '= (ab₀ + b₁)^*.

Let's take a longer example: E = (a + b) * (abb + ε). After renaming we have the following expression: (x₁ + x₂) * (x₃x₄x₅ + ε).

We must now define the First group containing the symbols that can start a word. Last the group containing the symbols that can end the word. And a following table defining the symbols that can be suffixed with another symbol.

In this example we have:

Prime = {x₁, x₂, x₃}
Last = {x₁, x₂, x₅}
Table of the following:

Step 2: building the automaton

Let E 'be a regular expression where all the letters are different. We construct an automaton A 'as follows.

The set of states is Q = {i} ∪ {a: a appears in E '}
The initial state is state i
The final states are Last (E ') to which we add i if ε ∈ E'
The transitions are (i, a, a) for a ∈ Prime (E ') and (a, b, b) for (a, b) ∈ Next (E')

In our expression (x₁ + x₂) * (x₃x₄x₅ + ε), we construct the initial state i with the transition x₁, x₂, x₃ to the respective states x₁, x₂, x₃.

Then we use the following table to create the type links (a, b, b). For example if we take the first line we have the following arcs: (x₁, x₁, x₁), (x₁, x₂, x₂), (x₁, x₃, x₃).

To finish we add the terminal states {x₁, x₂, x₅}. Which ultimately gives the following automaton: