The self-organizing map is inspired by feature maps of neurons in the brain made up of feature-sensitive cells that provide ordered projections between neuronal layers, such as those that may exist in the retina and cochlea. For example, there are acoustic feature maps that respond to the sounds an animal is most often exposed to, and tonotopic maps that may be responsible for preserving the order of acoustic resonances.
The information processing objective of the self-organized map algorithm is to optimally place a topology (grid or lattice) of codes or prototypes of vectors in the domain of the observed input data samples. An initially random pool of vectors is prepared which is then exposed to training samples. A winner takes all strategy is used when the vector most similar to a given input pattern is selected, then the selected vector and the neighbors of the selected vector are updated to more closely resemble the input pattern.
Repeating this process results in the distribution of codebook vectors in the input space that approximate the underlying distribution of the samples in the test dataset. The result is the mapping of the topology of the codebook vectors to the underlying structure in the input samples which can be summarized or visualized to reveal the topologically preserved features of the input space in a low dimensional projection.
The self-organized map is made up of a collection of codebook vectors linked together in a topological arrangement, usually a one-dimensional line or a two-dimensional grid. The codebook vectors themselves represent prototypes (points) within the domain, while the topological structure imposes an order between the vectors during the learning process. The result is a low dimensional projection or approximation of the problem domain that can be visualized, or from which clusters can be extracted.
The following algorithm provides a pseudocode to prepare vector codebooks using the self-organizing map method. Codebook vectors are initialized to small floating point values or sampled from the domain. The best match unit (BMU) is the pool's codebooks vector that has the minimum distance to an input vector. A measure of distance between input patterns must be defined. For real-valued vectors, this is usually the Euclidean distance:
where n is the number of attributes, x is the input vector and c is a given codebook vector.
The neighbors of the BMU in the topological structure of the network are selected using a neighborhood size which is linearly decreased during network formation. The BMU and all selected neighbors are then fitted to the input vector using a learning rate which is also linearly decreased with the training cycles:
where c_i (t) is the i-th attribute of a codebooks vector at time t, learn_rate is the current learning rate, an x_i is the i-th attribute of an input vector.
The neighborhood is usually square (called a bubble) where all the neighborhood nodes are updated using the same learning rate for iteration, or Gaussian where the learning rate is proportional to the distance from the neighborhood using a Gaussian distribution (neighbors more distant from the BMU are less updated).
The Self-Organizing Map (SOM) was designed for unsupervised learning issues such as feature extraction, visualization, and grouping. Some extensions of the approach can label the prepared codebook vectors that can be used for classification. The SOM is not parametric, which means that it does not rely on assumptions about the structure of the function it reconciles. The actual values in the input vectors must be normalized such that x is in [0; 1].
Euclidean distance is commonly used to measure the distance between real-valued vectors, although other distance measures can be used (such as the dot product), and data-specific distance measures may be required for non-attributes. scalars.
There should be enough training iterations to expose all training data to the model multiple times. The more complex the class distribution, the more codebook vectors will be needed, some problems may require thousands. Multiple passes of the SOM training algorithm are suggested for more robust use, where the first pass has a high learning rate to prepare the codebook vectors and the second pass has a low learning rate and runs for a long time. long period (maybe 10 times more iterations).
The SOM can be visualized by calculating a Unified Distance Matrix (U-Matrix) that highlights the relationships between nodes in the chosen topology. Principal component analysis (PCA) or Sammon mapping can be used to visualize only nodes in the network without their interrelationships.
A rectangular 2D grid topology is typically used for a SOM, although toroidal and spherical topologies can be used. Hexagonal grids have given better results on some problems and larger sized grids have been investigated.
The positions of the neurons can be updated gradually or in a batch model (each epoch being exposed to all training samples). Batch training should generally result in a more stable network.
Learning rate and neighborhood parameters generally decrease linearly with learning iterations, although nonlinear functions can be used.