Learning Classifier System

The objective of the Learning Classifier System algorithm is to optimize payoff based on exposure to stimuli from a problem-specific environment. This is achieved by managing credit assignment for those rules that prove useful and searching for new rules and new variations on existing rules using an evolutionary process.

The actors of the system include detectors, messages, effectors, feedback, and classifiers. Detectors are used by the system to perceive the state of the environment. Messages are the discrete information packets passed from the detectors into the system. The system performs information processing on messages, and messages may directly result in actions in the environment. Effectors control the actions of the system on and within the environment. In addition to actively perceiving via its detectors, the system may also receive directed feedback from the environment (payoff). Classifiers are condition-action rules that provide a filter for messages. If a message satisfies the conditional part of the classifier, the action of the classifier triggers. Rules act as message processors. Messages are fixed-length bitstrings. A classifier is defined as a ternary string over the alphabet {1, 0, #}, where the # represents "don't care" (matching either 1 or 0).
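
To make the matching rule concrete, the following Python sketch checks a ternary classifier condition against a binary message. The function name and the example strings are illustrative assumptions, not part of any standard implementation.

    def matches(condition: str, message: str) -> bool:
        # '#' is the don't-care symbol and matches either '0' or '1';
        # every other position must equal the corresponding message bit.
        return all(c == '#' or c == m for c, m in zip(condition, message))

    message = "101101"      # fixed-length bitstring produced by a detector
    condition = "1#1#01"    # ternary classifier condition
    print(matches(condition, message))  # True: the '#' positions ignore the message bits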

The processing loop for the Learning Classifier System is as follows (a code sketch of one cycle appears after the list):

  1. Messages from the environment are placed on the message list.
  2. The conditions of each classifier are checked to see if they are satisfied by at least one message in the message list.
  3. All classifiers that are satisfied participate in a competition; those that win post their actions to the message list.
  4. All messages directed to the effectors are executed (causing actions in the environment).
  5. All messages on the message list from the previous cycle are deleted (messages persist for a single cycle).
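
The sketch below ties these five steps together for a single cycle. The classifier fields (condition, action, strength) and the strength-based competition are simplifying assumptions; real systems typically use bidding with payments and a bucket-brigade or reinforcement-style credit assignment.

    def matches(condition, message):
        return all(c == '#' or c == m for c, m in zip(condition, message))

    def process_cycle(classifiers, detector_messages, effector):
        message_list = list(detector_messages)              # 1. post detector messages
        match_set = [c for c in classifiers                 # 2. find satisfied classifiers
                     if any(matches(c["condition"], m) for m in message_list)]
        winners = sorted(match_set,                         # 3. competition (here simplified to
                         key=lambda c: c["strength"],       #    a single strongest winner)
                         reverse=True)[:1]
        posted = [c["action"] for c in winners]             #    winners post their actions
        for action in posted:
            effector(action)                                # 4. act on the environment
        return posted                                       # 5. only this cycle's messages persist

Calling process_cycle repeatedly with fresh detector messages reflects the one-cycle persistence of step 5, since the previous cycle's message list is discarded on each call.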

Learning Classifier Systems are suited for problems with the following characteristics: perpetually novel events with significant noise, continual real-time requirements for action, implicitly or inexactly defined goals, and sparse payoff or reinforcement obtainable only through long sequences of tasks.

The learning rate for a classifier’s expected payoff, error, and fitness is typically in the range [0.1; 0.2]. The frequency of running the genetic algorithm is typically in the range [25; 50] cycles. The discount factor used in multi-step problems is typically around 0.71. The minimum error below which classifiers are considered to have equal accuracy is typically 10% of the maximum reward. The probability of crossover in the genetic algorithm is typically in the range [0.5; 1.0]. The probability of mutating a single position in a classifier in the genetic algorithm is typically in the range [0.01; 0.05].

The experience threshold during classifier deletion is typically about 20. The experience threshold for a classifier during subsumption is typically around 20. The initial values for a classifier’s expected payoff, error, and fitness are typically small and close to zero. The probability of selecting a random action for the purposes of exploration is typically close to 0.5. The minimum number of different actions that must be represented in a match set is usually the total number of actions possible in the environment for the given input. Subsumption should be used on problem domains that are known to contain well-defined rules for mapping inputs to outputs.
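
The parameter guidance above can be collected into a single configuration, as in the illustrative Python dictionary below. The key names loosely follow common XCS notation but are assumptions, and the maximum reward of 1000 used to derive the error threshold is only an example.

    LCS_PARAMETERS = {
        "learning_rate": 0.15,            # payoff, error, and fitness updates; typically [0.1; 0.2]
        "ga_frequency": 35,               # cycles between genetic algorithm runs; typically [25; 50]
        "discount_factor": 0.71,          # discount for multi-step problems
        "error_threshold": 0.1 * 1000.0,  # ~10% of the maximum reward (assumed 1000 here)
        "crossover_probability": 0.8,     # typically [0.5; 1.0]
        "mutation_probability": 0.04,     # per position; typically [0.01; 0.05]
        "deletion_experience": 20,        # experience threshold for deletion
        "subsumption_experience": 20,     # experience threshold for subsumption
        "initial_prediction": 1e-5,       # initial expected payoff, close to zero
        "initial_error": 1e-5,
        "initial_fitness": 1e-5,
        "exploration_probability": 0.5,   # probability of selecting a random action
        "min_actions_in_match_set": 4,    # usually the number of possible actions (4 is an example)
    }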