ROULETTE WHEEL SELECTION METHODS
Many selection techniques employ a "roulette wheel" mechanism to probabilistically select individuals based on some measure of their performance. A real-valued interval, Sum, is determined as either the sum of the individuals' expected selection probabilities or the sum of the raw fitness values over all the individuals in the current population. Individuals are then mapped one-to-one into contiguous intervals in the range [0, Sum]. The size of each individual interval corresponds to the fitness value of the associated individual. For example, in the figure the circumference of the roulette wheel is the sum of all five individuals' fitness values. Individual 3 is the most fit individual and occupies the largest interval, whereas individuals 2 and 4 are the least fit and have correspondingly smaller intervals within the roulette wheel. To select an individual, a random number is generated in the interval [0, Sum] and the individual whose segment spans the random number is selected. This process is repeated until the desired number of individuals has been selected.
The basic roulette wheel selection method is stochastic sampling with replacement (SSR). Here, the segment size and selection probability remain the same throughout the selection phase and individuals are selected according to the procedure outlined above. SSR gives zero bias but a potentially unlimited spread: any individual with a segment size > 0 could entirely fill the next population.
Stochastic sampling with partial replacement (SSPR) extends SSR by resizing an individual's segment if it is selected. Each time an individual is selected, the size of its segment is reduced by 1.0. If the segment size becomes negative, it is set to 0.0. This places an upper bound on the spread, since an individual cannot be selected more often than its initial segment size rounded up. However, the lower bound is zero and the bias is higher than that of SSR.
Remainder sampling methods involve two distinct phases.
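Before turning to those two phases, the SSR and SSPR procedures described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names `spin`, `ssr`, and `sspr` are hypothetical, the fitness values are made up to mirror the five-individual example, and `sspr` assumes that segment sizes are expressed as expected numbers of trials so that subtracting 1.0 per selection is meaningful.

```python
import random
from collections import Counter


def spin(segments, rng):
    """Return the index of the segment that spans a random point in [0, Sum]."""
    total = sum(segments)
    if total <= 0:
        raise ValueError("at least one segment must have positive size")
    point = rng.uniform(0.0, total)
    running = 0.0
    last_positive = None
    for index, size in enumerate(segments):
        if size <= 0:
            continue                      # zero-width segments can never be hit
        running += size
        last_positive = index
        if point < running:
            return index
    return last_positive                  # guards against floating-point round-off


def ssr(segments, n, rng):
    """Stochastic sampling with replacement: segment sizes never change."""
    return [spin(segments, rng) for _ in range(n)]


def sspr(expected_trials, n, rng):
    """Stochastic sampling with partial replacement.

    Assumes the expected trials sum to n, so some segment stays positive
    until the last spin.
    """
    segments = list(expected_trials)
    chosen = []
    for _ in range(n):
        index = spin(segments, rng)
        chosen.append(index)
        # Shrink the winner's segment by 1.0, clamping at 0.0.
        segments[index] = max(0.0, segments[index] - 1.0)
    return chosen


rng = random.Random(0)
fitness = [2.0, 1.0, 4.0, 1.0, 2.0]       # individual 3 fittest, 2 and 4 least fit
n = 10
expected = [f * n / sum(fitness) for f in fitness]
print("SSR :", Counter(ssr(fitness, n, rng)))
print("SSPR:", Counter(sspr(expected, n, rng)))
```

With SSR the counts can vary widely from run to run, whereas with SSPR no individual can be picked more often than its expected number of trials rounded up.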
In the integral phase, individuals are selected deterministically according to the integer part of their expected trials. The remaining individuals are then selected probabilistically from the fractional part of the individuals' expected values.
Remainder stochastic sampling with replacement (RSSR) uses roulette wheel selection to sample the individuals not assigned deterministically. During the roulette wheel selection phase, individuals' fractional parts remain unchanged and, thus, compete for selection between "spins". RSSR provides zero bias and the spread is lower bounded. The upper bound is limited only by the number of fractionally assigned samples and the size of the integral part of an individual. For example, any individual with a fractional part > 0 could win all the samples during the fractional phase.
Remainder stochastic sampling without replacement (RSSWR) sets the fractional part of an individual's expected value to zero if it is sampled during the fractional phase. This gives RSSWR minimum spread, although this selection method is biased in favour of smaller fractions.
GENETIC ALGORITHM
A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.
Methodology
In a genetic algorithm, a population of strings (called chromosomes or the genotype of the genome), which encode candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem, evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), and modified (recombined and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.
Genetic algorithms find application in bioinformatics, phylogenetics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.
A typical genetic algorithm requires:
1. a genetic representation of the solution domain,
2. a fitness function to evaluate the solution domain.
A standard representation of the solution is as an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming and graph-form representations are explored in evolutionary programming.
The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem one wants to maximize the total value of objects that can be put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the total size of the chosen objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of values of all objects in the knapsack if the representation is valid, or 0 otherwise. In some problems, it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
Once the genetic representation and the fitness function are defined, a GA proceeds to initialize a population of solutions randomly, then improves it through repeated application of mutation, crossover, inversion and selection operators.
Initialization
Initially many individual solutions are randomly generated to form an initial population. The population size depends on the nature of the problem, but typically contains several hundreds or thousands of possible solutions. Traditionally, the population is generated randomly, covering the entire range of possible solutions (the search space). Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.
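The bit-array representation, the knapsack fitness rule, and random initialization described above can be sketched together in Python. This is only an illustrative sketch: the names `random_population` and `knapsack_fitness` are hypothetical, and the item values, sizes, and capacity are invented for the example.

```python
import random


def random_population(pop_size, n_bits, rng):
    """Create pop_size random bit arrays, each of length n_bits."""
    return [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]


def knapsack_fitness(bits, values, sizes, capacity):
    """Sum of the values of the packed objects, or 0 if the knapsack overflows."""
    packed_size = sum(size for bit, size in zip(bits, sizes) if bit)
    if packed_size > capacity:
        return 0                          # invalid representation
    return sum(value for bit, value in zip(bits, values) if bit)


# Made-up instance: three objects and a knapsack of capacity 50.
values = [60, 100, 120]
sizes = [10, 20, 30]
capacity = 50

rng = random.Random(42)
population = random_population(pop_size=8, n_bits=len(values), rng=rng)
for individual in population:
    print(individual, knapsack_fitness(individual, values, sizes, capacity))
```

Assigning 0 to invalid solutions is the rule stated in the text; other schemes, such as penalties or repair, exist but are not shown here.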
Selection
During each successive generation, a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as rating every individual may be very time-consuming.
Reproduction
The next step is to generate a second-generation population of solutions from those selected, through the genetic operators crossover (also called recombination) and/or mutation. For each new solution to be produced, a pair of "parent" solutions is selected for breeding from the pool selected previously. By producing a "child" solution using the above methods of crossover and mutation, a new solution is created which typically shares many of the characteristics of its "parents". New parents are selected for each new child, and the process continues until a new population of solutions of appropriate size is generated. Although reproduction methods that are based on the use of two parents are more "biology inspired", some research suggests that using more than two "parents" can produce better-quality chromosomes.
These processes ultimately result in a next-generation population of chromosomes that is different from the initial generation. Generally the average fitness of the population will have increased by this procedure, since only the best organisms from the first generation are selected for breeding, along with a small proportion of less fit solutions, for reasons already mentioned above.
Although crossover and mutation are known as the main genetic operators, it is possible to use other operators such as regrouping, colonization-extinction, or migration in genetic algorithms.
Termination
This generational process is repeated until a termination condition has been reached. Common terminating conditions are:
- A solution is found that satisfies minimum criteria
- Fixed number of generations reached
- Allocated budget (computation time/money) reached
- The highest ranking solution's fitness is reaching or has reached a plateau such that successive iterations no longer produce better results
- Manual inspection
- Combinations of the above
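The automatically checkable conditions in this list can be combined in a single test, sketched below in Python. The function `stop_reason` and all of its parameters are hypothetical names chosen for illustration; the sketch assumes that higher fitness is better and that the caller records the best fitness of each generation in `best_history`. Manual inspection is left to the user.

```python
def stop_reason(best_history, *, target=None, max_generations=None,
                patience=None, min_gain=0.0, spent=None, budget=None):
    """Return a string naming the condition that fired, or None to continue.

    best_history: best fitness observed in each generation so far.
    target:       fitness that satisfies the minimum criteria.
    patience:     number of recent generations inspected for a plateau.
    spent/budget: any consistent unit, e.g. seconds or money.
    """
    if not best_history:
        return None
    if target is not None and best_history[-1] >= target:
        return "minimum criteria satisfied"
    if max_generations is not None and len(best_history) >= max_generations:
        return "generation limit reached"
    if budget is not None and spent is not None and spent >= budget:
        return "budget exhausted"
    if patience and len(best_history) > patience:
        # Plateau: the last `patience` generations did not beat the earlier best.
        earlier_best = max(best_history[:-patience])
        recent_best = max(best_history[-patience:])
        if recent_best - earlier_best <= min_gain:
            return "fitness plateau"
    return None


history = [1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
print(stop_reason(history, target=10.0, max_generations=100, patience=3))
```

Because every check is optional, any combination of conditions can be enabled, which corresponds to the last item in the list.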
Simple generational genetic algorithm procedure:
1. Choose the initial population of individuals
2. Evaluate the fitness of each individual in that population
3. Repeat on this generation until termination (time limit, sufficient fitness achieved, etc.):
   1. Select the best-fit individuals for reproduction
   2. Breed new individuals through crossover and mutation operations to give birth to offspring
   3. Evaluate the individual fitness of new individuals
   4. Replace least-fit population with new individuals
Introduction
Genetic Algorithm (GA) is an artificial intelligence procedure. It is based on the theory of natural selection and evolution. This search algorithm balances the need for:
1. exploitation: Selection and crossover tend to converge on a good but sub-optimal solution.
2. exploration: Selection and mutation create a parallel, noise-tolerant, hill-climbing algorithm, preventing premature convergence.
For details of Genetic Algorithm, please refer to my partner's first article in GA Project.
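As a rough illustration of the simple generational procedure listed earlier, the loop can be sketched in Python with the operators passed in as functions. Everything here is an assumption made for the example: the name `run_generational_ga`, its parameters, the toy objective (counting 1 bits), and the stand-in operators `top_half` and `splice_and_flip`. A fixed generation count is used as the only termination condition.

```python
import random


def run_generational_ga(population, fitness, select, breed,
                        n_offspring, max_generations):
    """Evaluate, then repeatedly select, breed, evaluate and replace."""
    # Step 2: evaluate the initial population and rank it, best first.
    ranked = sorted(population, key=fitness, reverse=True)
    for _ in range(max_generations):                  # step 3: repeat
        parents = select(ranked)                      # 3.1 select the best-fit
        offspring = [breed(parents) for _ in range(n_offspring)]  # 3.2 breed
        survivors = ranked[:len(ranked) - n_offspring]  # 3.4 drop the least-fit
        # 3.3 evaluate the newcomers by re-ranking the merged population
        ranked = sorted(survivors + offspring, key=fitness, reverse=True)
    return ranked


# Toy usage: maximise the number of 1 bits in a 16-bit individual.
rng = random.Random(1)


def top_half(ranked):
    """Stand-in selection: keep the fitter half as the breeding pool."""
    return ranked[:len(ranked) // 2]


def splice_and_flip(parents):
    """Stand-in breeding: one-point crossover followed by one bit flip."""
    a, b = rng.sample(parents, 2)
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    child[rng.randrange(len(child))] ^= 1
    return child


start = [[rng.randint(0, 1) for _ in range(16)] for _ in range(20)]
final = run_generational_ga(start, sum, top_half, splice_and_flip,
                            n_offspring=10, max_generations=50)
print("best initial:", max(sum(ind) for ind in start))
print("best final  :", sum(final[0]))
```

Because only the least-fit members are replaced, the best individual found so far always survives, so the best fitness cannot decrease between generations in this sketch.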
Applications
Traditional methods of search and optimization are too slow in finding a solution in a very complex search space, even when implemented on supercomputers. Genetic Algorithm is a robust search method requiring little information to search effectively in a large or poorly understood search space. In particular, a genetic search progresses through a population of points, in contrast to the single point of focus of most search algorithms. Moreover, it is useful in the very tricky area of nonlinear problems. Its intrinsic parallelism (in evaluation functions, selections and so on) allows the use of distributed processing machines.
Basically, Genetic Algorithm requires two elements for a given problem:
- encoding of candidate structures (solutions)
- method of evaluating the relative performance of candidate structures, for identifying the better solutions
Genetic Algorithm codes parameters of the search space as binary strings of fixed length. It employs a population of strings initialized at random, which evolve to the next generation by genetic operators such as selection, crossover and mutation. The fitness function evaluates the quality of solutions coded by strings. Selection allows strings with higher fitness to appear with higher probability in the next generation. Crossover combines two parents by exchanging parts of their strings, starting from a randomly chosen crossover point. This leads to new solutions inheriting desirable qualities from both parents. Mutation flips single bits in a string, which prevents the GA from premature convergence by exploring new regions in the search space. GA tends to take advantage of the fittest solutions by giving them greater weight, and concentrating the search in the regions which lead to fitter structures, and hence better solutions of the problem.
Finding good parameter settings that work for a particular problem is not a trivial task. The critical factors are to determine robust parameter settings for population size, encoding, selection criteria, genetic operator probabilities and evaluation (fitness) normalization techniques.
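The crossover and mutation operators described above can be sketched on fixed-length binary strings as follows. This is a minimal sketch under stated assumptions: the function names `one_point_crossover` and `mutate` are hypothetical, individuals are Python strings of '0' and '1', and mutation is applied independently to each bit with a probability `rate` chosen by the caller.

```python
import random


def one_point_crossover(a, b, rng):
    """Exchange the tails of two equal-length strings after a random cut point."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("parents must have the same length of at least 2")
    cut = rng.randrange(1, len(a))        # cut falls strictly inside the string
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[c] if rng.random() < rate else c for c in bits)


rng = random.Random(7)
child_1, child_2 = one_point_crossover("00000000", "11111111", rng)
print(child_1, child_2)
print(mutate(child_1, rate=0.1, rng=rng))
```

Each child keeps the head of one parent and the tail of the other, so at every position the two children together carry exactly the two parental bits.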
If the population is too small, the genetic algorithm will converge too quickly to a local optimum and may not find the best solution. On the other hand, too many members in a population result in long waiting times for significant improvement.
Coding the solutions is based on the principle of meaningful building blocks and the principle of minimal alphabets, by using binary strings.
The fitter members will have a greater chance of reproducing. The members with lower fitness are replaced by the offspring. Thus in successive generations, the members on average are fitter as solutions to the problem.
Too high a mutation rate introduces too much diversity and takes longer to reach the optimal solution. Too low a mutation rate tends to miss some near-optimal points. Two-point crossover reaches the same results more quickly and retains the solutions much longer than one-point crossover.
The fitness function links the Genetic Algorithm to the problem to be solved. The assigned fitness is used to calculate the selection probabilities for choosing parents and for determining which member will be replaced by which child.
Computer-Aided Design
Genetic Algorithm uses the feedback from the evaluation process to select the fitter designs, generating new designs through the recombination of parts of the selected designs. Eventually, a population of high-performance designs results.