Multi-objective Optimization using Evolutionary Algorithms Progress report by
Peter Dueholm Justesen
Department of Computer Science University of Aarhus Denmark
January 13, 2009
Supervisors: Christian N. S. Pedersen and Rasmus K. Ursem
Abstract
This is a progress report describing my research during the last one and a half years, performed during part A of my Ph.D. study. The research field is multi-objective optimization using evolutionary algorithms, and the research has taken place in a collaboration between Aarhus University, Grundfos and the Alexandra Institute. My research so far has been focused on two main areas: i) multi-objective evolutionary algorithms (MOEAs) with different variation operators, and ii) decreasing the cardinality of the resulting population of MOEAs. The outcome of the former is a comparative analysis of MOEA versions using different variation operators on a suite of test problems. The latter area has given rise to both a new branch of multi-objective optimization (MOO) called MODCO (Multi-Objective Distinct Candidates Optimization) and a new MOEA which makes it possible for the user to directly set the cardinality of the resulting set of solutions. To motivate and cover my previous and future work, the progress report is divided into three main parts:
1. Introduction to the research area
2. The contributions made by this author
3. Future work
Peter Dueholm Justesen Aarhus University
Table of contents
1 Introduction
  1.1 Main concepts of multi-objective optimization
  1.2 Multi-objective optimization using evolutionary algorithms
    1.2.1 The history of multi-objective evolutionary algorithms
    1.2.2 Goals of multi-objective evolutionary algorithms
    1.2.3 Basic operators of multi-objective evolutionary algorithms
2 Contributions
  2.1 Genetic algorithms versus Differential Evolution
    2.1.1 Content and ideas
    2.1.2 Results
    2.1.3 Evaluation and discussion
    2.1.4 Future work
  2.2 Multiobjective Distinct Candidate Optimization
    2.2.1 Introduction, arguments and goals
    2.2.2 Evaluation and discussion
    2.2.3 Future work
  2.3 Cluster-Forming Differential Evolution
    2.3.1 Content and ideas
    2.3.2 Results
    2.3.3 Evaluation and discussion
3 Future work
  3.1 Testing MODCO algorithms on Grundfos problems
    3.1.1 Constrained problems
    3.1.2 Many-objective optimization
    3.1.3 The effect of changing MODCO parameters
    3.1.4 A more efficient search?
  3.2 Inventing performance indices
  3.3 Inventing alternate MODCO algorithms
  3.4 Relevant conferences and journals
List of figures
List of algorithms
References
1 Introduction
This section introduces the main concepts of multi-objective optimization and motivates the use of evolutionary algorithms for this purpose. Multi-objective problems are problems with two or more, usually conflicting, objectives. The main difference from single-objective optimization is that a multi-objective problem does not have one single optimal solution, but instead has a set of optimal solutions, each of which represents a trade-off between objectives.

At Grundfos, a Danish company manufacturing pumps, an example of a multi-objective problem is that of designing centrifugal pumps. The designs of centrifugal pumps are complex, and each design leads to a different efficiency rate and production price. Grundfos is naturally interested in making its pumps have maximum efficiency, which may be measured as the throughput of water per second. However, Grundfos does not want the pumps to cost too much to manufacture and sell, so at the same time the company wants to minimize the production cost. Typically, increasing the efficiency of a pump will also increase the production cost. Optimal pump designs will therefore represent a trade-off between cost and efficiency, ranging from maximum efficiency at maximum cost to minimum efficiency at minimum cost. However, finding the optimal designs is not easy, and the task calls for optimization procedures.

A way to avoid the complexities of multi-objective optimization is to convert the multi-objective problem into a single-objective problem by assigning weights to the different objectives and calculating a single fitness value. The major problem with this weighted sum approach is that it is subjective, as it ultimately leaves it to a decision maker (DM) to assign weights according to the subjective importance of the corresponding objectives. The approach further assumes that the DM has a priori knowledge of the importance of the different objectives, which is often hard or impossible to come by.

The objective approach uses Pareto compliant ranking of solutions, as explained in the following section. This approach favours solutions which are better in a true, multi-objective sense. Only the latter approach has been investigated in my work, as it makes no assumptions and does not rely on higher-level knowledge.
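The subjectivity of the weighted sum approach can be made concrete with a small sketch. The three designs, their objective values and the weight vectors below are invented purely for illustration; efficiency is negated so that both objectives are minimized.

```python
# Hypothetical designs: (negated efficiency, production cost), both minimized.
designs = {
    "A": (-0.90, 100.0),
    "B": (-0.75, 60.0),
    "C": (-0.60, 40.0),
}

def weighted_sum(objectives, weights):
    """Collapse an objective vector into one scalar fitness value."""
    return sum(w * f for w, f in zip(weights, objectives))

def best_design(weights):
    """Return the name of the design with the lowest weighted sum."""
    return min(designs, key=lambda name: weighted_sum(designs[name], weights))

# Two decision makers with different priorities pick different "optimal" designs.
efficiency_focused = best_design((1000.0, 1.0))
cost_focused = best_design((1.0, 1.0))
```

None of the three designs dominates another, yet each choice of weights singles out a different one as the single "best" solution.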
1.1 Main concepts of multi-objective optimization
The primary concept of multi-objective optimization is the multi-objective problem, which has several functions to be optimized (maximized or minimized) by the solution x, along with different constraints to satisfy, as seen in Equation 1. x is a vector of decision variables: x = (x_1, x_2, ..., x_n)^T, where each decision variable x_i ∈ R is bounded by the lower bound x_i^L and the upper bound x_i^U. These bounds constitute the decision variable space, or simply the decision space D, and the M objective functions f_m(x) define a mapping from D to the objective space Z. The surjective mapping is between the n-dimensional solution vectors x ∈ D and the M-dimensional objective vectors f(x) = (f_1(x), ..., f_M(x))^T ∈ Z, such that each x ∈ D corresponds to one point y ∈ Z, as illustrated in figure 1.

The problem may be constrained, so Equation 1 also shows J inequality and K equality constraints. Solutions satisfying these constraints are feasible, and belong to the feasible part of the decision space, S_D ⊂ D, which the objective functions map to the feasible part of the objective space, S_Z ⊂ Z.

\[
\begin{aligned}
\text{Minimize/Maximize} \quad & f_m(x), & m &= 1, 2, \ldots, M; \\
\text{Subject to} \quad & g_j(x) \ge 0, & j &= 1, 2, \ldots, J; \\
& h_k(x) = 0, & k &= 1, 2, \ldots, K; \\
& x_i^{L} \le x_i \le x_i^{U}, & i &= 1, 2, \ldots, n.
\end{aligned}
\tag{1}
\]

The transition from single-objective to multi-objective optimization problems introduces a challenge in the comparison of solutions, since performance is then a vector of objective values instead of a single scalar. The concept of Pareto dominance addresses this issue, enabling comparison of solutions. We say that a solution x dominates solution y, written x ≺ y, if and only if the following two conditions hold:
1) The solution x is no worse than solution y in all objectives.
2) The solution x is better than y on at least one objective.
Formally, assuming minimization on all objectives:

\[
f_m(x) \le f_m(y) \;\; \forall m \;\wedge\; \exists i : f_i(x) < f_i(y)
\tag{2}
\]
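The dominance test of Equation 2 translates directly into code. The following is a minimal sketch assuming minimization on all objectives and objective vectors given as equally long tuples; the function name `dominates` is chosen here for illustration.

```python
def dominates(fx, fy):
    """Return True if objective vector fx Pareto-dominates fy (minimization)."""
    # Condition 1: fx is no worse than fy in every objective.
    no_worse = all(a <= b for a, b in zip(fx, fy))
    # Condition 2: fx is strictly better than fy in at least one objective.
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better
```

Note that a vector never dominates itself, since the second condition then fails.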
This definition entails that the dominating solution is, in a true multi-objective sense, the better choice. The dominance relation allows for Pareto-based ranking, that is, ranking solutions according to how dominating they are in a domination-based comparison with other solutions. Other performance indicators should assign the best value to the most dominating solutions; such indicators are called Pareto compliant, in contrast to the weighted sum approach described above.

The binary dominance relation presented above is transitive, asymmetric and irreflexive. In short, this means that if x ≺ y and y ≺ z, then x ≺ z; if x ≺ y, then y ⊀ x; and x ⊀ x. However, several relations between solutions exist. A list of the most commonly used relations between solutions, the corresponding notation and the formal interpretation is presented in Table 1, listed according to the strictness imposed.

relation             notation    interpretation
strictly dominates   x ≺≺ y      f_m(x) < f_m(y) ∀m
dominates            x ≺ y       f_m(x) ≤ f_m(y) ∀m ∧ ∃i : f_i(x) < f_i(y)
weakly dominates     x ⪯ y       f_m(x) ≤ f_m(y) ∀m
incomparable         x ∥ y       ¬(x ⪯ y) ∧ ¬(y ⪯ x)
indifferent          x ∼ y       f_m(x) = f_m(y) ∀m

Table 1: Solution relations - assuming minimization on all objectives
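As a rough illustration of Table 1, the sketch below names the most specific relation that holds from x to y, assuming minimization. The function name and the extra label "is dominated by" (for the case where the relation only holds in the opposite direction) are choices made here, not part of the table.

```python
def relation(fx, fy):
    """Name the most specific Table 1 relation from x to y (minimization)."""
    pairs = list(zip(fx, fy))
    if all(a < b for a, b in pairs):
        return "strictly dominates"
    if all(a == b for a, b in pairs):
        return "indifferent"
    if all(a <= b for a, b in pairs):
        # No worse everywhere and, since not all equal, better somewhere.
        return "dominates"
    if all(b <= a for a, b in pairs):
        # Weak dominance holds only from y to x (label assumed for this sketch).
        return "is dominated by"
    return "incomparable"
```

Weak dominance is not returned explicitly, because it holds whenever the result is "strictly dominates", "dominates" or "indifferent".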
From the definition of dominance, several other important definitions can be derived. When optimizing, we are interested in locating the non-dominated set of solutions. Among a set of solutions P, the non-dominated set of solutions P′ consists of those that are not dominated by any other member of the set P. Correspondingly, we define the globally Pareto-optimal set as the non-dominated set of the entire feasible search space S_D ⊂ D. This is also referred to simply as the Pareto-optimal set, and our goal when using multi-objective optimizers is to approximate this set. In objective space, the image of the Pareto-optimal set is denoted the true Pareto-optimal front, or simply the true Pareto front. See figure 1 for illustration.
Figure 1: Mapping from decision space to objective space - assuming maximization
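The non-dominated set P′ of a finite set P can be extracted with a straightforward pairwise comparison. This is a minimal sketch assuming minimization (unlike figure 1, which assumes maximization); the example population is invented for illustration.

```python
def dominates(fx, fy):
    """Pareto dominance for minimization."""
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def non_dominated(points):
    """Return the members of points that no other member dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective vectors of a small population.
population = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
front = non_dominated(population)
```

Here (3.0, 4.0) and (5.0, 5.0) are dropped because (2.0, 3.0) dominates both. The quadratic number of comparisons is acceptable for small populations.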
1.2 Multi-objective optimization using evolutionary algorithms
One way to perform multi-objective optimization is by using an evolutionary algorithm (EA). Evolutionary algorithms are optimizers inspired by Darwinian evolution, and with this the concept of survival of the fittest. In an EA, solutions to a given problem are considered individuals of a population, where the fitness of an individual is given by how well it solves the problem at hand. Individuals in the population may mate to create offspring, after which parents and offspring compete for inclusion in the next generation. As only the fittest survive this fight, the full population improves iteratively with each passing generation. More formally, the strength of EAs comes from their use of a set of solutions, rather than improving on a single solution only. This makes it possible to combine several (good) solutions when creating a new one. An EA is a stochastic metaheuristic, i.e. a general optimization method based on probabilistic operators. Thus, contrary to deterministic algorithms, EAs may produce different results in different runs.
The greatest difference between single-objective EAs and multi-objective EAs (MOEAs) is that for single-objective optimization, it is simple to return the best solution in a population, as scalar-based evaluation automatically implies a total order on solutions. For MOEAs, the situation is very different. Due to the higher dimensionality of the objective space, all resulting individuals of the population may be incomparable to each other, each representing an optimal trade-off between objectives. That is, the result of running a MOEA is typically a set of non-dominated solutions. From this result set, it is up to the decision maker to find out which solution(s) to realise. The full process is illustrated in figure 2.

Figure 2: Multi-objective optimization process. Step 1 (optimization): an ideal multi-objective optimizer takes the multi-objective optimization problem (minimize F1, F2, ..., Fn subject to constraints) and returns multiple candidate solutions on the true Pareto front. Step 2 (decision making): higher-level information on all candidate solutions is used to choose a trade-off solution.
1.2.1 The history of multi-objective evolutionary algorithms
Here, we present a brief historic view of some of the algorithms that have been used in the work of this author. Whereas single-objective EAs have been extensively researched for many years now, the field of MOEAs is relatively new. Here, we consider only algorithms which in some way incorporate the elitism concept, which ensures that the number of non-dominated individuals in the population can only increase. The first, very popular elitist genetic algorithm for multi-objective optimization was the Non-dominated Sorting Genetic Algorithm, NSGA-II, created by Deb et al. and published in 2000 [5]. This was the first MOEA which further incorporated a diversity-preserving mechanism to ensure population diversity. Another elitist genetic algorithm, also incorporating a density measure, was the Strength Pareto Evolutionary Algorithm, SPEA2, created by Zitzler et al. and published in 2001 [7]. Both of these algorithms were shown to solve 2D problems very efficiently [5; 6; 7].
Later, an old idea made its way to the field of MOEAs. Differential Evolution is an alternative way of varying EA individuals based on vector differences, which was published in 1997 by Price and Storn [8]. The approach was adopted in a number of MOEAs, of which we will consider only the most relevant to this research. Recently, Robič and Filipič combined NSGA-II and SPEA2 selection with the Differential Evolution (DE) scheme for solution reproduction to create DEMO (Differential Evolution for Multi-objective Optimization). The DE-based MOEAs were named DEMO^NSII and DEMO^SP2, and these have been shown to outperform both NSGA-II and SPEA2 [9; 10].

As MOEAs have become increasingly popular over the last decade, it has been noted that the cardinality of the resulting populations of MOEAs is often too high for decision makers to make their final choice of candidate solutions. One way of dealing with this problem is by applying clustering, such that solutions within the same area of the objective space are reported as a single solution. Very recently, Knowles and Handl have suggested clustering by k-means (MOCK) [16], where solutions are assigned to clusters after an optimization run. Alternatively, this author and a fellow researcher have suggested an approach for optimizing distinct candidates (MODCO) [2] during the run. The main difference between these approaches, besides when they are applied, is that in MODCO the user directly sets the number of returned solutions, whereas the number of clusters is automatically determined using MOCK. In general, decreasing the result cardinality introduces some interesting options with respect to decreasing the population size along with the number of generations performed. Furthermore, it drastically decreases the amount of post-processing of results needed, as will be discussed later in this report.

1.2.2 Goals of multi-objective evolutionary algorithms
As stated above, the goal of a multi-objective evolutionary algorithm is to approximate the Pareto-optimal set of solutions. However, this goal is often subdivided into the following three goals for the resulting population of a MOEA:
1. Closeness towards the true Pareto front
2. An even distribution among solutions
3. A high spread of solutions

First, we want all of our solutions to be as optimal as possible by making them get as close to the true Pareto front as possible, where closeness is measured as Euclidean distance in objective space. As most problems solved by MOEAs are NP-complete combinatorial problems, there is no way of “guessing” the decision vector mapping to some good point in objective space. MOEAs select the most dominating (or most non-dominated) individuals for survival to drive the population towards the true Pareto front, since these must be closest to it. Optimally, all returned solutions from a MOEA lie on the true Pareto front.

The second goal of MOEAs is to cover as much of the Pareto front as possible, and this goal is very specific to multi-objective optimization. Having an even distribution of solutions on the Pareto front ensures a diverse set of trade-offs between objectives. Having a set of solutions all equidistant to their nearest neighbour provides the DM with an overview of the Pareto front, while also making the final selection possible, based on the range of trade-offs between objectives represented by the population, as illustrated in figure 2.

The third goal is closely connected to the second. Having a high spread means having a large distance between the extreme solutions in objective space, and as before, this is to ensure coverage of the Pareto front. Ensuring population diversity is most often done by applying a density or crowding measure which penalises individuals that are close to each other in objective space. Such a measure also ensures a high spread, since it will force the full population to spread as far as possible.

From an application point of view, the first goal is by far the most important, since it directly determines how optimal the returned solutions are. The second goal is important, but not crucial, since most often only a few solutions from the final population will be chosen for further investigation. Finally, the third goal is practically irrelevant, since extreme solutions are rarely, if ever, implemented in reality.

To pursue the three goals, traditional MOEAs employ two mechanisms directly meant to promote Pareto-front convergence and a good solution distribution. The first mechanism is elitism, which ensures that solutions closest to the true Pareto front will never be eliminated from the population under evolution, i.e. the number of non-dominated solutions in the population can only increase. The second is a measure of crowding or density among solutions, a secondary fitness which is often incorporated into the Pareto-rank to form a single, final fitness.

1.2.3 Basic operators of multi-objective evolutionary algorithms
Here, we briefly introduce the MOEA operators, such that we have an overall idea of their purpose before introducing more concrete mechanisms. The operators are applied iteratively until some pre-determined termination criterion is met, usually depending on the number of function evaluations performed, as this is where the main computational effort is spent. This number, however, depends on the population size N, the dimensionality of the problem M (the number of objectives), and the number of generations performed, T. It is usual to create one new offspring per parent, such that M × N × T function evaluations are performed in T generations. The basic operators used by MOEAs are:
• Evaluation
• Selection
• Variation
Evaluation is typically based on the dominance relation, as described above. It is intended to give an indication of the level of dominance for each individual by assigning a Pareto-rank, which is based on the number of other individuals in the population that are dominated by the individual in question. This ranking must be Pareto-compliant. Depending on which method is used for Pareto ranking, it can be more or less graded. Traditional MOEAs further incorporate a second fitness criterion when assigning the final fitness, in order to induce a total order on the quality of individuals before selection. Evaluation is performed in objective space, and depends on the objective functions, which in turn are problem dependent.

Selection comes in two variants, and is often based on the fitness assigned during evaluation. The first selection to be applied is called mating selection or sexual selection, where it is decided which individuals of the current generation get to mate and spawn offspring. This selection is often random or may include all individuals, in order to give all individuals of the current generation an equal chance to mate. Environmental selection is applied after variation is performed, by applying the famous rule of survival of the fittest, such that only the best of the combined set of parents and offspring survive to the next generation. This typically truncates the expanded population to its original size. Both kinds of selection are performed in objective space.

Variation is applied after mating selection to the individuals who were selected for mating. These individuals then get to create offspring, which are variations of their parents. Here, two variation operators are typically applied: mutation and recombination. Mutation is intended to cause only small changes in the offspring, whereas the recombination operator makes it possible to retain good parts of several parents in the offspring. In general, we say that mutation enhances exploitation, whereas recombination enhances exploration of the search space. Common to all variation operators is that they are applied in decision space.
Figure 3: Multi-Objective Evolutionary Algorithm main loop (Evaluation, Mating Selection, Variation, Environmental Selection)
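The interplay of the operators in the main loop can be sketched in a few lines. Everything concrete below is an assumption made for illustration and not one of the algorithms discussed in this report: the toy bi-objective problem, the random mating selection, the arithmetic recombination with Gaussian mutation, and the simple Pareto-rank that counts dominators.

```python
import random

def evaluate(x):
    """Toy bi-objective problem (assumed for illustration): minimize both."""
    return (x * x, (x - 2.0) ** 2)

def dominates(fx, fy):
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def pareto_rank(objs):
    """Rank = number of individuals dominating each one (0 = non-dominated)."""
    return [sum(dominates(q, p) for q in objs) for p in objs]

def mating_selection(population, rng):
    """Pick two distinct parents uniformly at random."""
    return rng.sample(population, 2)

def variation(parents, rng):
    """Arithmetic recombination followed by a small Gaussian mutation."""
    a, b = parents
    w = rng.random()
    return w * a + (1.0 - w) * b + rng.gauss(0.0, 0.1)

def environmental_selection(population, size):
    """Truncate to the `size` individuals with the lowest Pareto-rank."""
    ranks = pareto_rank([evaluate(x) for x in population])
    order = sorted(range(len(population)), key=lambda i: ranks[i])
    return [population[i] for i in order[:size]]

def run_moea(size=20, generations=50, seed=1):
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(size)]
    for _ in range(generations):
        # One offspring per parent slot, then survival of the fittest.
        offspring = [variation(mating_selection(population, rng), rng)
                     for _ in range(size)]
        population = environmental_selection(population + offspring, size)
    return population

final = run_moea()
```

Note that variation works on the decision variable x, while ranking and truncation only look at the objective vectors returned by `evaluate`. The sketch has no density measure, so it makes no attempt at an even distribution.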
2 Contributions
This section is devoted to the contributions made by this author and affiliates. To a large extent, my research so far has been focused on two main areas:
• Genetic algorithms versus Differential Evolution
• Multi-objective Distinct Candidate Optimization (MODCO)
We will go through these areas, first introducing the relevant articles submitted to conferences in the MOEA community. We then go through the contributions with the following taxonomy:
1. Content and ideas
2. Results
3. Evaluation and discussion
4. Future work
The content and ideas section presents the content and general ideas expressed in the contribution, while the results section presents results, both in the form of data and in the form of implemented algorithms or operators. Evaluation and discussion is concerned with evaluating and discussing the contribution with respect to related research, and finally the section on future work sketches future investigations.
2.1 Genetic algorithms versus Differential Evolution
The article “Introducing the Strength Pareto Differential Evolution 2 (SPDE2) Algorithm - A Novel DE Based Approach for Multiobjective optimization” [1] was submitted to the Parallel Problem Solving from Nature (PPSN) conference in May 2008, but was rejected. The article investigated the difference in performance between genetic algorithms and Differential Evolution, and introduced the idea of using SPEA2 ranking and truncation with Differential Evolution as the variation and selection operator in a MOEA.

2.1.1 Content and ideas
EAs using variation operators that most often cause only small changes from parents to offspring are called genetic algorithms, for historical reasons. Another branch of EAs is that of differential evolution algorithms. The difference between the two kinds of algorithms lies in their way of applying mating selection and variation, along with an enhanced elitism concept in DE. The main idea of the contribution was to compare two popular multi-objective genetic algorithms (MOGAs), NSGA-II and SPEA2, with their DE-enhanced counterparts. Here, we go through the main algorithmic differences, before introducing the concrete algorithms implemented and tested: NSGA-II [5; 6], SPEA2 [7] and the DEMO versions [9; 10].
Genetic algorithms
Using genetic algorithms, parents are stochastically selected and copied to an offspring vector, and are then subject to recombination and mutation operators in order to form offspring. After this, parents and offspring are joined in one population, subject to truncation by environmental selection based on fitness. This is illustrated in figure 4, where it may be noted that the population size always varies from 2N to N during truncation.

Figure 4: Genetic algorithm (Evaluation, Mating selection, Variation, Environmental selection)
Differential Evolution
Using Differential Evolution [8], we use vector differences to create offspring, using several individuals to create a new candidate offspring to compete against its parent. All individuals of the population are tried out as parents, generating one offspring each. The DE offspring creation algorithm is depicted in Algorithm 1.

Algorithm 1 Differential Evolution
Require: Parent P_i, crossover factor CF, scaling factor F.
Ensure: Offspring C.
1: Randomly select three individuals P_i1, P_i2, P_i3 from population P, where i, i1, i2 and i3 are pairwise different.
2: Calculate offspring C as C = P_i1 + F · (P_i2 − P_i3).
3: Modify offspring C by binary crossover with the parent P_i using crossover factor CF.

Another major difference when using Differential Evolution is that DE enhances elitism by applying further rules after having created the candidate offspring. These rules make the population size vary from N up to 2N before truncation, as can be seen in figure 5. Given parent x spawning offspring y, the elitist rules are:
• If y ≺ x, x is replaced with y.
• If y ∥ x, y is added to the population, as it is then possibly globally non-dominated. This is revealed during the competition with the parent part of the population during environmental selection.
• If x ≺ y, y is discarded.
Figure 5: Differential Evolution (Evaluation, Differential evolution, Environmental selection; parents are replaced by dominant offspring, and incomparable offspring are added to the population)
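Algorithm 1 and the elitist rules can be sketched as follows. This is an illustrative sketch, not the C++ implementation used in this work; individuals are plain lists of floats, and two details are assumptions: one randomly chosen variable is always taken from the mutant (a common DE convention), and offspring with objective values identical to the parent's are treated like incomparable offspring.

```python
import random

def dominates(fx, fy):
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def de_offspring(population, i, F, CF, rng):
    """Create one candidate offspring for parent i (needs >= 4 individuals)."""
    # Step 1: three distinct individuals, all different from the parent.
    candidates = [k for k in range(len(population)) if k != i]
    i1, i2, i3 = rng.sample(candidates, 3)
    p1, p2, p3 = population[i1], population[i2], population[i3]
    # Step 2: C = P_i1 + F * (P_i2 - P_i3), computed per decision variable.
    mutant = [a + F * (b - c) for a, b, c in zip(p1, p2, p3)]
    # Step 3: binary crossover with the parent using crossover factor CF.
    parent = population[i]
    forced = rng.randrange(len(parent))  # assumed convention, see lead-in
    return [m if (rng.random() < CF or j == forced) else p
            for j, (m, p) in enumerate(zip(mutant, parent))]

def apply_elitist_rules(population, i, child, evaluate):
    """Apply the three elitist rules to population[i] and its offspring."""
    fx, fy = evaluate(population[i]), evaluate(child)
    if dominates(fy, fx):
        population[i] = child        # offspring replaces its parent
    elif not dominates(fx, fy):
        population.append(child)     # incomparable: population grows
    # Otherwise the parent dominates and the offspring is discarded.
```

Because incomparable offspring are appended, the population can grow from N towards 2N during one generation, which is why truncation by environmental selection is still needed afterwards.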
NSGA-II
The elitism mechanism in NSGA-II [5; 6] is based on non-dominated sorting of the current population P_t. For each individual in P_t, non-dominated sorting assigns a non-dominated rank equal to the non-dominated front label, which is used to group the individuals into fronts. Here, the first front (where all individuals are assigned rank 1) consists of the population's non-dominated solutions, the next front consists of solutions dominated only by the first front, and so on. Different ways of performing non-dominated sorting are described in [6]. The density measure in NSGA-II is called crowding-sort, and gives an indication of a solution's distance to its nearest neighbours with respect to the M different objectives. The crowding measure assigned to individual i is the average side length of the cuboid given by its nearest neighbours in M dimensions, normalized with respect to the maximal and minimal function value of the respective objectives.
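Non-dominated sorting and the crowding measure can be sketched as below. The sorting uses simple repeated peeling of the non-dominated front rather than the faster bookkeeping described in [6], and fronts are indexed from 0 instead of rank 1. Assigning an infinite distance to the boundary individuals of each objective follows the NSGA-II convention; the function names are chosen for illustration.

```python
def dominates(fx, fy):
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def non_dominated_sort(objs):
    """Group indices into fronts; fronts[0] holds the non-dominated ones."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i])
                                  for j in remaining))
        fronts.append(front)
        remaining -= set(front)
    return fronts

def crowding_distance(objs, front):
    """Average normalized cuboid side length for each index in one front."""
    m = len(objs[front[0]])
    distance = {i: 0.0 for i in front}
    for k in range(m):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        # Extreme individuals in each objective are always preferred.
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            distance[cur] += (objs[nxt][k] - objs[prev][k]) / (hi - lo) / m
    return distance
```

A larger crowding distance means a less crowded individual, so when a front has to be split during truncation, the individuals with the largest distance are kept.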
A full NSGA-II generation performs non-dominated sorting on the combined parent and offspring population. From here, as many fronts as possible are accommodated in the next population. The last front to be inserted is sorted using the crowding-sort procedure, and only the best individuals with respect to crowding distance are chosen for inclusion. This truncation mechanism favours rank first and good crowding distance next, just as the binary crowded tournament selection operator which NSGA-II applies to the parent part of the population to fill the offspring vector. Finally, all individuals in the offspring vector are subject to crossover and mutation operators.

SPEA2
SPEA2 [7] differs from NSGA-II by maintaining an archive A_t of size N of the best solutions found so far, and from this archive, a population P_t of size M is continuously generated to compete against it in generation t. As in NSGA-II, SPEA2 uses the dominance concept to promote elitism, but here a Pareto-strength value is assigned to each individual i in both population and archive according to how many solutions in both archive and population i dominates:

\[
S(i) = \left| \{\, j \mid j \in P_t \oplus A_t \wedge i \prec j \,\} \right|,
\tag{3}
\]

where |·| denotes the cardinality of a set, ⊕ is multiset union and i ≺ j means that i dominates j. After all individuals in P_t ⊕ A_t have been assigned a Pareto-strength, each individual i is assigned a raw fitness equal to the sum of the strengths of its dominators:

\[
R(i) = \sum_{j \in P_t \oplus A_t \,\wedge\, j \prec i} S(j).
\tag{4}
\]

The density estimation in SPEA2 is based on the k-th nearest neighbour method, using Euclidean distance in objective space. In short, the density for an individual i is calculated as an inverse of its distance to its k-th nearest neighbour, σ_i^k, where k = 1 corresponds to the closest neighbour:

\[
D(i) = \frac{1}{\sigma_i^k + 2}.
\tag{5}
\]

Finally, each individual in P_t ⊕ A_t is assigned a fitness value equal to the sum of its raw fitness and its density value, F(i) = R(i) + D(i), to provide a single value to judge by. After fitness assignment, non-dominated solutions in P_t ⊕ A_t, i.e. those with F(i) < 1, are first archived, that is, placed in A_{t+1}. In case the archive is then too small, N − |A_{t+1}| dominated individuals in P_t ⊕ A_t are archived, where individuals are chosen according to their assigned fitness value. Otherwise, if the archive is too large, it is truncated by recursively removing the individual with the worst density from the archive, i.e. the individual which is closest to its k-th nearest (non-deleted) neighbour is repeatedly chosen for deletion until the archive size is reached. After this, mating selection is applied using binary tournament on the archive in order to fill the offspring vector, which is then subject to recombination and mutation operators.
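The fitness assignment of Equations 3 to 5 can be sketched as below, operating on the objective vectors of the combined population and archive. The function name and the default k = 1 are assumptions made for illustration, and the list must contain at least k + 1 vectors. `math.dist` requires Python 3.8 or newer.

```python
import math

def dominates(fx, fy):
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def spea2_fitness(objs, k=1):
    """Return F(i) = R(i) + D(i) for every objective vector in objs."""
    n = len(objs)
    # Equation 3: strength S(i) = number of individuals that i dominates.
    strength = [sum(dominates(objs[i], objs[j]) for j in range(n))
                for i in range(n)]
    fitness = []
    for i in range(n):
        # Equation 4: raw fitness R(i) = summed strength of i's dominators.
        raw = sum(strength[j] for j in range(n)
                  if dominates(objs[j], objs[i]))
        # Equation 5: density from the distance to the k-th nearest neighbour.
        dists = sorted(math.dist(objs[i], objs[j])
                       for j in range(n) if j != i)
        sigma = dists[k - 1]
        fitness.append(raw + 1.0 / (sigma + 2.0))
    return fitness
```

Since D(i) is at most 0.5, non-dominated individuals (R(i) = 0) always end up with F(i) < 1, which is exactly the test used when filling the archive.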
DEMO versions
The Differential Evolution versions of the above genetic algorithms are called DEMO^NSII and DEMO^SP2 [9; 10]. In the article [1], DEMO^SP2 is denoted SPDE2 (Strength Pareto Differential Evolution 2), as we thought this was a novel approach. However, this alternate DEMO version had already been investigated under the name DEMO^SP2, and so we will use that name. The DE versions of NSGA-II and SPEA2 use the same Pareto-ranking, secondary fitness assignment and truncation mechanism as their origins. The difference lies in applying the elitist rules, and in using several individuals for offspring creation, as seen in Algorithm 1 and in figure 5. Using DE, the elitist rules ensure the spread of good decision variables, while using several individuals increases the chance of retaining good parts of the decision variables and decreases the risk of premature stagnation by introducing more diversity than genetic variation operators.

2.1.2 Results
Here, we summarize the results of our contribution. Overall, the results from this contribution have been:
1. Implementation of NSGA-II and SPEA2 with genetic operators.
2. Implementation of DEMO^NSII and DEMO^SP2.
3. Implementation of 5 ZDT (Zitzler, Deb, Thiele) test problems.
4. Implementation of 5 different performance indices (PIs).
5. Data from runs using all algorithms on all test problems with all PIs.

Implemented algorithms
All algorithms described in the content and ideas section have been implemented in a C++ framework supplied by Rasmus Ursem, working at Grundfos R&T. The framework features classes for numerical individuals along with an abstract class for multi-objective EAs, enabling implementation of different concrete versions. All algorithms have been implemented with several options regarding variation operators, population size, number of generations performed, etc. As a part of implementing the genetic MOEAs, several genetic variation operators were implemented. The simulated binary crossover (SBX) operator was implemented, along with arithmetic crossover operators and a Gaussian annealing mutation operator. For the DEMO versions, DE was implemented as in Algorithm 1 with the enhanced elitism rules, along with a few other DE variants, which showed no improvement over the rand/1/bin scheme presented.

Test problems
To test the different algorithms, five 2-dimensional ZDT problems were implemented. These problems have different true Pareto-front characteristics, which attempt to make MOEAs converge prematurely in various ways. These problems are described further in [6].
Performance indices
As the true Pareto fronts of the ZDT problems are known, it was possible to implement performance indices aimed directly at giving an indication of the MOEA performance with respect to the 3 goals of MOEAs described earlier. 2 PIs on closeness, 2 PIs on distribution and 1 on spread were implemented.

Data
To gather data, each algorithm was run 10 times on each test problem. From the resulting populations and the known true Pareto fronts, the PIs were calculated, averaged, analyzed and discussed. It was clear that the DEMO versions of NSGA-II and SPEA2 performed better than their original versions on all indices. Further, DEMO^SP2 was observed to perform slightly better than DEMO^NSII, possibly due to a more fine-grained Pareto-ranking and a recursive density assignment procedure. I further made a few investigations into the differences in distribution achieved by the genetic and the DEMO algorithms, and found a discrepancy between DE and the crowding/density measure of NSGA-II and SPEA2, as DE itself does not promote a good distribution. Thus, DE-based algorithms seem to be more prone to generating offspring which are removed due to crowding and thus do not contribute to the steady qualitative improvement of the population.

2.1.3 Evaluation and discussion
As mentioned above, the article was submitted to the PPSN conference in May 2008, but was rejected. Some of the points of criticism were:
1. Low news value
2. Test problems were outdated

The article was developed on the basis of the first DEMO article by Robič and Filipič [9], but as the PPSN committee pointed out, a newer article had been released in the meantime. In their second DEMO article [10], Robič and Filipič made an investigation very similar to the one performed by this author. In particular, they also incorporated SPEA2 selection and truncation in DEMO. However, Robič and Filipič tested on higher-dimensional problems and with more general performance indices, which made their article superior to my work and further decreased the news value of my article. Unfortunately, the article by Robič and Filipič was not available when I started working on my article, which is the reason for the overlap.

The problem with my chosen test suite was that MOEAs need to be tested on problems with more than two objectives. It has been shown that MOEAs working well on two-dimensional problems do not necessarily work well on higher-dimensional problems. In particular, there seems to be a blowup in complexity when moving from two dimensions to three.

On the more positive side, my article and the article by Robič and Filipič [10] agreed on the main point, that DE-based MOEAs seem to outperform genetic algorithms, especially with respect to closeness towards the true Pareto-front. Thus, even though a part of my research was not up to date, it makes an interesting, valid point.
2.1.4
Future work
No future work is intended to follow up on this contribution. The implemented algorithms have all been tested to work well on a range of problems, especially the DE-based ones. However, there seems to be little to gain from further research in this direction, as much is already thoroughly covered in [10]. Still, the algorithms have been a basis for experimenting with new algorithms, partly based on the techniques of NSGA-II, SPEA2 and the DEMO versions. In particular, the Pareto ranking methods and the elite-preserving truncation mechanism have been used in newly developed algorithms, along with Differential Evolution.
2.2
Multiobjective Distinct Candidate Optimization
We decided early that we wanted to investigate to what extent it was possible to decrease the cardinality of the resulting population of traditional MOEAs. After some discussion, we named the research area Multiobjective Distinct Candidate Optimization (MODCO), and submitted two papers on the area to the Evolutionary Multiobjective Optimization (EMO) 2009 conference.

The first paper “Multiobjective Candidates Optimization (MODCO) - A New Branch of Multiobjective Optimization Research” [2] was mainly developed by coresearcher Rasmus Kjær Ursem as an argumentation and quantification of the MODCO area, based on his experience in real world optimization at Grundfos. This paper was rejected, mostly due to its non-technical content. The second paper “Multiobjective Candidates Optimization (MODCO): A Cluster Forming Differential Evolution Algorithm” [3] was mainly developed by this author as a first attempt to create a MOEA which complied with the MODCO goals. This paper was accepted, mostly due to its novel approach to decreasing result set cardinality.

Here, we want to briefly present the main arguments and goals described in the MODCO article before evaluating and discussing, as these are the basis of the second article on Cluster-Forming Differential Evolution (CFDE). That is, the concrete algorithm suggested in the CFDE paper is evaluated wrt. the goals of the MODCO paper. The CFDE article is presented in Section 2.3. Note that the article on MODCO holds no algorithms or results, and thus does not fully comply with the taxonomy from the introduction to section 2.
2.2.1
Introduction, arguments and goals
The concept of MODCO is the optimization of a user-defined low number of distinct candidates, with a user-defined degree of distinctiveness. This is in contrast to the traditional MOEA goal of covering as much as possible of the true Pareto-front. However, having a full population of alternatives wrt. the trade-offs between objectives just introduces more choices to be made by the DM. Basically, a standard EA population is simply of too high cardinality to be directly usable.
MODCO basics. MODCO tries to circumvent this problem by incorporating general, higher level information into the search, replacing step 2 in figure 2. This information includes relevant practical questions such as:
1. How much time and money can be spent on post-processing the solution set?
2. How many solutions is it feasible to inspect and compare?
3. How distinct must solutions be in order to be distinguishable in tests?

This information is relevant, and these questions must be answered, for most real-life applications. The goal of MODCO is to use this information as a guide towards a small, final set of distinct candidates, replacing step 2 in figure 2, such that this final selection is not only the responsibility of the DM, but is incorporated in the optimization process.

Arguments for the soundness of MODCO. To argue for the soundness of MODCO, four main categories were identified and argued about in detail in [2]. Here, we briefly review these categories of arguments:
1. Post-processing of many Pareto-optimal solutions.
2. Physical realization of a solution.
3. Decision making among large sets of solutions.
4. Algorithmic and theoretical perspectives.

Post-processing many Pareto-optimal solutions is essential in most real world applications, where it is necessary to further investigate the most promising solutions from an optimization process, in order to figure out which one(s) to physically implement. This process is expensive and time consuming, and it may only address a small part of the full final design. Therefore it would be preferable to only have a small set of solutions to perform post-processing on. Post-processing is expensive, since it typically involves prototyping and testing, both of which may be very costly. Prototyping in particular can be expensive, due to materials. Post-processing also includes more detailed simulations, which are normally very time consuming.
For example, conducting a full computational fluid dynamics (CFD) calculation may take days, making it infeasible to simulate hundreds of solutions, which is not an unusual cardinality of MOEA results. Finally, when the optimization process is only concerned with a part of a design, it is not feasible to have hundreds of alternatives, since the impact on the full design is expensive to calculate.

The physical realization of solution sets returned from a traditional MOEA also suffers some disadvantages due to the high cardinality. This is due to the problem of having a 100 % accurate simulation, which is often far too costly. Normally, a few percent of tolerance is acceptable in simulation. However, if the simulator inaccuracy is greater than the difference between neighboring solutions from the MOEA resulting population, a lot of these will not be distinguishable from each other. Again, a smaller, more distinct set of candidate solutions would be preferable.

Choosing among large sets of solutions implies problems for the DM, wrt. human factors. The DM may not have the technical background, or the knowledge of optimization, to be able to select among 100-500 different Pareto-optimal solutions to a problem with high dimensionality. The number of solutions makes it infeasible to inspect and evaluate the different trade-offs, and further it is often not possible to state explicit preference rules in order to guide this final selection. The only general rule is that solutions in knee regions in the objective space are preferred. Knee regions are areas in the objective space where a small improvement in one objective leads to a high deterioration in another, or put another way, a convex part of the Pareto front. Hence, a small set of solutions in knee regions would be preferable from a DM point of view.

Algorithmic perspectives on MODCO reveal a few points where the search for only a small set of solutions should have advantages over traditional MOEA result sets. First of all, allowing a decrease in the local population diversity may result in a more focused search, in some ways similar to local search, which results in a better convergence toward the true Pareto front. Secondly, the approach makes it possible to use a smaller population size, since we are now no longer interested in covering the full Pareto front, but only in locating a few distinct candidates. This naturally implies fewer computational steps, especially function evaluations, but without compromising the quality of the found distinct candidates.
Finally, if we are able to locate knee regions, these are indeed the interesting areas, so no other solutions need be reported. The search for knees may further improve convergence towards the true Pareto front, as will be argued later.

MODCO goals. The goals of MODCO are related to, but different from the goals of traditional MOEAs, and are inspired by the desire of finding a small set of optimal, distinct candidates, preferably in knee sections. The features of the ideal MODCO algorithm are described in more detail in [2], but in general, the goals of a MODCO algorithm are:
1. Closeness towards the true Pareto front. Ideally, solutions are placed on the true Pareto front, but otherwise they should be as close as possible.
2. Global distinctiveness in the returned solutions, i.e. that solutions are distinguishable wrt. performance or design. Ideally, this feature is parametrized, in order for the user to set how distinct returned solutions should be.
3. Local multiobjective optimality in returned solutions, i.e. it is preferred to return solutions in knee regions.
The first goal corresponds to the first goal of traditional MOEAs, as we are still interested in optimizing our results. The last two goals are different, and based on the idea of a decreased cardinality in the result set. That is, if we want to decrease the result set cardinality, we need other ways of controlling the global diversity. Further, as argued earlier, results in knee regions are more interesting for a DM, and so these should be reported.
2.2.2
Evaluation and discussion
The paper on MODCO was rejected for the EMO 2009 conference, mostly due to its non-technical nature. Much of the argumentation is based on experience, and thus deals with many non-technical issues related to decision making and the quantification of DM preferences. The connection to the more concrete paper on the CFDE algorithm is strong and is pointed out in several places, but this was not enough for the paper to be accepted. In hindsight, the MODCO article could have been compressed, and maybe inserted in the beginning of the CFDE paper, simply giving the goals and a short argumentation. This could have strengthened the connection between the theory and the practical application of MODCO. However, we wanted to separate the theory from practical experiments in order to be able to argue for the soundness and use of both.
2.2.3
Future work
The future work for the MODCO article includes a rewrite, incorporating some more technical material from the technical report [4], which is also related to MODCO. This is due to the criticism of the article's non-technical nature, and in order to make the connections to CFDE clearer. As the MODCO paper gives much of the argumentation for why this class of algorithms should even be considered, it is also important for the CFDE article. Thus we need the article to be publicly available, and as such it must be well founded. Further, we are currently discussing the MODCO branch of MOO with a few well known writers within the MOEA community, in order to establish the usability and related issues of the MODCO branch of MOO. A few answers have so far been positive towards the idea and have suggested further reading.
2.3
Cluster-Forming Differential Evolution
The article “Multiobjective Candidates Optimization (MODCO): A Cluster Forming Differential Evolution Algorithm” [3] was mainly developed by this author as an empirical investigation of the first concrete algorithm which conforms to the MODCO goals. The article was accepted at the EMO (Evolutionary Multiobjective Optimization) 2009 conference. Due to this fact and the connection to the MODCO paper [2], future work on this paper is presented in the section devoted to future work, Section 3.
2.3.1
Content and ideas
The article [3] demonstrates the first instance of a MODCO algorithm: the Cluster-Forming Differential Evolution (CFDE) algorithm. It is intended as a sequel to the article introducing MODCO as described above, implementing and evaluating a concrete instance of the suggested algorithm class. The algorithm follows the goals of MODCO, resulting in the following features:
1. User defined result set cardinality, parametrized as K_NC.
2. User defined performance distinctiveness, parametrized as K_PD.
3. The ability to converge towards knee regions.

These features correspond to the MODCO goals, which are also used for evaluating the CFDE algorithm performance. For reference, the CFDE algorithm is depicted in Algorithm 2. As usual we have a population P of size N, but the primary data structure is now a vector of subpopulations P_i ∈ P of size N/K_NC each, assuming WLOG that N mod K_NC = 0, as seen in figure 6. Further, a vector holding subpopulation centroids and a temporary offspring vector are used. minDist(C_i) is the function returning the minimum distance from centroid C_i to the nearest other centroid, whereas the calculation of σ depends on both K_PD and the current problem.

Algorithm 2 Cluster-Forming Differential Evolution
Require: Population size N, K_NC, K_PD
Ensure: K_NC different non-dominated individuals.
  Initialize K_NC subpopulations with N/K_NC random individuals in each
  while Halting criterion has not been met do
    Perform global DE-based mating - store incomparable offspring
    Calculate subpopulation centroids C_i
    Migrate incomparable offspring to nearest subpopulation wrt. centroid
    for All P_i ∈ P do
      if minDist(C_i) < σ then
        Assign nearest other centroid distance to each individual x_{i,j} ∈ P_i
      else
        Assign knee utility function value to each individual x_{i,j} ∈ P_i
      end if
    end for
    Assign final fitness wrt. global Pareto rank, then secondary fitness
    Truncate subpopulations wrt. final fitness
  end while
  Return K_NC solutions, by returning the non-dominated solution closest to the subpopulation centroid from each subpopulation.
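The branch inside the for-loop of Algorithm 2, which decides per subpopulation whether to use the centroid distance or the knee utility as secondary fitness, can be sketched as follows. This is a hedged Python illustration only: the function names `min_dist` and `secondary_fitness_mode` and the string labels are hypothetical, centroids are assumed to be given as tuples in objective space, and Euclidean distance is assumed.

```python
import math

def min_dist(i, centroids):
    """Distance from centroid i to the nearest other centroid (Euclidean, assumed)."""
    return min(math.dist(centroids[i], c)
               for j, c in enumerate(centroids) if j != i)

def secondary_fitness_mode(centroids, sigma):
    """For each subpopulation, name the secondary fitness the branch would select.

    Requires at least two centroids. The labels are hypothetical names
    introduced for this sketch.
    """
    return ["centroid_distance" if min_dist(i, centroids) < sigma else "knee_utility"
            for i in range(len(centroids))]
```

With sigma set to infinity every subpopulation uses the centroid distance, so knee search is never triggered.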
[Figure 6 labels: K_NC, N/K_NC, Parent part, Offspring part]
Figure 6: Population after migration

Main algorithm. To initialize, the CFDE algorithm creates N random individuals and inserts these into K_NC subpopulations with N/K_NC individuals in each, which is depicted as the parent part in figure 6. For problems with two objectives, an initial sorting is performed before insertion into subpopulations, in order to enhance clustering. Then the CFDE algorithm proceeds by performing global mating with replacement as in usual DE, and as seen in Algorithm 1 and in figure 5. However, it stores the incomparable offspring in a temporary offspring vector until it can be determined which subpopulation they should belong to. From the parent part of the subpopulations, a centroid for each is then calculated. Following this, the incomparable offspring are migrated to the subpopulations with the nearest centroid.

At this point, the CFDE algorithm determines which of the two secondary fitness measures to use for each subpopulation. The secondary fitness measure is each individual's distance to the nearest other centroid, if the subpopulation's centroid is too close to its nearest neighboring centroid (wrt. σ, which depends on K_PD and the problem; see Section 2.3.2 for an example calculation). This makes subpopulations reject each other by favoring the individuals furthest away from other subpopulation centroids. Further, this most likely penalizes solutions created far from the subpopulation centroid, as these are most likely to be close to other subpopulation centroids, effectively enhancing clustering. In case the centroid of the subpopulation is sufficiently far away from its neighbours, the secondary fitness measure is Branke et al.'s utility function, favoring individuals in knee regions [12]. Thus, a subpopulation will search for knees, if not too close to another subpopulation centroid.
Still, CFDE maintains focus on convergence towards the true Pareto-front, so it then assigns to each individual a global Pareto rank using the NSGA-II non-dominated sorting. This is used for assigning each individual a final fitness, such that the final fitness incorporates both rank and secondary fitness measure in a total order. Finally each subpopulation is truncated to the original size of N/K_NC using the truncation mechanism of NSGA-II. Here, the main differences are that truncation is done locally in subpopulations, and that the subpopulations are truncated using one of the two secondary fitness measures, which is incorporated in the final fitness. Hence, some subpopulations may be truncated using distance, and others using the knee utility function. This way subpopulations may be attracted to different knee regions, while forming clusters during the evolutionary process. As we return only one solution from each subpopulation, we get the wanted number of distinct solutions returned.

Novel contributions. In the article [3], a more thorough walkthrough of all the mechanisms of the CFDE algorithm is found, but here we will only go through the novel contributions and their compliance with the MODCO goals. This is because many of the mechanisms used in the CFDE algorithm have already been explained in this report. The novel contributions in the CFDE algorithm are:
1. Flexible subpopulation based Differential Evolution.
2. Centroid calculation for each subpopulation allowing for migration.
3. Subpopulation centroid distance based secondary fitness.
4. Alternating secondary fitness assignment to subpopulations.

The idea of subpopulations is not new in itself, and has also been used in single-objective optimization, e.g. in multi-modal optimization, where decision spaces contain several global optima to be discovered, not unlike multi-objective optimization.
However, to this author's knowledge, this is the first algorithm with a parametrized number of subpopulations subject to Differential Evolution. The subpopulation approach is naturally crucial for the full algorithm, in that it allows for reporting back the wanted number of distinct candidates.

The centroid calculation for each subpopulation is simple and based on the placement of the parent part of each subpopulation in objective space. It gives the average placement of the elite (parent) part of each subpopulation, and is used for both migration and the centroid distance calculation used in the secondary fitness assignment. Thus, calculating centroids is essential for the clustering ability of CFDE. For each subpopulation P_i, we calculate the centroid C_i = [C_{i,1}, C_{i,2}, ..., C_{i,M}] as the average point of the elite in objective space:

\[ C_{i,m} = \frac{\sum_{j=1}^{N/K_{NC}} f_m(x_{i,j})}{N/K_{NC}}, \quad m = 1..M. \qquad (6) \]
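Equation (6) and the migration step that uses it can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the C++ implementation used in the report: the names `centroid` and `migrate` are hypothetical, objective vectors are assumed to be tuples of floats, and "nearest" is assumed to mean Euclidean distance in objective space.

```python
import math

def centroid(parent_objs):
    """Average objective vector of the parent (elite) part, as in equation (6).

    parent_objs holds one objective vector f(x_{i,j}) per parent individual.
    """
    m = len(parent_objs[0])
    return tuple(sum(f[k] for f in parent_objs) / len(parent_objs)
                 for k in range(m))

def migrate(offspring_objs, centroids):
    """For each offspring objective vector, return the index of the nearest centroid.

    Euclidean distance is an assumption of this sketch.
    """
    return [min(range(len(centroids)), key=lambda i: math.dist(f, centroids[i]))
            for f in offspring_objs]
```

Because the centroid is computed from the parent part only, migrating offspring does not move the centroid within the same generation.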
The centroid distance calculation enhances clustering by assigning to the individuals of any subpopulation the distance to the nearest other subpopulation centroid. As this is to be maximized, solutions created far from the subpopulation centroid are most prone to penalty, as they are most likely to be close to the centroid of another subpopulation. This further makes subpopulations reject each other, as solutions generated the furthest away from other centroids are then favored. Let dist(x, y) denote the distance in objective space between point x and point y, each of dimension M. Further, let min(S) denote the minimal element of the set S. For the subpopulation P_i, we then assign to each individual x ∈ P_i a secondary fitness SF as:

\[ SF(x) = \min(\{\mathrm{dist}(f(x), C_j),\ j = 1..K_{NC},\ j \neq i\}) \qquad (7) \]
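As a small worked illustration of equation (7), the following Python sketch computes the centroid distance fitness of one individual. The function name `centroid_distance_fitness` is hypothetical, and Euclidean distance is assumed for dist, which the text above leaves unspecified.

```python
import math

def centroid_distance_fitness(f_x, i, centroids):
    """Secondary fitness of equation (7), to be maximized.

    f_x is the objective vector of an individual in subpopulation i;
    the result is its distance to the nearest centroid of any other
    subpopulation. Euclidean distance is an assumption of this sketch.
    """
    return min(math.dist(f_x, c) for j, c in enumerate(centroids) if j != i)
```

For example, with centroids (0, 0), (3, 4) and (10, 0), an individual of subpopulation 0 located at (0, 0) receives the value 5, its distance to the centroid (3, 4).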
The subpopulation approach of CFDE also makes it possible to assign alternating secondary fitnesses, according to case. To this author's knowledge, this is also a new idea, facilitating the use of an arbitrary number of secondary fitness functions. However, this calls for a priority among secondary fitnesses. In the case of CFDE, we are only interested in distinct candidates, and thus the two secondary fitness measures conflict only if knee regions are located too close wrt. σ, i.e. we may not report back solutions in knee regions located too close to each other, as they are not considered distinct. Otherwise, knee search will not deteriorate clustering, as knee regions are typically small, and likewise, the centroid distance assignment will not prevent knees from being found when distinctiveness is achieved for any subpopulation.

The alternate secondary fitness measure, the utility function proposed in [12], is intended to discover knee regions by calculating an average fitness value for a large number of randomly sampled weight vectors. If this average fitness is good, the individual is more likely to reside in a knee region. Knee regions are characterized by the fact that a small improvement in one objective will result in a large deterioration in another objective. The utility function takes only one argument, precision, denoting the number of sample weight vectors to apply. Let λ denote a weight vector of dimension M, with \sum_m \lambda_m = 1, and let λ_p denote the p-th sampled weight vector. Then we calculate the secondary fitness SF with precision precision of each individual x ∈ P_i as:

\[ SF(x) = \frac{\sum_{p=1}^{precision} \lambda_p \cdot f(x)}{precision} \qquad (8) \]

Lastly, let us see how CFDE complies with the goals of MODCO, and how this is supported by the novel contributions. Goal 1 of convergence is ensured by elitism within subpopulations, i.e. the number of globally non-dominated solutions within a subpopulation can only increase. Further, this is enhanced by the elitism rules using Differential Evolution.
Goal 2 of distinctiveness is achieved using the subpopulation approach, which together with the centroid calculation, migration and centroid distance penalty forms clusters from the initial random subpopulations, each reporting back a distinct solution. Further, the subpopulation approach is easily parametrized to enable user setting. Goal 3 of detecting knees is achieved using Branke's knee utility function when distinctiveness is reached.
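The averaged weighted sum of equation (8) can be sketched as below. This is a hedged illustration rather than the implementation from [3] or [12]: the function name `knee_utility` is hypothetical, the random generator is passed in explicitly, and the way weight vectors are sampled (normalized exponential draws, giving non-negative weights that sum to one) is an assumption, since the text only says that they are randomly sampled. All objectives are assumed to be minimized, so a lower value is better.

```python
import random

def knee_utility(f_x, precision, rng):
    """Average of lambda_p . f(x) over `precision` random weight vectors (eq. 8).

    f_x is the objective vector of one individual; rng is a random.Random.
    The sampling scheme below is an assumption made for this sketch.
    """
    m = len(f_x)
    total = 0.0
    for _ in range(precision):
        raw = [rng.expovariate(1.0) for _ in range(m)]
        s = sum(raw)
        lam = [r / s for r in raw]  # non-negative weights summing to 1
        total += sum(l * f for l, f in zip(lam, f_x))
    return total / precision
```

Under these assumptions a point such as (1, 1) on a front that also contains (0, 4) and (4, 0) receives a lower average than either extreme, which is the behaviour the knee search relies on.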
2.3.2
Results
To provide results, the CFDE algorithm was demonstrated on 3 kinds of problems from MOEA literature: the 2D ZDT problems [6; 5], the knee problems of Branke et al. [12], and all non-constrained 3D DTLZ problems [11], all with problem settings as suggested in the respective papers. The experiments were performed with respect to the MODCO goals:
• Convergence performance (MODCO goal 1)
• Global distinctiveness (MODCO goal 2)
• User-defined performance distinctiveness (MODCO goal 2)
• Local multiobjective optimality (MODCO goal 3)

First, we check CFDE convergence against the DEMO versions, which have demonstrated good performance on many problems [9; 10]. Next, we want to demonstrate convergence to K_NC clusters, how we may change solution diversity by setting K_PD, and finally that CFDE is able to locate knees. We demonstrate the use of K_PD only on knee problems, as this is only relevant for such problems. For ZDT and DTLZ problems, we therefore always set σ = ∞, effectively disabling knee search. For the 3D DTLZ problems, we have used a higher K_NC to ensure that we find both extreme and intermediate trade-offs. All other settings of the algorithms investigated can be found in [3], Section 3, Table 1 and 2.

Convergence performance. To deal with the different cardinality of more standard MOEAs and the CFDE algorithm, we use the universal notion of dominance. Here, we compare the CFDE algorithm against DEMO^NSII and DEMO^SP2. One algorithmic argument for MODCO is that the low number of returned solutions allows for a more focused search because MODCO does not aim at an even distribution. Consequently, a MODCO algorithm should be able to return solutions closer to the true Pareto front. So we wish to investigate to what extent the returned solutions from the CFDE algorithm dominate the most similar solutions from the returned population of the competing MOEAs, where similarity is measured as distance in objective space.
This way we see if the CFDE approach is competitive with simply picking K_NC solutions from the resulting populations of the DEMO versions. All results presented in [3] were generated using the NSGA-II version of global ranking and truncation in the CFDE algorithm. We have used 20 runs for both the DEMO versions and for CFDE on each problem. For each generated population of CFDE, we have compared each resulting individual to its most similar counterpart from each of the DEMO populations. This indicator gives the percentages of dominating, dominated and incomparable individuals CFDE was able to produce, and is independent of K_NC and K_PD.
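The comparison described above can be sketched as follows for a single pair of result sets. This is an illustrative Python sketch, not the code behind the results in [3]: the names `dominates` and `compare_to_most_similar` are hypothetical, all objectives are assumed to be minimized, similarity is taken to be Euclidean distance in objective space, and identical objective vectors are counted as incomparable.

```python
import math

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization assumed)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def compare_to_most_similar(cfde_objs, demo_objs):
    """Percentages of CFDE results that dominate, are dominated by, or are
    incomparable to their nearest counterpart in a competing population."""
    counts = {"dominating": 0, "dominated": 0, "incomparable": 0}
    for a in cfde_objs:
        # Most similar counterpart: nearest point in objective space.
        b = min(demo_objs, key=lambda d: math.dist(a, d))
        if dominates(a, b):
            counts["dominating"] += 1
        elif dominates(b, a):
            counts["dominated"] += 1
        else:
            counts["incomparable"] += 1
    return {k: 100.0 * v / len(cfde_objs) for k, v in counts.items()}
```

Because every returned CFDE individual is matched to exactly one counterpart, the three percentages always sum to 100 regardless of how many solutions either algorithm returns.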
Figure 7: ZDT1 plot - 20 runs - 5 clusters [axes: F1, F2]
Figure 8: ZDT3 plot - 20 runs [legend: 20 runs - 10 clusters, 20 runs - 5 clusters; axes: F1, F2]
The results on each of the test problems can be found in Table 3 and 4 in [3] along with a more thorough walkthrough, but here we will only note that the CFDE results are almost never dominated by results from the DEMO versions. In fact, the DEMO versions seem to outperform CFDE on only a single test problem, which has a high number of local Pareto fronts near the global one. On the rest of the 15 test problems, CFDE demonstrates equal or superior convergence, with up to 87 % dominating solutions produced compared to the most similar DEMO counterparts. Overall, the CFDE algorithm appears to be superior to the DEMO versions wrt. convergence.

Global distinctiveness. Global distinctiveness is achieved by the CFDE algorithm using the centroid distance to repel subpopulations. Figures 7 and 8 display the returned results of 20 runs of the CFDE algorithm on ZDT1 and ZDT3, using K_NC = 5 and K_NC = 10. As mentioned above, we here set σ = ∞. Similar robust convergence is seen for the other test problems, i.e., CFDE found roughly the same set of distinct candidates in repeated runs. As can be seen in figure 7, all of the 20 runs of CFDE returned similar distinct solutions. In figure 8, we see that using K_NC = 5 ensures a result returned from each of the 5 patches of the true Pareto-front, and again we see only a small variation. However, using K_NC = 10 makes the returned results much more spread over the 20 runs, since there are now more clusters to be formed than there are discontinuous patches. As can be observed from the density, solutions will here most often seek the outermost part of the patches, making the returned solutions as distinct as possible.

User-defined performance distinctiveness. The MODCO parameter K_PD ∈ [0, 1] allows the DM to set how distinct the returned solutions should be. A low value corresponds to a low distinctiveness and a high value to a high distinctiveness, but as mentioned, K_PD is problem dependent.
So to demonstrate the effects of changing K_PD, we have chosen DEB3DK as an illustrative example, as it allows us to visualize the effect of altering the balance between knee search and subpopulation repelling. We first demonstrate the calculation of σ used in the CFDE algorithm. We assume the settings K_PD = 1 and K_NC = 5. For DEB3DK, we may use the reference points z^{**} = (0, 0, 0) and z^{I} = (8, 8, 8), which span the interesting part of the objective space. Then we can calculate:

\[ \sigma = \frac{K_{PD}}{K_{NC}} \cdot \lVert z^{**} - z^{I} \rVert = \frac{1}{5} \cdot \sqrt{192} \approx 2.77 \qquad (9) \]
In this setting, subpopulations will repel each other if they get within a distance of 2.77 of each other's centroids. Setting K_PD = 1 should ensure maximum global distinctiveness, such that we get clusters uniformly spread across the objective space spanned by the reference points. Conversely, setting K_PD close to zero enables clusters to get closer to each other while searching for knees.
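The example calculation of σ in equation (9) is easy to reproduce. The Python sketch below is illustrative only; the function name `sigma_threshold` and its argument names are hypothetical, and the Euclidean norm is assumed, which is consistent with the value √192 in the example.

```python
import math

def sigma_threshold(k_pd, k_nc, z_a, z_b):
    """sigma = K_PD / K_NC * ||z_a - z_b||, following equation (9)."""
    return k_pd / k_nc * math.dist(z_a, z_b)

# DEB3DK example from the text: K_PD = 1, K_NC = 5, reference points
# (0, 0, 0) and (8, 8, 8).
sigma_deb3dk = sigma_threshold(1.0, 5, (0.0, 0.0, 0.0), (8.0, 8.0, 8.0))
```

Here `sigma_deb3dk` evaluates to approximately 2.77, and halving K_PD halves the threshold, which is why lower K_PD values let clusters approach each other more closely.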
[Figure 9 legend: True front; KPD=1.0 - 20 runs; KPD=0.5 - 20 runs; KPD=0.2 - 20 runs. Axes: F1, F2, F3.]
Figure 9: DEB3DK plot - investigating user defined performance distinctiveness

Figure 9 illustrates the results of setting K_PD to 0.2, 0.5 and 1.0. For K_PD = 1, the 5 clusters are equidistant around the single knee, where one cluster is placed. The four clusters not in the knee are repelled from each other as they reach a distance of 2.77 between centroids, as was demonstrated in the example calculation above. These four clusters are located in the partial knees on the lines forming a cross. For K_PD = 0.5, we always hit the knee with one cluster. Further, it can be seen that some solutions have found other knee regions, crawling towards the one in the middle, but not being allowed to get too close. Setting K_PD = 0.2 results in all clusters getting very close to the single knee region. Overall, it is clear that increasing K_PD indeed makes clusters repel each other more.

Figure 10: DO2DK plot - 20 runs - 5 clusters - 4 knees [legend: true front, 20 runs - 5 clusters; axes: F1, F2]
Figure 11: DEB2DK plot - 20 runs - 4 clusters - 4 knees [legend: true front, 20 runs; axes: F1, F2]

Local multiobjective optimality. Figures 10 and 11 illustrate the knee problems DO2DK and DEB2DK, with the resulting CFDE individuals of 20 runs. In the DO2DK problem, we set K = 4 and s = 1.0, such that we have exactly the same settings as were used for creating the results illustrated in figure 4 in [12]. For the 20 runs depicted in figure 10, it may be noticed that the density of solutions near knee regions is very high. When using K_NC = 5, CFDE finds the 4 knee regions very precisely, while one cluster typically hits an outer solution, or is caught in-between knee regions. For DO2DK, we have used K_PD = 0.75, corresponding to σ = 1.5. This way we keep clusters separated, while still allowing for knees to be found.

For the DEB2DK problem, we have used K = 4 to replicate the results illustrated in figure 5 in [12]. In figure 11, we again see that for the 20 runs the density of solutions near knees is very high. Here, we have to set K_NC to be equal to the number of knees, and it is clear that all knees are discovered in all runs. Here, we have used K_PD = 0.2, corresponding to σ = 0.5. This is low, so the centroid distance assignment is rarely used. Hence, subpopulations converge to knees, and as long as σ > 0, the clusters formed will not overlap.

From figures 10 and 11, and further figure 9, it is clear that CFDE is indeed able to locate knee regions. Further, it has been demonstrated how to balance the search using K_PD, resulting in different σ values.
2.3.3
Evaluation and discussion
The paper on the CFDE algorithm [3] was accepted at EMO 2009, and this author is to present it in Nantes, France, in April 2009. The referees were impressed with the novel approach, the presentation and the experiments, even though there were some complaints about submitting both an abstract MOEA class and a concrete implementation in two different papers. However, many usable comments were given in the reviews, especially considering that decreasing the result set cardinality is a quite novel approach in MOEA research, and some have given rise to future work. Given that we invented the binary performance indicators for MOEAs with different result set cardinality, the experiments were still said to be illustrative and usable. This, along with a somewhat simple and easily understandable parametrization, is said to provide a potentially very useful functionality, very competitive with pruning the result sets of standard MOEAs.
3
Future work
This section is devoted to my future work, which will mainly be concerned with the application of MODCO algorithms to Grundfos problems, with the CFDE algorithm being the first. The major challenge will be applying the newly developed algorithms to the real world problems supplied by Grundfos, which are typically much more complex than test problems, as well as being constrained. Apart from working on Grundfos problems, new performance indices, possibly facilitating comparison between MOEAs of different result set cardinality, are to be invented, along with new versions of MODCO algorithms. Lastly, we will investigate which conferences and journals could be relevant to submit to, and how this could coincide with my research. In general, the hope is a synergy between the new knowledge acquired from testing on real world problems, and the general application of MOEAs. Grundfos has both problems and decision makers, which are both required in order to investigate the full multiobjective optimization process, which is of general interest to computer scientists in the MOO area. On the other hand, the resulting designs of these investigations should be of interest to Grundfos, as they are presumably both distinct and optimal, with respect to user settings and the precision used in simulations.
3.1
Testing MODCO algorithms on Grundfos problems
Testing algorithms on real world problems is usually more interesting, in that the results have a concrete, physical interpretation. Grundfos has a general interest in MOO, since they have parametrized design spaces of their products, including simulators to give the performance of designs. That is, the objective functions are actually simulations of varying degree of precision. This is one reason for applying MODCO algorithms, as discussed in Section 2.2. Considering MOEAs, the decision space of test problems holds no meaning, and neither do the objectives. Test problems are fine for their purpose: revealing if and to which extent algorithms may solve them, in spite of built-in traps and pitfalls, e.g. many local Pareto fronts in objective space. Test problems further facilitate performance comparison of algorithms, which is of general interest to the community. Real world problems, in contrast, lend themselves to more advanced investigations, in that we may derive meaning and sense from e.g. the design space of centrifugal pumps. Likewise, the objectives are measurable in a physical sense, and are as such much more interesting. Working with specialists should enable new knowledge to be extracted from designs proposed by e.g. the CFDE algorithm, which is of natural interest to Grundfos. Likewise, we are more likely to uncover new knowledge on algorithmic behavior when both design space and results are comprehensible. From testing on Grundfos problems, we hope to investigate several interesting facets related to MODCO research:
• An approach for constrained problems.
• Whether MODCO is well suited for many-objective optimization.
• How changing MODCO parameters will affect the result.
• Whether the subpopulation approach could result in fewer computations.
3.1.1
Constrained problems
Most real world problems are constrained, i.e. solutions belonging to a part of the decision space are not feasible, since they violate constraints. The constraint space is defined by the constraint functions as seen in Equation 1, which maps the feasible part of decision space S_D to the feasible part of objective space S_Z, independently of the objective functions. This part is of varying shape and size, depending on the problem at hand. However, for real world problems, the constrained space is often a very large part of the objective space, and as such feasible solutions are much harder to generate than infeasible ones. So the first challenge connected to the application of MODCO algorithms to Grundfos problems is to find an approach for constrained problems. One idea so far is to use the approach of Generalized Differential Evolution (GDE3) [14], where additional rules are incorporated in the comparison of solutions. A feasible solution will always be preferred over an infeasible one, but if two solutions are both infeasible, comparison takes place in constraint space, such that the most dominant solution with respect to the constraint functions is favored. That is, solution x is considered dominant to solution y if x has the same or less constraint violation on all constraint functions, and has at least one constraint function with less constraint violation than y. This facilitates a search steadily progressing towards the feasible, unconstrained part of objective space.
3.1.2
Many-objective optimization
Many-objective optimization is when our problem has more than 3 objectives, which is also the case in many Grundfos related problems. The more precise we wish to make our model, the more objectives and constraint functions may be included, and a centrifugal pump design problem may easily contain 5 to 10 objectives. The challenge of many-objective optimization lies in the dimensionality of the objective space. The more dimensions, the more trade-offs between the objectives can be found. This may be illustrated by a simple counting argument: the more dimensions of the objective space, the more edges and corners of the induced hypercube will exist (an M-dimensional hypercube has 2^M corners and M · 2^(M-1) edges), representing possible non-dominated solutions. Thus, covering a high-dimensional Pareto front can be very hard, and will require a much larger population than for problems with 2 or 3 objectives.
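The growth in trade-offs with the number of objectives can be illustrated numerically. The following is a minimal Python sketch, not taken from the report: it assumes minimization, draws objective vectors uniformly at random (an assumption made purely for illustration), and uses the hypothetical helper names `dominates`, `hypercube_counts` and `nondominated_fraction`. It counts the corners and edges of an M-dimensional hypercube, and measures which fraction of a random sample is not dominated by any other sample.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def hypercube_counts(m):
    """Number of corners and edges of an m-dimensional hypercube."""
    return 2 ** m, m * 2 ** (m - 1)


def nondominated_fraction(m, n=200, seed=1):
    """Fraction of n uniformly random m-objective vectors that no other vector dominates."""
    rng = random.Random(seed)  # fixed seed, illustrative choice
    pts = [tuple(rng.random() for _ in range(m)) for _ in range(n)]
    front = [p for p in pts if not any(dominates(q, p) for q in pts)]
    return len(front) / n


if __name__ == "__main__":
    for m in (2, 3, 5, 10):
        corners, edges = hypercube_counts(m)
        print(m, corners, edges, round(nondominated_fraction(m), 2))
```

Under these assumptions the non-dominated fraction grows with the number of objectives, which suggests why a population of fixed size covers a high-dimensional front only thinly.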
MODCO algorithms, and with this the CFDE algorithm, are potentially well suited for many-objective problems with high-dimensional Pareto fronts, since the more focused search does not need to put any effort into covering the full Pareto front, which can be very difficult in high dimensional spaces, as argued above. As higher dimensional problems also call for larger population sizes, the problem of choosing from the result set again becomes apparent for standard MOEAs, whereas this is circumvented using MODCO algorithms, e.g. CFDE.
3.1.3
The effect of changing MODCO parameters
Another hope from applying MODCO algorithms to Grundfos problems is to investigate different parameter settings of the MODCO algorithms. For test problems, the number of knees and the interesting part of the objective space is given, but this is not the case when handling real world problems. Here, any a priori knowledge of the problem and its objective space is hard to come by due to the complexity of the objective functions, and as discussed, should not be considered available in general. An investigation into the newly introduced MODCO parameters could reveal how these are used by a DM, and to which extent parameter changes affect the results. The experiments in [3] demonstrated a robust behavior of the CFDE algorithm, in that solutions converged to the same areas of the objective space under changing parameter settings. This behavior needs to be verified for real world problems, especially for constrained problems, and for problems with an unknown number of knee regions, arbitrarily placed in the objective space. This includes an investigation of the correlation between the parameters N, K_NC and K_PD to discover any synergies or discrepancies between these. Further, a study on the DE parameters F and CF could also be interesting, especially in relation to the subpopulation approach. DE is intended to cause smaller and smaller changes during a run, given that individuals converge towards the same area of the decision space, e.g. mapping to a knee region or the global Pareto front. Here, we want to find out if the subpopulation approach enhances this behavior, which resembles local search.
3.1.4
A more efficient search?
Lastly, we intend to investigate whether the subpopulation approach leads to a more efficient search. Overall, we intend to discover if all MODCO mechanisms are truly enhancing convergence, or if there are any discrepancies among them. This includes making experiments regarding the number of computations performed by MODCO algorithms vs. standard MOEAs. It is likely that we may decrease the population size of e.g. the CFDE algorithm without affecting the quality of the reported solutions, since K_NC << N, and the K_NC clusters each represent only a single solution. As the number of function evaluations depends on the population size, we may decrease the number of computations correspondingly. The question is how much we can decrease the population size without the quality of results deteriorating. Further, the secondary fitness assignment of CFDE requires fewer computations than the density measures of NSGA-II or SPEA2, but we still need some research on the efficiency of the alternating measures, with respect to newly developed performance indices, as discussed below.
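The efficiency question ties back to the expectation stated in Section 3.1.3, that DE makes smaller changes as a (sub)population converges. The Python sketch below illustrates why this can be expected, assuming the classic DE/rand/1 mutation v = a + F (b - c) from [8]. The helper names, the value F = 0.5 and the random test population are illustrative assumptions, not settings from [3].

```python
import math
import random


def de_mutant(pop, i, F, rng):
    """DE/rand/1 mutant for target index i: a + F * (b - c), with a, b, c drawn from the rest of pop."""
    candidates = [j for j in range(len(pop)) if j != i]
    a, b, c = (pop[j] for j in rng.sample(candidates, 3))
    return [ak + F * (bk - ck) for ak, bk, ck in zip(a, b, c)]


def mean_step(pop, F=0.5, seed=0):
    """Average Euclidean distance between each target vector and its mutant."""
    rng = random.Random(seed)
    steps = [math.dist(pop[i], de_mutant(pop, i, F, rng)) for i in range(len(pop))]
    return sum(steps) / len(steps)


def shrink(pop, factor):
    """Contract a population towards its centroid, mimicking convergence."""
    dim = len(pop[0])
    centroid = [sum(x[k] for x in pop) / len(pop) for k in range(dim)]
    return [[centroid[k] + factor * (x[k] - centroid[k]) for k in range(dim)] for x in pop]


if __name__ == "__main__":
    rng = random.Random(42)
    pop = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
    print(mean_step(pop), mean_step(shrink(pop, 0.1)))
```

Because every mutant is built from differences between population members, contracting the population by some factor contracts the steps by the same factor. Whether subpopulations amplify this effect in CFDE is exactly the open question raised above.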
3.2
Inventing performance indices
One related challenge when designing new algorithms is how to judge their performance. As mentioned, the MODCO goals differ from the ones of more standard MOEAs, and hence new performance indices must be invented to facilitate comparison, towards both standard MOEAs and other MODCO algorithms. Several performance indices are suggested in [4], but due to the different cardinality of MODCO results, these need to be adapted and possibly changed in order to apply. So due to time and space limitations, these were not used in [3], where a more general approach was used in tests, based on the universal notion of dominance and objective space distance. However, they are a starting point for the future development of PIs, which are to be incorporated in the new version of [2], as discussed in Section 2.2. The first goal of MODCO is closeness towards the true Pareto front, which is the same goal as that of standard MOEAs. That is why we may adopt several metrics from MOO literature, but with respect to different cardinality of compared result sets. For example, it may be possible to use the hypervolume indicator [13], which calculates the space dominated by a set of solutions, given a reference point. This has the advantage of giving an indication of to which extent one solution set is better than another, which is much more graded than the dominance relations used for statistics in [3]. However, the hypervolume indicator will favor solution sets which cover as much objective space as possible, and thus the lower cardinality of MODCO solution sets will be penalized. This may be circumvented by only comparing the most similar solutions in case of different cardinality of results, as we did in [3], but more investigation is needed in order to see if such a comparison is really fair. The same approach could be used for the epsilon indicator [13], which gives the distance a population must be moved towards an ideal point in order to dominate another population.
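To make the cardinality issue concrete, the following is a minimal Python sketch of the two indicators for two objectives under minimization. It is an illustration under stated assumptions rather than the implementation from [13]: the function names, the reference point and the toy fronts are hypothetical, and the epsilon indicator is written in its additive form.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by reference point `ref` (two objectives, minimization)."""
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in inside:
        if f2 < best_f2:  # skip points dominated by an earlier one
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def additive_epsilon(a_set, b_set):
    """Smallest eps such that every b in b_set is weakly dominated by some a shifted by eps."""
    return max(
        min(max(ai - bi for ai, bi in zip(a, b)) for a in a_set)
        for b in b_set
    )


if __name__ == "__main__":
    full_front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]  # toy result of a standard MOEA
    single = [(2.0, 2.0)]                              # toy low-cardinality result
    ref = (4.0, 4.0)
    print(hypervolume_2d(full_front, ref), hypervolume_2d(single, ref))  # 6.0 4.0
    print(additive_epsilon(single, full_front), additive_epsilon(full_front, single))  # 1.0 0.0
```

In this toy case the single solution lies on the three-point front, yet it scores a hypervolume of 4 against 6 and needs a shift of 1 to cover the larger set, which is the kind of penalty on low cardinality discussed above.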
Both of these indicators are Pareto compliant, but will be affected by the low number of results in MODCO solution sets. The second goal of MODCO is global distinctiveness, and we may measure this in decision space, objective space or by using category functions [2; 4]. Again we may be inspired by standard metrics such as the M_2^* indicator, giving the average distance between solutions which are far enough away from each other in objective space with respect to some niching radius, corresponding to σ in CFDE, as noted in [4]. A similar indicator M_2 exists for design distinctiveness. Further, indicators for topology based distinctiveness are suggested in [4], such as the one developed by Rasmus Ursem, to reveal topological distinctions such as hills or valleys, based on testing a solution position with respect to neighboring solutions. Using these different indicators, we may see to which extent the intended distinctiveness is achieved, based on statistical analysis. The third goal of MODCO is to discover knee regions, and so we need indicators to reveal this, as knee regions are not in general known a priori, as was the case with the knee problems used for illustration in [3]. One such indicator is suggested in [4], based on knowledge of an ideal corner of the hypercube spanned by the objective functions. However, the utility function of Branke used in the CFDE algorithm will give a similar indication of knee regions, and this is very simple to understand, adjust and use.
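As an illustration of a utility-based knee indication, the Python sketch below follows the expected marginal utility idea of Branke et al. [12] for two objectives under minimization, using a linear utility U(x, λ) = λ f1(x) + (1 - λ) f2(x). Whether CFDE uses exactly this form is an assumption made here; the function name, the uniform λ grid and the toy front are likewise illustrative.

```python
def expected_marginal_utility(front, samples=101):
    """Average utility loss incurred if each solution were removed from the front.

    front   : list of (f1, f2) pairs with at least two entries, minimization assumed
    samples : number of lambda values on a uniform grid over [0, 1]
    """
    emu = [0.0] * len(front)
    for s in range(samples):
        lam = s / (samples - 1)
        utils = [lam * f1 + (1.0 - lam) * f2 for f1, f2 in front]
        order = sorted(range(len(front)), key=utils.__getitem__)
        best, runner_up = order[0], order[1]
        # Only the best solution for this lambda has a non-zero marginal utility.
        emu[best] += (utils[runner_up] - utils[best]) / samples
    return emu


if __name__ == "__main__":
    toy_front = [(0.0, 1.0), (0.1, 0.2), (1.0, 0.0)]  # the middle point forms a knee
    print(expected_marginal_utility(toy_front))
```

On the toy front the middle solution receives the largest value, since it is the preferred choice for most weightings and removing it would cost the most.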
3.3
Inventing alternate MODCO algorithms
In order to investigate the performance of MODCO algorithms, it would be nice to have some alternatives to compete against each other. So a part of my future research will be devoted to inventing alternatives to the CFDE algorithm, to facilitate comparison. The first step along this path is to create CFDE_SP2, the CFDE algorithm using SPEA2 ranking and truncation, as described in Section 2.1. The SPEA2 Pareto strength ranking is more graded than the non-dominated sorting of NSGA-II when not all individuals of the population are non-dominated, which may be an advantage for complex, constrained problems. Thus we get both CFDE_NSII and CFDE_SP2 to compare against each other, just as the DEMO versions. These should incorporate the GDE3 approach described in Section 3.1.1, in order to apply to constrained problems. The second step will be incorporating other forms of distinctiveness into the CFDE algorithms, such that we may measure this in decision space instead of objective space, as well as using category functions, as mentioned in [2] and described further in [4]. In short, category functions map solutions in decision space to easily identified categories, e.g. motors of different standard sizes or electronics components of various standards. These a priori known categories are interesting distinctions between solutions to the DM. However, note that category mappings are not explicit preference functions. This way we investigate how the alternate definitions of distinctiveness will affect the search and the results, to which extent they make sense to the DM, and how the MODCO parameters are used in the different cases. The final step will be inventing MODCO algorithms not based on the CFDE algorithm, but with the same parameters to adjust. This should lead to other approaches than the one based on a fixed user defined number of subpopulations, which at that point should be thoroughly researched.
One may imagine MODCO algorithms where only upper and lower bounds on K_NC are to be set, after which K_NC and K_PD will be adjusted dynamically during the run, while automatically optimizing distinctiveness and searching for knee regions. This could entail some mechanisms for dynamically creating subpopulations from one overall population, when knee regions or other attractive areas of the objective space are found, e.g. with respect to the optimization of global distinctiveness.
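Any of these alternate versions would need the constraint-aware comparison of Section 3.1.1. The rules stated there can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the GDE3 implementation of [14]: minimization is assumed, constraint violations are assumed to be given as non-negative numbers with 0 meaning satisfied, the function names are hypothetical, and the case of two feasible solutions is assumed to fall back to ordinary Pareto dominance.

```python
def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def constrained_dominates(obj_x, viol_x, obj_y, viol_y):
    """True if solution x is preferred over solution y under the constraint-aware rules.

    obj_*  : objective vectors (minimization assumed)
    viol_* : non-negative constraint violations, 0 meaning the constraint is satisfied
    """
    x_feasible = all(v == 0 for v in viol_x)
    y_feasible = all(v == 0 for v in viol_y)
    if x_feasible and not y_feasible:
        return True                     # a feasible solution beats an infeasible one
    if y_feasible and not x_feasible:
        return False
    if not x_feasible:                  # both infeasible: compare in constraint space
        return dominates(viol_x, viol_y)
    return dominates(obj_x, obj_y)      # both feasible: ordinary Pareto dominance


if __name__ == "__main__":
    # x violates less on the second constraint, so it is preferred despite worse objectives.
    print(constrained_dominates((9, 9), (0.1, 0.2), (1, 1), (0.1, 0.3)))
```

Note that two infeasible solutions whose violations trade off against each other are incomparable under these rules, just as mutually non-dominated solutions are in objective space.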
3.4
Relevant conferences and journals
Here, we will provide an overview of the conferences and journals to which we may submit future publications. We will first list conferences and then journals, as well as give an early indication of what work may be done by these deadlines. The list further shows approximate dates for the conferences.
• Evolutionary Multiobjective Optimization (EMO), April 2009.
• IEEE Congress on Evolutionary Computation (CEC), May 2010.
• Genetic and Evolutionary Computation Conference (GECCO), July 2010.
These conferences are all focused on evolutionary computation, and are all well suited for papers on new algorithms or new applications. EMO is more specialized, in that it is more focused on multiobjective optimization, and thus is very relevant to this author. So the acceptance of [3] for this conference is a good opportunity to discuss my work here. However, as CEC and GECCO are much further away in the future (2010), relevant submissions will here include reports on testing new MODCO algorithms on Grundfos problems including new performance indices, as will be developed by the end of 2009, the approximate deadline for submissions to these conferences. Relevant journals include:
• Applied Soft Computing - www.softcomputing.org/
• IEEE Transactions on Evolutionary Computation - http://ieee-cis.org/pubs/tec/
Applied Soft Computing is a journal connected to the online conferences held by the World Federation on Soft Computing (WFSC), and is focused on industrial applications of soft computing, such as evolutionary algorithms. The journal is connected to the major publisher Elsevier, which hosts many different forms of scientific papers and journals. The IEEE Transactions on Evolutionary Computation is also interested in both methods and applications.
For these journals, a thorough application of the initial CFDE algorithm versions (CFDE_NS-II and CFDE_SP2) on Grundfos problems could be very relevant, in that it is both a novel approach and further it is tested on an industrial engineering application, which is more interesting than test problems. This project would include a thorough rewrite of [2] and [4], to produce a single, introductory paper on the abstract class of MODCO algorithms and how to measure performance. This work is to be done during the spring and summer of this year, 2009.
List of Figures
1 Mapping from decision space to objective space - assuming maximization
2 Multi-objective optimization process
3 Multi-Objective Evolutionary Algorithm main loop
4 Genetic algorithm
5 Differential Evolution
6 Population after migration
7 ZDT1 plot - 20 runs - 5 clusters
8 ZDT3 plot - 20 runs
9 DEB3DK plot - investigating user defined performance distinctiveness
10 DO2DK plot - 20 runs - 5 clusters - 4 knees
11 DEB2DK plot - 20 runs - 4 clusters - 4 knees
List of Algorithms
1 Differential Evolution
2 Cluster-Forming Differential Evolution
References
[1] Justesen, P.D. and Ursem, R.K.: Multiobjective Distinct Candidates Optimization (MODCO): A Cluster-Forming Differential Evolution Algorithm. In: Accepted at the Evolutionary Multiobjective Optimization (EMO 2009) conference. Preliminary download: www.daimi.au.dk/~ursem/publications/Justesen EMO2009 CFDE.pdf
[2] Ursem, R.K., Justesen, P.D.: Multiobjective Distinct Candidates Optimization (MODCO) - A new Branch of Multiobjective Optimization Research. In: Rejected at the Evolutionary Multiobjective Optimization (EMO 2009) conference. Download: www.daimi.au.dk/~ursem/publications/Ursem EMO2009 MODCO.pdf
[3] Justesen, P.D. and Ursem, R.K.: Introducing SPDE2: A Novel DE Based Approach for Multiobjective Optimization. Rejected at the Parallel Problem Solving from Nature X (PPSN 2008) conference. Download: www.daimi.au.dk/~juste/MOEA/article.pdf
[4] R. K. Ursem and P. D. Justesen: Performance Metrics for Multiobjective Distinct Candidates Optimization (MODCO) Algorithms. In: Technical Report no 200801, Dept. of Computer Science, University of Aarhus, 2008. Preliminary download: www.daimi.au.dk/~ursem/publications/Ursem TR 2008-01.pdf
[5] Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. In: IEEE transactions on Evolutionary Computation 6, 2002, pp. 182 - 197.
[6] Deb, K.: Multi-Objective Optimization using Evolutionary Algorithms. Wiley, 2002.
[7] Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength pareto evolutionary algorithm. In: Technical Report 103, Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH) Zurich, Gloriastrasse 35, CH-8092 Zurich, Switzerland (2001).
[8] Price, K. V., Storn, R.: Differential Evolution - a simple evolution strategy for fast optimization. In: Dr. Dobb’s journal 22, 1997, pp. 18 - 24.
[9] Robič, T., Filipič, B.: DEMO: Differential Evolution for Multiobjective Optimization. In: LNCS 3410/2005, pp. 520 - 533.
[10] Robič, T., Filipič, B.: Differential Evolution Versus Genetic Algorithms in Multiobjective Optimization. In: LNCS volume 4403/2007, pp. 257 - 271.
[11] Deb, K., Thiele, L., Laumanns M., Zitzler, E.: Scalable Test Problems for Evolutionary Multi-Objective Optimization. In: Proceedings of the 2002 Congress on Evolutionary Computation, CEC ’02, pp. 825 - 830.
[12] Branke, J., Deb, K., Dierolf H, Osswald, M.: Finding Knees in Multi-objective Optimization. In: LNCS volume 3242/2004, pp. 722-731.
[13] J. Knowles, L. Thiele, and E. Zitzler. A Tutorial on the Performance Assessment of Stochastic Multiobjective Optimizers. In: TIK Report 214, Computer Engineering and Networks Laboratory (TIK), ETH Zurich, February 2006. Download: www.tik.ee.ethz.ch/sop/publications/.
[14] Kukkonen S., Lampinen, J.: GDE3: The third Evolution Step of Generalized differential evolution. In: Proceedings of the 2005 Congress on Evolutionary Computation, CEC ’05, pp. 443 - 450.
[15] Karthis, S., Deb, K., Miettinen, A.: A Local Search Based Evolutionary Multiobjective Optimization for Fast and Accurate Convergence. In: LNCS volume 5199/2008, pp. 815 - 824.
[16] Handl, J., Knowles, J.: An Evolutionary Approach to Multiobjective Clustering. In: IEEE Transactions on Evolutionary Computation, volume 11, issue 1, Feb. 2007, pp. 56-76.