720 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018 Standard Steady State Genetic Algorithms Can Hillclimb Faster Than Mutation-Only Evolutionary Algorithms Dogan Corus, Member, IEEE, and Pietro S. Oliveto , Senior Member, IEEE Abstract—Explaining to what extent the real power of genetic algorithms (GAs) lies in the ability of crossover to recombine indi- viduals into higher quality solutions is an important problem in evolutionary computation. In this paper we show how the inter- play between mutation and crossover can make GAs hillclimb faster than their mutation-only counterparts. We devise a Markov chain framework that allows to rigorously prove an upper bound on the runtime of standard steady state GAs to hillclimb the ONEMAX function. The bound establishes that the steady-state GAs are 25% faster than all standard bit mutation-only evolu- tionary algorithms with static mutation rate up to lower order terms for moderate population sizes. The analysis also suggests that larger populations may be faster than populations of size 2. We present a lower bound for a greedy (2 + 1) GA that matches the upper bound for populations larger than 2, rigorously proving that two individuals cannot outperform larger population sizes under greedy selection and greedy crossover up to lower order terms. In complementary experiments the best population size is greater than 2 and the greedy GAs are faster than standard ones, further suggesting that the derived lower bound also holds for the standard steady state (2 + 1) GA. Index Terms—Algorithms design and analysis, genetic algo- rithms (GAs), Markov processes, stochastic processes. I. INTRODUCTION GENETIC algorithms (GAs) rely on a population of indi-viduals that simultaneously explore the search space. The main distinguishing features of GAs from other random- ized search heuristics is their use of a population and crossover to generate new solutions. Rather than slightly modifying the current best solution as in more traditional heuristics, the idea behind GAs is that new solutions are generated by recom- bining individuals of the current population (i.e., crossover). Such individuals are selected to reproduce probabilistically according to their fitness (i.e., reproduction). Occasionally, Manuscript received November 15, 2016; revised April 13, 2017, July 7, 2017, and August 3, 2017; accepted August 5, 2017. Date of publi- cation September 26, 2017; date of current version September 28, 2018. This work was supported by EPSRC under Grant EP/M004252/1. (Corresponding author: Pietro S. Oliveto.) The authors are with the Rigorous Research Team, Algorithms Group, Department of Computer Science, University of Sheffield, Sheffield S1 4DP, U.K. (e-mail:
[email protected];
[email protected]). This paper has supplementary downloadable multimedia material available at http://ieeexplore.ieee.org provided by the authors. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TEVC.2017.2745715 random mutations may slightly modify the offspring produced by crossover. The original motivation behind these mutations is to avoid that some genetic material may be lost forever, thus allowing to avoid premature convergence [16], [18]. For these reasons the GA community traditionally regards crossover as the main search operator while mutation is considered a “back- ground operator” [1], [16], [19] or a “secondary mechanism of genetic adaptation” [18]. Explaining when and why GAs are effective has proved to be a nontrivial task. Schema theory and its resulting building block hypothesis [18] were devised to explain such working principles. However, these theories did not allow to rigor- ously characterize the behavior and performance of GAs. The hypothesis was disputed when a class of functions (i.e., Royal Road), thought to be ideal for GAs, was designed and experiments revealed that the simple (1+1) EA was more efficient [20], [27]. Runtime analysis approaches have provided rigorous proofs that crossover may indeed speed up the evolutionary process of GAs in ideal conditions (i.e., if sufficient diversity is avail- able in the population). The JUMP function was introduced by Jansen and Wegener [22] as a first example, where crossover considerably improves the expected runtime compared to mutation-only evolutionary algorithms (EAs). The proof required an unrealistically small crossover probability to allow mutation alone to create the necessary population diversity for the crossover operator to then escape the local optimum. Dang et al. [6] recently showed that the sufficient diversity, and even faster upper bounds on the runtime for not too large jump gaps, can be achieved also for realistic crossover proba- bilities by using diversity mechanisms. Further examples that show the effectiveness of crossover have been given for both artificially constructed functions and standard combinatorial optimization problems (see the next section for an overview). Excellent hillclimbing performance of crossover-based GAs has been also proved. Doerr et al. [8], [9] proposed a (1+(λ, λ)) GA which optimizes the ONEMAX function in (n √ log log log(n)/ log log(n)) fitness evaluations (i.e., run- time). Since the unbiased unary black box complexity of ONEMAX is (n log n) [24], the algorithm is asymptotically faster than any unbiased mutation-only EA. Furthermore, the algorithm runs in linear time when the population size is self- adapted throughout the run [7]. Through this paper, though, it is hard to derive conclusions on the working principles of standard GAs because these are very different compared to This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/ CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 721 the (1 + (λ, λ)) GA in several aspects. In particular, the (1 + (λ, λ)) GA was especially designed to use crossover as a repair mechanism that follows the creation of new solutions via high mutation rates. This makes the algorithm work in a considerably different way compared to traditional GAs. More traditional GAs have been analyzed by Sudholt [39]. Concerning ONEMAX, he shows how (μ+λ) GAs are twice as fast as their standard bit mutation-only counterparts. As a consequence, he showed an upper bound of (e/2)n log n(1 + o(1)) function evaluations for a (2 + 1) GA versus the en log n(1−o(1)) function evaluations required by any standard bit mutation-only EA [38], [41]. This bound further reduces to 1.19n ln n ± O(n log log n) if the optimal mutation rate is used [i.e., (1+√5)/2 ·1/n ≈ 1.618/n]. However, the analysis requires that diversity is artificially enforced in the population by breaking ties always preferring genotypically different indi- viduals. This mechanism ensures that once diversity is created on a given fitness level, it will never be lost unless a better fit- ness level is reached, giving ample opportunities for crossover to exploit this diversity. Recently, it has been shown that it is not necessary to enforce diversity for standard steady state GAs to outper- form standard bit mutation-only EAs [5]. In particular, the JUMP function was used as an example to show how the interplay between crossover and mutation may be sufficient for the emergence of the necessary diversity to escape from local optima more quickly. Essentially, a runtime of O(nk−1) may be achieved for any sublinear jump length k > 2 versus the (nk) required by standard bit mutation-only EAs. In this paper, we show that this interplay between mutation and crossover may also speed-up the hillclimbing capabil- ities of steady state GAs without the need of enforcing diversity artificially. In particular, we consider a standard steady state (μ + 1)GA [16], [34], [35] and prove an upper bound on the runtime to hillclimb the ONEMAX function of (3/4)en log n+O(n) for any μ ≥ 3 and μ = o(log n/ log log n) when the standard 1/n mutation rate is used. Apart from show- ing that standard (μ + 1) GAs are faster than their standard bit mutation-only counterparts up to population sizes μ = o(log n/ log log n), our Markov chain (MC) framework analy- sis provides two other interesting insights. First, it delivers bet- ter runtime bounds for mutation rates that are higher than the standard 1/n rate. The best upper bound of 0.72en log n+O(n) is achieved for c/n with c = (1/2)(√13 − 1) ≈ 1.3. Second, the framework provides a larger upper bound, up to lower order terms, for the (2 + 1) GA compared to that of any μ ≥ 3 and μ = o(log n/ log log n). The reason for the larger constant in the leading term of the runtime is that, for populations of size 2, there is always a constant probability that any selected individual takes over the population in the next generation. This is not the case for population sizes larger than 2. To shed light on the exact runtime for population size μ = 2 we present a lower bound analysis for a greedy GA, which we call (2+1)S GA, that always selects individuals of highest fit- ness for crossover and always successfully recombines them if their Hamming distance is greater than 2. This algorithm is similar to the one analyzed by Sudholt [39] to allow the derivation of a lower bound, with the exception that we do not enforce any diversity artificially and that our crossover operator is slightly less greedy (i.e., in [39] crossover always recombines correctly individuals also when the Hamming dis- tance is exactly 2). Our analysis delivers a matching lower bound for all mutation rates c/n, where c is a constant, for the greedy (2 + 1)S GA [thus also (3/4)en log n + O(n) and 0.72en log n + O(n), respectively, for mutation rates 1/n and 1.3/n]. This result rigorously proves that, under greedy selec- tion and semi-greedy crossover, the (2 + 1) GA cannot outper- form any (μ + 1)GA with μ ≥ 3 and μ = o(log n/ log log n). We present some experimental investigations to shed light on the questions that emerge from the theoretical work. In the experiments we consider the commonly used parent selec- tion that chooses uniformly at random from the population with replacement (i.e., our theoretical upper bounds hold for a larger variety of parent selection operators). We first com- pare the performance of the standard steady state GAs against the fastest standard bit mutation-only EA with fixed mutation rate (i.e., the (1+1) EA [38], [41]) and the GAs that have been proved to outperform it. The experiments show that the speedups over the (1+1) EA occur already for small problem sizes n and that population sizes larger than 2 are faster than the standard (2 + 1) GA. Furthermore, the greedy (2+1)S GA indeed appears to be faster than the standard (2+1) GA,1 fur- ther suggesting that the theoretical lower bound also holds for the latter algorithm. Finally, experiments confirm that larger mutation rates than 1/n are more efficient. In particular, better runtimes are achieved for mutation rates that are even larger than the ones that minimize our theoretical upper bound (i.e., c/n with 1.5 ≤ c ≤ 1.6 versus the c =1.3 we have derived mathematically; interestingly this experimental rate is similar to the optimal mutation rate for OneMax of the algorithm ana- lyzed in [39]). These theoretical and experimental results seem to be in line with those recently presented for the same steady state GAs for the JUMP function [5], [6]: higher mutation rates than 1/n are also more effective on JUMP. The rest of this paper is structured as follows. In the next section we briefly review previous related works that consider algorithms using crossover operators. In Section III we give precise definitions of the steady state (μ + 1)GA and of the ONEMAX function. In Section IV we present the MC frame- work that we will use for the analysis of steady state elitist GAs. In Section V we apply the framework to analyse the (μ + 1)GA and present the upper bound on the runtime for any 3 ≤ μ = o(log n/ log log n) and mutation rate c/n for any constant c. In Section VI we present the matching lower bound on the runtime of the greedy (2 + 1)S GA. In Section VII we present our experimental findings. In the conclusion we present a discussion and open questions for future work. Separate supplementary files contain an Appendix of omitted proofs due to space constraints and a complete version of this paper including all the proofs. II. RELATED WORK The first rigorous groundbreaking proof that crossover can considerably improve the performance of EAs was given by 1We thank an anonymous reviewer for pointing out that this is not obvious. 722 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018 Jansen and Wegener [22] for the (μ+ 1)GA with an unrealis- tically low crossover probability. A series of following works on the analysis of the JUMP function have made the algorithm characteristics increasingly realistic [6], [23]. Today it has been rigorously proved that the standard steady state (μ+1)GA with realistic parameter settings does not require artificial diver- sity enforcement to outperform its standard bit mutation-only counterpart to escape the plateau of local optima of the JUMP function [5]. Proofs that crossover may make a difference between poly- nomial and exponential time for escaping local optima have also been available for some time [20], [36]. The authors devised example functions, where, if sufficient diversity was enforced by some mechanism, then crossover could effi- ciently combine different individuals into an optimal solution. Mutation, on the other hand required a long time because of the great Hamming distance between the local and global optima. The authors chose to call the artificially designed functions Real Royal Road functions because the Royal Road functions devised to support the building block hypothesis had failed to do so [28]. The Real Royal Road functions, though, had no resemblance with the schemata structures required by the building block hypothesis. The utility of crossover has also been proved for less artifi- cial problems, such as coloring problems inspired by the Ising model from physics [37], computing input–output sequences in finite state machines [25], shortest path problems [13], ver- tex cover [29], and multiobjective optimization problems [32]. The above works show that crossover allows to escape from local optima that have large basins of attraction for the muta- tion operator. Hence, they establish the usefulness of crossover as an operator to enchance the exploration capabilities of the algorithm. The interplay between crossover and mutation may produce a speed-up also in the exploitation phase, for instance when the algorithm is hillclimbing. Research in this direction has recently appeared. The design of the (1 + (λ, λ)) GA was theoretically driven to beat the (n ln n) lower bound of all unary unbiased black box algorithms. Since the dynamics of the algorithm differ considerably from those of standard GAs, it is difficult to achieve more general conclusions about the performance of GAs from the analysis of the (1+ (λ, λ)) GA. From this point of view the work of Sudholt [39] is more revealing when he shows that any standard (μ + λ) GA outperforms its standard bit mutation-only counterpart for hill- climbing the ONEMAX function. The only caveat is that the selection stage enforces diversity artificially, similarly to how Jansen and Wegener had enforced diversity for the Real Royal Road function analysis. In this paper, we rigorously prove that it is not necessary to enforce diversity artificially for standard-steady state GAs to outperform their standard bit mutation-only counterpart. III. PRELIMINARIES We will analyse the runtime (i.e., the expected number of fitness function evaluations before an optimal search point is found) of a steady state GA with population size μ and Algorithm 1: (μ+1) GA [5], [16], [34], [35] 1 P ← μ individuals, uniformly at random from {0, 1}n; 2 repeat 3 Select x, y ∈ P with replacement using an operator abiding (1); 4 z ← Uniform crossover with probability 1/2 (x, y); 5 Flip each bit in z with probability c/n; 6 P ← P ∪ {z}; 7 Choose one element from P with lowest fitness and remove it from P, breaking ties at random; 8 until termination condition satisfied; offspring size 1 (Algorithm 1). In steady state GAs the entire population is not changed at once, but rather a part of it. In this paper we consider the most common option of creating one new solution per generation [34], [35]. Rather than restricting the algorithm to the most commonly used uniform selection of two parents, we allow more flexibility to the choice of which parent selection mechanism is used. This approach was also followed by Sudholt [39] for the analysis of the (μ + 1)GA with diversity. In each generation the algorithm picks two par- ents from its population with replacement using a selection operator that satisfies the following condition: ∀x, y : f (x) ≥ f (y) =⇒ Pr(select x) ≥ Pr(select y). (1) The condition allows to use most of the popular parent selection mechanisms with replacement, such as fitness pro- portional selection, rank selection, or the one commonly used in steady state GAs, i.e., uniform selection [16]. Afterward, uniform crossover between the selected parents (i.e., each bit of the offspring is chosen from each parent with probabil- ity 1/2) provides an offspring to which standard bit mutation (i.e., each bit is flipped with probability c/n) is applied. The best μ among the μ + 1 solutions are carried over to the next generation and ties are broken uniformly at random. In this paper we use the standard convention for nam- ing steady state algorithms: the (μ + 1) EA differs from the (μ+1)GA by only selecting one individual per generation for reproduction and applying standard bit mutation to it (i.e., no crossover). Otherwise the two algorithms are identical. We will analyse Algorithm 1 for the well-studied ONEMAX function that is defined on bitstrings x ∈ {0, 1}n of length n and returns the number of 1-bits in the string: ONEMAX(x) =∑n i=1 xi. Here, xi is the ith bit of the solution x ∈ {0, 1}n. The ONEMAX benchmark function is very useful to assess the hillclimbing capabilities of a search heuristic. It displays the characteristic function optimization property that find- ing improving solutions becomes harder as the algorithm approaches the optimum. The problem is the same as that of identifying the hidden solution of the Mastermind game, where we assume for simplicity that the target string is the one of all 1-bits. Any other target string z ∈ {0, 1}n may also be used without loss of generality. If a bitstring is used, then ONEMAX is equivalent to Mastermind with two colors [15]. This can be generalized to many colors if alphabets of greater size are used [10], [11]. CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 723 Fig. 1. MC for fitness level i. IV. MARKOV CHAIN FRAMEWORK The recent analysis of the (μ + 1)GA for the JUMP func- tion shows that the interplay between crossover and mutation may create the diversity required for crossover to decrease the expected time to jump toward the optimum [5]. At the heart of the proof is the analysis of a random walk on the number of diverse individuals on the local optima of the function. The analysis delivers improved asymptotic expected runtimes of the (μ + 1)GA over mutation-only EAs only for population sizes μ = ω(1). This happens because, for larger population sizes, it takes more time to lose diversity once created, hence crossover has more time to exploit it. For ONEMAX the tech- nique delivers worse asymptotic bounds for population sizes μ = ω(1) and an O(n ln n) bound for constant population size. Hence, the techniques of [5] cannot be directly applied to show a speed-up of the (μ+1)GA over mutation-only EAs and a careful analysis of the leading constant in the runtime is necessary. In this section we present the MC framework that we will use to obtain the upper bounds on the runtime of the elitist steady state GAs. We will afterward discuss how this approach builds upon and generalizes Sudholt’s approach in [39]. The ONEMAX function has n+1 distinct fitness values. We divide the search space into the following canonical fitness levels [21], [31]: Li = { x ∈ {0, 1}n|ONEMAX(x) = i}. We say that a population is in fitness level i if and only if its best solution is in level Li. We use an MC for each fitness level i to represent the dif- ferent states the population may be in before reaching the next fitness level. The MC depicted in Fig. 1 distinguishes between states, where the population has no diversity (i.e., all individ- uals have the same genotype), hence crossover is ineffective, and states, where diversity is available to be exploited by the crossover operator. The MC has one absorbing state and two transient states. The first transient state S1,i is adopted if the whole population consists of copies of the same individual at level i (i.e., all the individuals have the same genotype). The second state S2,i is reached if the population consists of μ individuals in fitness level i and at least two individuals x and y are not identical. The second transient state S2,i differs from the state S1,i in having diversity which can be exploited by the crossover operator. S1,i and S2,i are mutually accessible from each other since the diversity can be introduced at state S1,i via mutation with some probability pd and can be lost at state S2,i with some relapse probability pr when copies of a solution take over the population. The absorbing state S3,i is reached when a solution at a bet- ter fitness level is found, an event that happens with probability pm when the population is at state S1,i and with probability pc when the population is at state S2,i. We pessimistically assume that in S2,i there is always only one single individual with a different genotype (i.e., with more than one distinct individual, pc would be higher and pr would be zero). Formally when S3,i is reached the population is no longer in level i because a bet- ter fitness level has been found. However, we will bound the expected time to reach the absorbing state for the next level only when the whole population has reached it (or a higher level). We do this because we assume that initially all the population is in level i when calculating the transition proba- bilities in the MC for each level i. This implies that bounding the expected times to reach the absorbing states of each fitness level is not sufficient to achieve an upper bound on the total expected runtime. When S3,i is reached for the first time, the population only has one individual at the next fitness level or in a higher one. Only when all the individuals have reached level i + 1 (i.e., either in state S1,i+1 or S2,i+1) may we use the MC to bound the runtime to overcome level i + 1. Then the MC can be applied, once per fitness level, to bound the total runtime until the optimum is reached. The main distinguishing aspect between the analysis presented herein and that of Sudholt [39] is that we take into account the possibility to transition back and forth (i.e., resp. with probability pd and pr) between states S1,i and S2,i as in standard steady state GAs (see Fig. 1). By enforcing that different genotypes on the same fitness level are kept in the population, the GA considered in [39] has a good probability of exploiting this diversity to recombine the different individ- uals. In particular, once the diversity is created it will never be lost, giving many opportunities for crossover to take advantage of it. A crucial aspect is that the probability of increasing the number of ones via crossover is much higher than the prob- ability of doing so via mutation once many 1-bits have been collected. Hence, by enforcing that once state S2,i is reached it cannot be left until a higher fitness level is found, Sudholt could prove that the resulting algorithm is faster compared to only using standard bit mutation. In the standard steady state GA, instead, once the diversity is created it may subsequently be lost before crossover successfully recombines the diverse individuals. This behavior is modeled in the MC by consider- ing the relapse probability pr. Hence, the algorithm spends less time in state S2,i compared to the GA with diversity enforce- ment. Nevertheless, it will still spend some optimization time in state S2,i, where it will have a higher probability of improv- ing its fitness by exploiting the diversity via crossover than when in state S1,i (i.e., no diversity), where it has to rely on mutation only. For this reason the algorithm will not be as fast for ONEMAX as the GA with enforced diversity but will still be faster than standard bit mutation-only EAs. An interesting consequence of the possibility of losing diversity, is that populations of size greater than 2 can be beneficial. In particular the diversity (i.e., state S2,i) may be completely lost in the next step when there is only one diverse individual left in the population. When this is the case, the relapse probability pr decreases with the population size μ 724 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018 because the probability of selecting the diverse individual for removal is 1/μ. Furthermore, for population size μ = 2 there is a positive probability that diversity is lost in every gener- ation by either of the two individuals taking over, while for larger population sizes this is not the case. As a result our MC framework analysis will deliver a better upper bound for μ > 2 compared to the bound for μ = 2. This interesting insight into the utility of larger populations could not be seen in the analysis of [39] because there, once the diversity is achieved, it cannot be lost. We first concentrate on the expected absorbing time of the MC. Afterward we will calculate the takeover time before we can transition from one MC to the next. Since it is not easy to derive the exact transition probabilities, a runtime analysis is considerably simplified by using bounds on these probabil- ities. The main result of this section is stated in the following theorem that shows that we can use lower bounds on the tran- sition probabilities moving in the direction of the absorbing state (i.e., pm, pd, and pc) and an upper bound on the proba- bility of moving in the opposite direction to no diversity (i.e., pr) to derive an upper bound on the expected absorbing time of the MC. In particular, we define an MC M′ that uses the bounds on the exact transition probabilities and show that its expected absorbing time is greater than the absorbing time of the original chain. Hereafter, we drop the level index i for brevity and use E[T1] and E[T2] instead of E[T1,i] and E[T2,i] (similarly, S1 will denote state S1,i). Theorem 1: Consider two MCs M and M′ with the topology in Fig. 1, where the transition probabilities for M are pc, pm, pd, and pr, and the transition probabilities for M′ are p′c, p′m, p′d, and p′r. Let the expected absorbing time for M be E[T] and the expected absorbing time of M′ starting from state S1 be E[T ′1], respectively. If •pm < pc • p′d ≤ pd • p′r ≥ pr •p′c ≤ pc • p′m ≤ pm. Then E[T] ≤ E[T ′1 ] ≤ p ′ c + p′r p′cp′d + p′cp′m + p′mp′r + 1 p′c . We first concentrate on the second inequality in the state- ment of the theorem which will follow immediately from the next lemma. It allows us to obtain the expected absorbing time of the MC if the exact values for the transition probabilities are known. In particular, the lemma establishes the expected times E[T1] and E[T2] to reach the absorbing state, starting from the states S1 and S2, respectively. Lemma 1: The expected times E[T1] and E[T2] to reach the absorbing state, starting from state S1 and S2, respectively, are as follows: E[T1] = pc + pr + pdpcpd + pcpm + pmpr ≤ pc + pr pcpd + pcpm + pmpr + 1 pc E[T2] = pm + pr + pdpcpd + pcpm + pmpr . Before we prove the first inequality in the statement of Theorem 1, we will derive some helper propositions. We first show that as long as the transition probability of reaching the absorbing state from the state S2 (with diversity) is greater than that of reaching the absorbing state from the state with no diversity S1 (i.e., pm < pc), then the expected absorbing time from state S1 is at least as large as the expected time uncon- ditional of the starting point. This will allow us to achieve a correct upper bound on the runtime by just bounding the absorbing time from state S1. In particular, it allows us to pes- simistically assume that the algorithm starts each new fitness level in state S1 (i.e., there is no diversity in the population). Proposition 1: Consider an MC with the topology given in Fig. 1. Let E[T1] and E[T2] be the expected absorbing times starting from state S1 and S2, respectively. If pm < pc, then E[T1] > E[T2] and E[T], the unconditional expected absorbing time, satisfies E[T] ≤ E[T1]. In the following proposition we show that if we overesti- mate the probability of losing diversity and underestimate the probability of increasing it, then we achieve an upper bound on the expected absorbing time as long as pm < pc. Afterward, in Proposition 3 we show that an upper bound on the absorb- ing time is also achieved if the probabilities pc and pm are underestimated. Proposition 2: Consider two MCs M and M′ with the topol- ogy in Fig. 1, where the transition probabilities for M are pc, pm, pd, and pr, and the transition probabilities for M′ are pc, pm, p′d, and p′r. Let the expected absorbing times starting from state S1 for M and M′ be E[T1] and E[T ′1], respectively. If p′d ≤ pd, p′r ≥ pr, and pm < pc, then E[T1] ≤ E[T ′1]. Proposition 3: Consider two MCs M and M′ with the topol- ogy in Fig. 1, where the transition probabilities for M are pc, pm, pd, and pr, and the transition probabilities for M′ are p′c, p′m, pd, and pr. Let the expected absorbing times starting from state S1 for M and M′ be E[T1] and E[T ′1], respectively. If p′c ≤ pc and p′m ≤ pm, then E[T1] ≤ E[T ′1]. The propositions use that by lower bounding pd and upper bounding pr we overestimate the expected number of gen- erations the population is in state S1 compared to the time spent in state S2. Hence, if pc > pm we can safely use a lower bound for pd and an upper bound for pr and still obtain a valid upper bound on the runtime E[T1]. This is rigorously shown by combining together the results of the previous propositions to prove the main result i.e., Theorem 1. Proof of Theorem 1: Consider a third MC M∗ whose transi- tion probabilities are pc, pm, p′r, p′d. Let the absorbing time of M starting from state S1 be E[T1]. In order to prove the above statement we will prove the following sequence of inequalities: E[T] ≤ E[T1] ≤ E[T∗1 ] ≤ E[T ′1]. According to Proposition 1, E[T] ≤ E[T1] since pc > pm. According to Proposition 2 E[T1] ≤ E[T∗1 ] since p′d ≤ pd, p′r ≥ pr, and pc > pm. Finally, according to Proposition 3, p′c ≤ pc and p′m ≤ pm implies E[T∗1 ] ≤ E[T ′1] and our proof is completed by using Lemma 1 to show that the last inequality of the statement holds. The algorithm may skip some levels or a new fitness level may be found before the whole population has reached the current fitness level. Hence, by summing up the expected run- times to leave each of the n + 1 levels and the expected times for the whole population to takeover each level, we obtain an upper bound on the expected runtime. The next lemma estab- lishes an upper bound on the expected time it takes to move CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 725 from the absorbing state of the previous MC (S3,i) to any tran- sient state (S1,i+1 or S2,i+1) of the next MC. The lemma uses standard takeover arguments originally introduced in the first analysis of the (μ + 1) EA for ONEMAX [40]. To achieve a tight upper bound Witt had to carefully wait for only a frac- tion of the population to take over a level before the next level was discovered. In our case, the calculation of the tran- sition probabilities of the MC is actually simplified if we wait for the whole population to take over each level. Hence, in our analysis the takeover time calculations are more similar to the first analysis of the (μ+ 1) EA with and without diversity mechanisms to takeover the local optimum of TWOMAX [17]. Lemma 2: Let the best individual of the current population be in level i and all individuals be in level at least i−1. Then, the expected time for the whole population to be in level at least i is O(μ log μ). The lemma shows that, once a new fitness level is discov- ered for the first time, it takes at most O(μ log μ) generations until the whole population consists of individuals from the newly discovered fitness level or higher. While the absorption time of the MC might decrease with the population size, for too large population sizes, the upper bound on the expected total take over time will dominate the runtime. As a result the MC framework will deliver larger upper bounds on the run- time unless the expected time until the population takes over the fitness levels is asymptotically smaller than the expected absorption time of all MCs. For this reason, our results will require population sizes of μ = o(log n/ log log n), to allow all fitness levels to be taken over in expected o(n log n) time such that the latter time does not affect the leading constant of the total expected runtime. V. UPPER BOUND In this section we use the MC framework devised in Section IV to prove that the (μ+1) GA is faster than any standard bit mutation-only (μ + λ) EA. In order to satisfy the requirements of Theorem 1, we first show in Lemma 3 that pc > pm if the population is at one of the final n/(4c(1 + ec)) fitness levels. The lemma also shows that it is easy for the algorithm to reach such a fitness level. Afterward we bound the transition probabilities of the MC in Lemma 4. We conclude the section by stating and proving the main result, essentially by applying Theorem 1 with the transition probabilities calculated in Lemma 4. Lemma 3: For the (μ+1) GA with mutation rate c/n for any constant c, if the population is in any fitness level i > n − n/(4c(1 + ec)), then pc is always larger than pm. The expected time for the (μ+1) GA to sample a solution in fitness level n − n/(4c(1 + ec)) for the first time is O(nμ log μ). Proof: We consider the probability pc. If two individuals on the same fitness level with nonzero Hamming distance 2d are selected as parents with probability p′, then the probability that the crossover operator yields an improved solution is at least (see proof of [39, Th. 4]) Pr(X > d) = 1 2 (1 − Pr(X = d)) = 1 2 ( 1 − 2−2d ( 2d d )) ≥ 1/4 (2) where X is a binomial random variable with parameters 2d and 1/2 which represents the number of bit positions, where the parents differ and which are set to 1 in the offspring. With probability (1 − c/n)n no bits are flipped and the absorbing state is reached. If any individual is selected twice as parent, then the improvement can only be achieved by mutation (i.e., with probability pm) since crossover is ineffective. So pc > p′(1/4)(1 − c/n)n + (1 − p′)pm, hence if pm < p′(1/4)(1 − c/n)n + (1 − p′)pm it follows that pm < pc. The condition can be simplified to pm < (1/4)(1 − c/n)n with simple algebraic manipulation. For large enough n, (1−c/n)n ≥ 1/(1+ec) and the condition reduces to pm < 1/(4(1 + ec)). Since pm < (n − i)c/n is an upper bound on the transition probability (i.e., at least one of the zero bits has to flip to increase the ONEMAX value), the condition is satisfied for i ≥ n − n/(4c(1 + ec)). For any level i ≤ n − n/(4c(1 + ec)), after the take over of the level occurs in O(μ log μ) expected time, the probability of improving is at least (1) due to the linear number of 0-bits that can be flipped. Hence, we can upper bound the total number of generations necessary to reach fitness level i = n − n/(4c(1 + ec)) by O(nμ log μ). The lemma has shown that pc > pm holds after a linear number of fitness levels have been traversed. Now, we bound the transition probabilities of the MC. Lemma 4: Let μ ≥ 3. Then the transition probabilities pd, pc, pr, and pm are bounded as follows: pd ≥ μ (μ + 1) i(n − i)c2 n2(ec + O(1/n)) , pc ≥ μ − 1 2μ2(ec + O(1/n)) pr ≤ (μ − 1) ( 2μ − 1 + O(1/n)) 2ecμ2(μ + 1) , pm ≥ c(n − i) n(ec + O(1/n)) . Proof: We first bound the probability pd of transitioning from the state S1,i to the state S2,i. In order to introduce a new solution at level i with different genotype, it is sufficient that the mutation operator simultaneously flips one of the n− i 0-bits and one of the i 1-bits while not flipping any other bit. We point out that in S1,i, all individuals are identical, hence crossover is ineffective. Moreover, when the diverse solution is created, it should stay in the population, which occurs with probability μ/(μ+ 1) since one of the μ copies of the major- ity individual should be removed by selection instead of the offspring. So pd can be lower bounded as follows: pd ≥ μ (μ + 1) ic n (n − i)c n ( 1 − c n )n−2 . Using the inequality (1−1/x)x−1 ≥ 1/e ≥ (1−1/x)x, we now bound (1 − (c/n))n−2 as follows: ( 1 − c n )n−2 ≥ ( 1 − c n )n−1 ≥ ( 1 − c n )n = (( 1 − c n )(n/c)−1( 1 − c n ))c ≥ ( 1 e ( 1 − c n ))c ≥ 1 ec ( 1 − c 2 n ) where in the last step we used the Bernoulli’s inequality. We can further absorb the c2/n in an asymptotic O(1/n) 726 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018 term as follows: ( 1 − c n )n−2 ≥ e−c − O(1/n) = 1 ec + O(1/n) . (3) The bound for pd is then pd ≥ μ (μ + 1) i(n − i)c2 n2(ec + O(1/n)) . We now consider pc. To transition from state S2,i to S3,i (i.e., pc) it is sufficient that two genotypically different individuals are selected as parents [i.e., with probability at least 2(μ − 1)/μ2], that crossover provides a better solution [i.e., with probability at least 1/4 according to (2)] and that mutation does not flip any bits [i.e., probability (1 − c/n)n ≥ 1/(ec + O(1/n)) according to (3)]. Therefore, the probability is pc ≥ 2μ − 1 μ2 1 4 ( 1 − c n )n ≥ μ − 1 2μ2(ec + O(1/n)) For calculating pr we pessimistically assume that the Hamming distance between the individuals in the population is 2 and that there is always only one individual with a differ- ent genotype. A population in state S2,i which has diversity, goes back to state S1,i when: 1) a majority individual is selected twice as parent [i.e., probability (μ− 1)2/μ2], mutation does not flip any bit [i.e., probability (1−c/n)n], and the minority individual is discarded [i.e., probability 1/(μ + 1)]; 2) two different individuals are selected as parents, crossover chooses either from the majority individual in both bit locations, where they differ (i.e., prob. 1/4) and mutation does not flip any bit [i.e., probability (1−c/n)n ≤ 1/ec] or mutation must flip at least one spe- cific bit [i.e., probability O(1/n)]. Finally, the minority individual is discarded [i.e., probability 1/(μ + 1)]; 3) a minority individual is chosen twice as parent and the mutation operator flips at least two specific bit positions [i.e., with probability O(1/n2)] and finally the minority individual is discarded [i.e., probability 1/(μ + 1)]. Hence, the probability of losing diversity is pr ≤ 1 μ + 1 [ (μ − 1)2 μ2 ( 1 − c n )n + 2 1 μ μ − 1 μ ( 1 4 ( 1 − c n )n + O(1/n) ) + O(1/n2) ] ≤ 2(μ − 1) 2 + (μ − 1) + 4ec(μ − 1)O(1/n) 2ecμ2(μ + 1) + O(1/n2) μ + 1 = (μ − 1) [ 2(μ − 1) + 1 + 4ecO(1/n)] + 2ecμ2O(1/n2) 2ecμ2(μ + 1) = (μ − 1) [ 2(μ − 1) + 1 + O(1/n)] + O(1/n2) 2ecμ2(μ + 1) ≤ (μ − 1)(2μ − 1 + O(1/n)) 2ecμ2(μ + 1) . In the last inequality we absorbed the O(1/n2) term into the O(1/n) term. The transition probability pm from state S1,i to state S3,i is the probability of improvement by mutation only, because crossover is ineffective at state S1,i. The number of 1-bits in the offspring increases if the mutation operator flips one of the (n − i) 0-bits [i.e., with probability c(n − i)/n] and does not flip any other bit [i.e., with probability (1 − c/n)n−1 ≥ (ec +O(1/n))−1 according to (3)]. Therefore, the lower bound on the probability pm is pm ≥ c(n − i) n(ec + O(1/n)) . We are finally ready to state our main result. Theorem 2: The expected runtime of the (μ+1) GA with μ ≥ 3 and mutation rate c/n for any constant c on ONEMAX is E[T] ≤ 3e cn log n c(3 + c) + O(nμ log μ). For μ = o(log n/ log log n), the bound reduces to E[T] ≤ 3 c(3 + c)e cn log n(1 + o(1)). Proof: We use Theorem 1 to bound E[Ti], the expected time until the (μ+1) GA creates an offspring at fitness level i + 1 or above given that all individuals in its initial popula- tion are at level i. The bounds on the transition probabilities established in Lemma 4 will be set as the exact transition probabilities of another MC, M′, with absorbing time larger than E[Ti] (by Theorem 1). Since Theorem 1 requires that pc > pm and Lemma 3 establishes that pc > pm holds for all fitness levels i > n − n/4c(1 + ec), we will only analyse E[Ti] for n − 1 ≥ i > n − n/(4c(1 + ec)). Recall that, by Lemma 3, level n − n/(4c(1 + ec)) is reached in expected O(nμ log μ) time. Consider the expected absorbing time E[T ′i ], of the MC M′ with transition probabilities p′d := μ (μ + 1) i(n − i)c2 n2(ec + O(1/n)) , p ′ c := μ − 1 2μ2(ec + O(1/n)) p′r := (μ − 1)(2μ − 1 + O(1/n)) 2ecμ2(μ + 1) , p ′ m := c(n − i) n(ec + O(1/n)) . According to Theorem 1 E[Ti] ≤ E[T ′i,1] ≤ p′c + p′r p′cp′d + p′cp′m + p′mp′r + 1 p′c . (4) We simplify the numerator and the denominator of the first term separately. The numerator is p′c + p′r = μ − 1 2μ2(ec + O(1/n)) + (μ − 1)(2μ − 1 + O(1/n)) 2ecμ2(μ + 1) ≤ μ − 1 2μ2ec ( 1 + 2μ − 1 + O(1/n) μ + 1 ) ≤ (μ − 1) [ 3μ + O(1/n)] 2μ2ec(μ + 1) . (5) CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 727 We can also rearrange the denominator D = p′cp′d+p′cp′m+p′mp′r as follows: D = p′c(p′d + p′m) + p′mp′r = (μ − 1) ( μi(n−i)c2 (μ+1)n2(ec+O(1/n)) + c(n−i)n(ec+O(1/n)) ) 2μ2(ec + O(1/n)) + c(n − i)(μ − 1)(2μ − 1 + O(1/n)) n(ec + O(1/n))2ecμ2(μ + 1) ≥ (μ − 1) ( μi(n−i)c2 (μ+1)n2 + c(n−i)n ) 2μ2 ( e2c + O(1/n)) + c(n − i)(μ − 1)(2μ − 1 + O(1/n)) n ( e2c + O(1/n))2μ2(μ + 1) ≥ c(n − i)(μ − 1) 2μ2 ( e2c + O(1/n)) · ( μic (μ + 1)n2 + 1 n + 2μ − 1 + O(1/n) n(μ + 1) ) ≥ c(n − i)(μ − 1) 2μ2 ( e2c + O(1/n)) · ( μic + (μ + 1)n + n(2μ − 1 + O(1/n)) (μ + 1)n2 ) ≥ c(n − i)(μ − 1) ( μic + n[3μ + O(1/n)]) 2μ2 ( e2c + O(1/n))(μ + 1)n2 . (6) Note that the term in square brackets is the same in both the numerator [i.e., (5)] and the denominator [i.e., (6)] including the small order terms in O(1/n) (i.e., they are identical). Let A = [3μ + c′/n], where c′ > 0 is the smallest constant that satisfies the O(1/n) in the upper bound on pr in Lemma 4. We can now put the numerator and denominator together and simplify the expression (p′c + p′r)/ ( p′c(p′d + p′m) + p′mp′r ) ≤ (μ − 1)A 2μ2ec(μ + 1) · 2μ2 ( e2c + O(1/n))(μ + 1)n2 c(n − i)(μ − 1)(μic + nA) ≤ A ( e2c + O(1/n))n2 ecc(n − i)(μic + nA) . By using that [(e2c + O(1/n))/(ec)] ≤ ec + O(1/n), we get ≤ A(e c + O(1/n))n2 c(n − i)(μic + nA) ≤ ec An 2 c(n − i)(μic + nA) + O(1/n) An2 c(n − i)(μic + nA) . The facts, n − i ≥ 1, A = (1), and μ, i, c > 0 imply that, nA+μic = (n) and (An2/c(n − i)(μic + nA)) = O(n). When multiplied by the O(1/n) term, we get ≤ e cn c(n − i) An (μic + nA) + O(1). By adding and subtracting μic to the numerator of (An/((μic + nA)]), we obtain ≤ e cn c(n − i) ( 1 − μic μic + nA ) + O(1). Note that the multiplier outside the brackets, (ecn)/(c(n−i)), is in the order of O(n/(n−i)). We now add and subtract μnc to the numerator of −(μic/μic + nA) to create a positive additive term in the order of O(μ(n − i)/n) = e cn c(n − i) ( 1 − μnc μic + nA + μ(n − i)c μic + nA ) + O(1) = e cn c(n − i) ( 1 − μnc μic + nA ) + e cn c(n − i) μ(n − i)c μic + nA + O(1) = e cn c(n − i) ( 1 − μnc μic + nA ) + O(μ). Since p′c = (1/μ), we can similarly absorb 1/p′c into the O(μ) term. After the addition of the remaining term 1/p′c from (4), we obtain a valid upper bound on E[Ti] E[Ti] ≤ p ′ c + p′r p′cp′d + p′cp′m + p′mp′r + 1 p′c ≤ e cn c(n − i) ( 1 − μnc μic + nA ) + O(μ). In order to bound the negative term, we will rearrange its denominator (i.e., nA + μic) n [ 3μ + c′/n] + μ = 3μn + c′ + μic = 3μn + c′ − (n − i)μc + μnc < μn(3 + c) + c′ where the second equality is obtained by adding and subtract- ing μnc. Altogether E[Ti] ≤ e cn c(n − i) ( 1 − μnc μn(3 + c) + c′ ) + O(μ) = e cn c(n − i) ( 1 − μnc + c ′ c 3+c − c′ c3+c μn(3 + c) + c′ ) + O(μ) = e cn c(n − i) ( 1 − c 3 + c + c′ c3+c μn(3 + c) + c′ ) + O(μ) = e cn c(n − i) ( 1 − c 3 + c + O(1/n) ) + O(μ) = e cn c(n − i) 3 3 + c + O(μ). If we add the expected time to take over each fitness level from Lemma 2 and sum over all fitness levels the upper bound on the runtime is n∑ i=n−n/(4c(1+ec)) ( ecn c(n − i) 3 3 + c + O(μ) + O(μ log μ) ) ≤ n∑ i=0 ( ecn c(n − i) 3 3 + c + O(μ log μ) ) ≤ 3e cn log n c(3 + c) + O(nμ log μ) ≤ 3ecn log n c(3 + c) (1 + o(1)) where in the last inequality we use μ = o(log n/ log log n) to prove the second statement of the theorem. The second statement of the theorem provides an upper bound of (3/4)en log n for the standard mutation rate 1/n (i.e., c = 1) and μ = o(log n/ log log n). The upper bound is mini- mized for c = (1/2)(√13 − 1). Hence, the best upper bound 728 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018 is delivered for a mutation rate of about 1.3/n. The resulting leading term of the upper bound is E[T] ≤ 6e 1 2 (√ 13−1 ) n log n (√ 13 − 1 )( 1 2 (√ 13 − 1 ) + 3 ) ≈ 1.97n log n. We point out that Theorem 2 holds for any μ ≥ 3. Our framework provides a higher upper bound when μ = 2 com- pared to larger values of μ. The main difference lies in the probability pr as shown in the following lemma. Lemma 5: The transition probabilities pm, pr, pc, and pd for the (2+1) GA, with mutation rate c/n and c constant, are bounded as follows: pd ≥ 23 i(n − i)c2 n2(ec + O(1/n)) , pc ≥ 1 8(ec + O(1/n)) pr ≤ 524ec + O(1/n), pm ≥ c(n − i) (ec + O(1/n))n . The upper bound on pr from Lemma 4 is 1/(8ec), which is smaller than the bound we have just found. This is due to the assumptions in the lemma that there can be only one genotype in the population at a given time which can take over the population in the next iteration. However, when μ = 2, either individual can take over the population in the next iteration. This larger upper bound on pr for μ = 2 leads to a larger upper bound on the runtime of E[T] ≤ [4/(c + 4)](ecn log n/c)(1 + o(1)) for the (2+1) GA. The calculations are omitted as they are the same as those of the proof of Theorem 2, where pr ≥ 5/(24ec) + O(1/n) is used and μ is set to 2. VI. LOWER BOUND In the previous section we provided a higher upper bound for the (2+1) GA compared to the (μ+1) GA with population size greater than 2 and μ = o(log n/ log log n). To rigorously prove that the (2+1) GA is indeed slower, we require a lower bound on the runtime of the algorithm that is higher than the upper bound provided in the previous section for the (μ + 1) GA (μ ≥ 3). Since providing lower bounds on the runtime is a noto- riously hard task, we will follow a strategy previously used by Sudholt [39] and analyse a version of the (μ+1) GA with greedy parent selection and greedy crossover (i.e., Algorithm 2) in the sense that: 1) parents are selected uniformly at random only among the solutions from the highest fitness level (greedy selection); 2) if the offspring has the same fitness as its parents and its Hamming distance to any individual with equal fitness in the population is larger than 2, then the algorithm automatically performs an OR operation between the offspring and the individual with the largest Hamming distance and fitness, breaking ties arbitrarily, and adds the resulting offspring to the population, i.e., we pes- simistically allow it to skip as many fitness levels as possible (semi-greedy crossover). The greedy selection allows us to ignore the improvements that occur via crossover between solutions from different fit- ness levels. Thus, the crossover is only beneficial when there Algorithm 2: (2 + 1)S GA 1 P ← μ individuals, uniformly at random from {0, 1}n; 2 repeat 3 Choose x, y ∈ P uniformly at random among P∗, the individuals with the current best fitness f ∗; 4 z ← Uniform crossover with probability 1/2 (x, y); 5 Flip each bit in z with probability c/n; 6 If f (z) = f ∗ and max w∈P∗(HD(w, z)) > 2 then z ← z ∨ arg max w∈P∗ (HD(w, z)) ; 7 P ← P ∪ {z}; 8 Choose one element from P with lowest fitness and remove it from P, breaking ties at random; 9 until termination condition satisfied; are at least two different genotypes in the population at the highest fitness level discovered so far. The difference with the algorithm analyzed by Sudholt [39] is that the (2 + 1)S GA we consider does not use any diversity mechanism and it does not automatically crossover correctly when the Hamming dis- tance between parents is exactly 2. As a result, there still is a nonzero probability of losing diversity before a successful crossover occurs. The crossover operator of the (2+1)S GA is less greedy than the one analyzed in [39] (i.e., there crossover is automatically successful also when the Hamming distance between the parents is 2). We point out that the upper bounds on the runtime derived in the previous section also hold for the greedy (2 + 1)S GA. The MC structure of Fig. 1 is still representative of the states that the algorithm can be in. When there is no diversity in the population, either an improvement via mutation occurs or diversity is introduced into the population by the muta- tion operator. When diversity is present, both crossover and mutation can reach a higher fitness level while there is also a probability that the population will lose diversity by repli- cating one of the existing genotypes. With a population size of two the diversity can be lost by creating a copy of either solution and removing the other one from the population dur- ing environmental selection (i.e., line 8 in Algorithm 2). With population sizes greater than two, the loss of diversity can only occur when the majority genotype (i.e., the genotype with most copies in the population) is replicated. Building upon this we will show that the asymptotic performance of the (2+1)S GA for ONEMAX cannot be better than that of the (μ + 1) GAs for μ > 2. Like in [39] for our analysis we will apply the fitness level method for lower bounds proposed by Sudholt [38]. Due to the greedy crossover and the greedy parent selection used in [39], the population could be represented by the trajec- tory of a single individual. If an offspring with lower fitness was added to the population, then the greedy parent selection never chose it. If instead, a solution with equally high fitness and different genotype was created, then the algorithm imme- diately reduced the population to a single individual that is the best possible outcome from crossing over the two genotypes. The main difference between the following analysis and that CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 729 of [39] is that we want to take into account the possibility that the gained diversity may be lost before crossover exploits it. To this end, when individuals of equal fitness and Hamming distance 2 are created, crossover only exploits this success- fully (i.e., goes to the next fitness level) with the conditional probability that crossover is successful before the diversity is lost. Otherwise, the diversity is lost. Only when individu- als of Hamming distance larger than 2 are created, we allow crossover to immediately provide the best possible outcome as in [39]. Now, we can state the main result of this section. Theorem 3: The expected runtime of the (2 + 1)S GA with mutation probability p = c/n for any constant c on ONEMAX is no less than 3ec c ( 3 + max 1≤k≤n ( (np)k (k!)2 ))n log n − O(n log log n). Note that for c ≤ 4, max 1≤k≤n(((np) k/(k!)2)) ≤ pn = c. Since E[T] ≥ en log n for c ≥ 3, for the purpose of finding the mutation rate that minimizes the lower bound, we can reduce the statement of the theorem to 3ecn log n c(3 + c) − O(n log log n). The theorem provides a lower bound of (3/4)en log n − O(n log log n) for the standard mutation rate 1/n (i.e., c = 1). The lower bound is minimized for c = (1/2)(√13 − 1). Hence, the smallest lower bound is delivered for a mutation rate of about 1.3/n. The resulting lower bound is E[T] ≥ 6e 1 2 (√ 13−1 ) n log n (√ 13 − 1 )( 1 2 (√ 13 − 1 ) + 3 ) − O(n log log n) ≈ 1.97n log n − O(n log log n). Since the lower bound for the (2 + 1)S GA matches the upper bound for the (μ+1) GA with μ > 2, the theorem proves that, under greedy selection and semi-greedy crossover, popu- lations of size 2 cannot be faster than larger population sizes up to μ = o(log n/ log log n). In the following section we give experimental evidence that the greedy algorithms are faster than the standard (2+1) GA, thus suggesting that the same conclusions hold also for the standard non greedy algorithms. VII. EXPERIMENTS The theoretical results presented in the previous sections pose some new interesting questions. On one hand, the the- ory suggests that population sizes greater than 2 benefit the (μ+1) GA for hillclimbing the ONEMAX function. On the other hand, the best runtime bounds are obtained for a mutation rate of approximately 1.3/n, suggesting that higher mutation rates than the standard 1/n rate may improve the performance of the (μ+1) GA. In this section we present the outcome of some experimental investigations to shed further light on these questions. In particular, we will investigate the effects of the population size and mutation rate on the run- time of the steady-state GA for ONEMAX and compare its Fig. 2. Average runtime over 1000 independent runs versus problem size n. runtime against other GAs that have been proved to be faster than mutation-only EAs in the literature. We start with an overview of the performance of the algo- rithms. In Fig. 2, we plot the average runtime over 1000 independent runs of the (μ+1) GA with μ = 2 and μ = 5 (with uniform parent selection and standard 1/n mutation rate) for exponentially increasing problem sizes and compare it against the fastest standard bit mutation-only EA with static mutation rate [i.e., the (1 + 1) EA with 1/n mutation rate]. While the algorithm using μ = 5 outperforms the μ = 2 version, they are both faster than the (1+1) EA already for small problem sizes. We also compare the algorithms against the (2+1) GA investigated by Sudholt [39], where diversity is enforced by the environmental selection always preferring distinct individuals of equal fitness—the same GA variant that was first proposed and analyzed in [22]. We run the algorithm both with standard mutation rate 1/n and with the optimal mutation rate (1 + √5)/(2n). Obviously, when diversity is enforced, the algorithms are faster. Finally, we also compare the algorithms against the (1 + (λ, λ)) GA with self-adjusting population sizes and Sudholt’s (2 + 1) GA as they were com- pared previously in [9]. Note that in [9, Fig. 8] Sudholt’s algorithm was implemented with a very greedy parent selec- tion operator that always prefers distinct individuals on the highest fitness level for reproduction. In order to decompose the effects of the greedy parent selec- tion, greedy crossover, and the use of diversity, we conducted further experiments shown in Fig. 3. Here, we see that it is indeed the enforced diversity that creates the fundamen- tal performance difference. Moreover, the results show that the greedy selection/greedy crossover GA is slightly faster than the greedy parent selection GA and that greedy parent selection is slightly faster than standard selection. Overall, the figure suggests that the lower bound presented in Theorem 3 is also valid for the standard (2+1) GA with uniform par- ent selection (i.e., no greediness). In Fig. 3, it can be noted that the performance difference between the GA with greedy crossover and greedy parent selection analyzed in [39] and the (2+1) GA with enforced diversity and without greedy 730 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018 Fig. 3. Comparison between standard selection, greedy selection, and greedy selection + greedy crossover GAs. The runtime is averaged over 1000 independent runs. Fig. 4. Average runtime gain of the (μ+1) GA versus the (2+1) GA for different population sizes, errorbars show the standard deviation normalized by the average runtime for μ = 2. crossover is more pronounced than the performance difference between the standard (2+1) GA analyzed in Section V and the (2 + 1)S GA which was analyzed in Section VI. The reason behind the difference in discrepancies is that the (2 + 1)S GA does not implement the greedy crossover operator when the Hamming distance is 2. We speculate that cases, where the Hamming distance is just enough for the crossover to exploit it occur much more frequently than the cases, where a larger Hamming distance is present. As a result, the performance of the (2 + 1)S GA does not deviate much from the standard algorithm. Table I in the supplementary material presents the mean and standard deviation of the runtimes of the algorithms depicted in Figs. 2 and 3 over 1000 independent runs. Now, we investigate the effects of the population size on the (μ+1) GA. We perform 1000 independent runs of the (μ+1) GA with uniform parent selection and standard muta- tion rate 1/n for increasing population sizes up to μ = 16. Fig. 5. Average runtime gain of the (μ+1) GA versus the (1 + 1) EA for different population sizes, errorbars show the standard deviation normalized by the average runtime of the (1 + 1) EA. Fig. 6. Average runtime gain of the (5 + 1) GA for various mutation rates versus the standard 1/n mutation rate, errorbars show the standard deviation normalized by the average runtime for 1/n mutation rate. In Fig. 4 we present average runtimes divided by the runtime of the (2 + 1) GA and in Fig. 5 normalized against the run- time of the (1+1) EA. In both figures, we see that the runtime improves for μ larger than 2 and after reaching its lowest value increases again with the population size. It is not clear whether there is a constant optimal static value for μ around 4 or 5. The experiments, however, do not rule out the possi- bility that the optimal static population size increases slowly with the problem size (i.e., μ = 3 for n = 256, μ = 4 for n = 4096, and μ = 5 for n = 16 384). On the other hand, we clearly see that as the problem size increases we get a larger improvement on the runtime. This indicates that the harder is the problem, more useful are the populations. In particular, in Fig. 5 we see that the theoretical asymptotic gain of 25% with respect to the runtime of the (1 + 1) EA is approached more and more closely as n increases. For the considered problem sizes, the (μ+1) GA is faster than the (1 + 1) EA for all tested CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 731 values of μ. However, to see the runtime improvement of the (μ+1) GA against the (2+1) GA for μ > 15 the experiments (Fig. 4) suggest that greater problem sizes would need to be used. Finally, we investigate the effect of the mutation rate on the runtime. Based on our previous experiments we set the popu- lation size to the best seen value of μ = 5 and perform 1000 independent runs for each c value ranging from 0.9 to 1.9. In Fig. 6, we see that even though the mutation rate c ≈ 1.3 minimizes the upper bound we proved on the runtime, setting a larger mutation rate of 1.6 further decreases the runtime. VIII. CONCLUSION The question of whether GAs can hillclimb faster than mutation-only algorithms is a long standing one. On one hand, in his pioneering book, Rechenberg had given preliminary experimental evidence that crossover may speed up the run- time of population-based EAs for generalized ONEMAX [33]. On the other hand, further experiments suggested that GAs were slower hillclimbers than the (1+1) EA [20], [28]. In recent years it has been rigorously shown that crossover and mutation can outperform algorithms using only mutation. First, a new theory-driven GA called (1+(λ, λ)) GA has been shown to be asymptotically faster for hillclimbing the ONEMAX func- tion than any unbiased mutation-only EA [9]. Second, it has been shown how standard (μ+λ) GAs are twice as fast as their standard bit mutation-only counterparts for ONEMAX as long as diversity is enforced through environmental selection [39]. In this paper, we have rigorously proven that standard steady-state GAs with μ ≥ 3 and μ = o(log n/ log log n) are at least 25% faster than all unbiased standard bit mutation- based EAs with static mutation rate for ONEMAX even if no diversity is enforced. The MC framework we used to achieve the upper bounds on the runtimes should be general enough to allow future analyses of more complicated GAs, for instance with greater offspring population sizes or more sophisticated crossover operators. A limitation of the approach is that it applies to classes of problems that have plateaus of equal fit- ness. Hence, for functions, where each genotype has a different fitness value our approach would not apply. An open ques- tion is whether the limitation is inherent to our framework or whether it is crossover that would not help steady-state EAs at all on such fitness landscapes. Our results also explain that populations are useful not only in the exploration phase of the optimization, but also to improve exploitation during the hillclimbing phases. In partic- ular, larger population sizes increase the probability of creating and maintaining diversity in the population. This diversity can then be exploited by the crossover operator. Recent results had already shown how the interplay between mutation and crossover may allow the emergence of diversity, which in turn allows to escape plateaus of local optima more effi- ciently compared to relying on mutation alone [5]. This paper sheds further light on the picture by showing that popula- tions, crossover, and mutation together, not only may escape optima more efficiently, but may be more effective also in the exploitation phase. Another additional insight gained from the analysis is that the standard mutation rate 1/n may not be optimal for the (μ+1) GA on ONEMAX. This result is also in line with, and nicely complements, other recent findings concerning steady state GAs. For escaping plateaus of local optima it has been recently shown that increasing the mutation rate above the standard 1/n rate leads to smaller upper bounds on escaping times [5]. However, when jumping large low-fitness valleys, mutation rates of about 2.6/n seem to be optimal static rates (see the experiment section in [3] and [4]). For ONEMAX lower mutation rates seem to be optimal static rates, but still considerably larger than the standard 1/n rate. New interesting questions for further work have spawned. Concerning population sizes an open problem is to rigorously prove whether the optimal size grows with the problem size and at what rate. Also determining the optimal mutation rate remains an open problem. While our theoretical analysis deliv- ers the best upper bound on the runtime with a mutation rate of about 1.3/n, experiments suggest a larger optimal muta- tion rate. Interestingly, this experimental rate is very similar to the optimal mutation rate (i.e., approximately 1.618/n) of the (μ+ 1) GA with enforced diversity proven in [39]. The bene- fits of higher than standard mutation rates in elitist algorithms is a topic that is gaining increasing interest [2], [14], [30]. Further improvements may be achieved by dynamically adapting the population size and mutation rate during the run. Advantages, in this sense, have been shown for the (1 + (λ, λ)) GA by adapting the population size [7] and for single individual algorithms by adapting the mutation rate [12], [26]. Generalizing the results to larger classes of hill- climbing problems is intriguing. In particular, proving whether speed ups of the (μ+1) GA compared to the (1 + 1) EA are also achieved for Royal Road functions would give a definitive answer to a long standing question [28]. Analyses for larger problem classes, such as linear functions and classical combi- natorial optimization problems would lead to further insights. Yet another natural question is how the (μ+1) GA hillclimbing capabilities compare to (μ + λ) GAs and generational GAs. REFERENCES [1] T. Bäck, Evolutionary Algorithms in Theory and Practice. Oxford, U.K.: Oxford Univ. Press, 1996. [2] D. Corus, P. S. Oliveto, and D. Yazdani, “On the runtime analysis of the opt-IA artificial immune system,” in Proc. ACM Genet. Evol. Comput. Conf. (GECCO), Berlin, Germany, 2017, pp. 83–90. [3] D.-C. Dang et al., “Escaping local optima using crossover with emergent or reinforced diversity,” ArXiv:1608.03123, Aug. 2016. [4] D.-C. Dang et al., “Escaping local optima using crossover with emer- gent diversity,” IEEE Trans. Evol. Comput., to be published, doi: 10.1109/TEVC.2017.2724201. [5] D.-C. Dang et al., “Emergence of diversity and its benefits for crossover in genetic algorithms,” in Proc. Int. Conf. Parallel Problem Solving Nat. (PPSN XIV), 2016, pp. 890–900. [6] D.-C. Dang et al., “Escaping local optima with diversity mechanisms and crossover,” in Proc. Genet. Evol. Comput. Conf. (GECCO), Denver, CO, USA, 2016, pp. 645–652. [7] B. Doerr and C. Doerr, “Optimal parameter choices through self- adjustment: Applying the 1/5-th rule in discrete settings,” in Proc. ACM Genet. Evol. Comput. Conf. (GECCO), 2015, pp. 1335–1342. [8] B. Doerr and C. Doerr, “A tight runtime analysis of the (1+(λ, λ)) genetic algorithm on onemax,” in Proc. ACM Genet. Evol. Comput. Conf. (GECCO), 2015, pp. 1423–1430. 732 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018 [9] B. Doerr, C. Doerr, and F. Ebel, “From black-box complexity to design- ing new genetic algorithms,” Theor. Comput. Sci., vol. 567, pp. 87–104, Feb. 2015. [10] B. Doerr, C. Doerr, and T. Köetzing, “The right mutation strength for multi-valued decision variables,” in Proc. ACM Genet. Evol. Comput. Conf. (GECCO), Denver, CO, USA, 2016, pp. 1115–1122. [11] B. Doerr, C. Doerr, R. Spöhel, and H. Thomas, “Playing mastermind with many colors,” J. ACM, vol. 63, no. 5, pp. 1–23, 2016. [12] B. Doerr, C. Doerr, and J. Yang, “k-Bit mutation with self-adjusting k outperforms standard bit mutation,” in Proc. Int. Conf. Parallel Problem Solving Nat. (PPSN XIV), 2016, pp. 824–834. [13] B. Doerr, E. Happ, and C. Klein, “Crossover can provably be useful in evolutionary computation,” Theor. Comput. Sci., vol. 425, pp. 17–33, Mar. 2012. [14] B. Doerr, H. P. Le, R. Makhmara, and T. D. Nguyen, “Fast genetic algo- rithms,” in Proc. ACM Genet. Evol. Comput. Conf. (GECCO), Berlin, Germany, 2017, pp. 777–784. [15] B. Doerr and C. Winzen, “Playing mastermind with constant-size memory,” Theory Comput. Syst., vol. 55, no. 4, pp. 658–684, 2014. [16] A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing. Berlin, Germany: Springer-Verlag, 2003. [17] T. Friedrich, P. S. Oliveto, D. Sudholt, and C. Witt, “Analysis of diversity-preserving mechanisms for global exploration,” Evol. Comput., vol. 17, no. 4, pp. 455–476, 2009. [18] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA, USA: Addison-Wesley, 1989. [19] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI, USA: Univ. Michigan Press, 1975. [20] T. Jansen and I. Wegener, “Real royal road functions—Where crossover provably is essential,” Discr. Appl. Math., vol. 149, nos. 1–3, pp. 111–125, 2005. [21] T. Jansen, Analyzing Evolutionary Algorithms: The Computer Science Perspective. Berlin, Germany: Springer-Verlag, 2013. [22] T. Jansen and I. Wegener, “The analysis of evolutionary algorithms— A proof that crossover really can help,” Algorithmica, vol. 34, no. 1, pp. 47–66, 2002. [23] T. Kötzing, D. Sudholt, and M. Theile, “How crossover helps in pseudo-Boolean optimization,” in Proc. ACM Genet. Evol. Comput. Conf. (GECCO), Dublin, Ireland, 2011, pp. 989–996. [24] P. K. Lehre and C. Witt, “Black-box search by unbiased variation,” Algorithmica, vol. 64, no. 4, pp. 623–642, 2012. [25] P. K. Lehre and X. Yao, “Crossover can be constructive when com- puting unique input–output sequences,” Soft Comput., vol. 15, no. 9, pp. 1675–1687, 2011. [26] A. Lissovoi, P. S. Oliveto, and J. A. Warwicker, “On the runtime analysis of generalised selection hyper-heuristics for pseudo-Boolean optimi- sation,” in Proc. ACM Genet. Evol. Comput. Conf. (GECCO), 2017, pp. 849–856. [27] M. Mitchell, J. H. Holland, and S. Forrest, “Relative building-block fitness and the building block hypothesis,” in Proc. Found. Genet. Algorithms (FOGA), vol. 2. 1993, pp. 109–126. [28] M. Mitchell, J. Holland, and S. Forrest, “When will a genetic algo- rithm outperform hill climbing?” in Neural Information Processing Systems (NIPS 6). San Mateo, CA, USA: Morgan Kaufmann, 1994, pp. 51–58. [29] F. Neumann, P. S. Oliveto, G. Rudolph, and D. Sudholt, “On the effec- tiveness of crossover for migration in parallel evolutionary algorithms,” in Proc. Genet. Evol. Comput. Conf. (GECCO), 2011, pp. 1587–1594. [30] P. S. Oliveto, P. K. Lehre, and F. Neumann, “Theoretical analysis of rank-based mutation—Combining exploration and exploitation,” in Proc. IEEE Congr. Evol. Comput. (CEC), Trondheim, Norway, 2009, pp. 1455–1462. [31] P. S. Oliveto and X. Yao, “Runtime analysis of evolutionary algorithms for discrete optimization,” in Theory of Randomized Search Heuristics: Foundations and Recent Developments, B. Doerr and A. Auger, Eds. Hackensack, NJ, USA: World Sci., 2011, pp. 21–52. [32] C. Qian, Y. Yu, and Z. Zhou, “An analysis on recombination in multi-objective evolutionary optimization,” Artif. Intell., vol. 204, pp. 99–119, Nov. 2013. [33] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologishen Evolution. Stuttgart, Germany: Friedrich Frommann Verlag, 1973. [34] J. E. Rowe, “Genetic algorithms,” in Handbook of Computational Intelligence, W. Pedrycz and J. Kacprzyk, Eds. Berlin, Germany: Springer-Verlag, 2011, pp. 825–844. [35] J. Sarma and K. D. Jong, “Generation gap methods,” in Handbook of Evolutionary Computation, T. Back, D. B. Fogel, and Z. Michalewicz, Eds. Bristol, U.K.: Inst. Phys., 1997, ch. C 2.7. [36] T. Storch and I. Wegener, “Real royal road functions for constant population size,” Theor. Comput. Sci., vol. 320, no. 1, pp. 123–134, 2004. [37] D. Sudholt, “Crossover is provably essential for the Ising model on trees,” in Proc. Genet. Evol. Comput. Conf. (GECCO), Washington, DC, USA, 2005, pp. 1161–1167. [38] D. Sudholt, “A new method for lower bounds on the running time of evolutionary algorithms,” IEEE Trans. Evol. Comput., vol. 17, no. 3, pp. 418–435, Jun. 2013. [39] D. Sudholt, “How crossover speeds up building-block assembly in genetic algorithms,” Evol. Comput., vol. 25, no. 2, pp. 237–274, 2017. [40] C. Witt, “Runtime analysis of the (μ+1) EA on simple pseudo-Boolean functions,” Evol. Comput., vol. 14, no. 1, pp. 65–86, 2006. [41] C. Witt, “Tight bounds on the optimization time of a randomized search heuristic on linear functions,” Combinatorics Probab. Comput., vol. 22, no. 2, pp. 294–318, 2013. Dogan Corus (M’17) received the B.S. and M.S. degrees in industrial engineering from Koç University, Istanbul, Turkey and the Ph.D. degree in computer science from the University of Nottingham, Nottingham, U.K. He is a Research Associate with the University of Sheffield, Sheffield, U.K. His current research interests include the runtime analysis of bio-inspired algorithms, in particular, population-based algo- rithms and genetic algorithms. Pietro S. Oliveto (S’07–M’09–SM’16) received the Laurea degree in computer science from the University of Catania, Catania, Italy in 2005 and the Ph.D. degree from the University of Birmingham, Birmingham, U.K., in 2009. He is a Senior Lecturer and an EPSRC Early Career Fellow with the University of Sheffield, Sheffield, U.K. He has been an EPSRC Ph.D.+ Fellow, from 2009 to 2010, and an EPSRC Post- Doctoral Fellow, from 2010 to 2013, with the University of Birmingham and a Vice-Chancellor’s Fellow with the University of Sheffield, from 2013 to 2016. His current research interests include the performance analysis of bio-inspired compu- tation techniques including evolutionary algorithms, genetic programming, artificial immune systems and hyperheuristics. Dr. Oliveto was a recipient of the best paper awards at the Genetic and Evolutionary Computation Conference in 2008 and 2014 and the International Conference on Artificial Immune Systems in 2011. He is a part of the Steering Committee of the Annual Workshop on Theory of Randomized Search Heuristics, an Associate Editor of the IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, the Chair of the IEEE CIS Task Force on Theoretical Foundations of Bio-Inspired Computation and a member of the EPSRC Peer Review College.
欢迎咨询51作业君