程序辅导案例 > Program >

程序代写案例-OCTOBER 2018

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

720 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018
Standard Steady State Genetic Algorithms Can
Hillclimb Faster Than Mutation-Only
Evolutionary Algorithms
Dogan Corus, Member, IEEE, and Pietro S. Oliveto , Senior Member, IEEE
Abstract—Explaining to what extent the real power of genetic
algorithms (GAs) lies in the ability of crossover to recombine indi-
viduals into higher quality solutions is an important problem in
evolutionary computation. In this paper we show how the inter-
play between mutation and crossover can make GAs hillclimb
faster than their mutation-only counterparts. We devise a Markov
chain framework that allows to rigorously prove an upper bound
on the runtime of standard steady state GAs to hillclimb the
ONEMAX function. The bound establishes that the steady-state
GAs are 25% faster than all standard bit mutation-only evolu-
tionary algorithms with static mutation rate up to lower order
terms for moderate population sizes. The analysis also suggests
that larger populations may be faster than populations of size 2.
We present a lower bound for a greedy (2 + 1) GA that matches
the upper bound for populations larger than 2, rigorously proving
that two individuals cannot outperform larger population sizes
under greedy selection and greedy crossover up to lower order
terms. In complementary experiments the best population size
is greater than 2 and the greedy GAs are faster than standard
ones, further suggesting that the derived lower bound also holds
for the standard steady state (2 + 1) GA.
Index Terms—Algorithms design and analysis, genetic algo-
rithms (GAs), Markov processes, stochastic processes.
I. INTRODUCTION
GENETIC algorithms (GAs) rely on a population of indi-viduals that simultaneously explore the search space.
The main distinguishing features of GAs from other random-
ized search heuristics is their use of a population and crossover
to generate new solutions. Rather than slightly modifying the
current best solution as in more traditional heuristics, the idea
behind GAs is that new solutions are generated by recom-
bining individuals of the current population (i.e., crossover).
Such individuals are selected to reproduce probabilistically
according to their fitness (i.e., reproduction). Occasionally,
Manuscript received November 15, 2016; revised April 13, 2017,
July 7, 2017, and August 3, 2017; accepted August 5, 2017. Date of publi-
cation September 26, 2017; date of current version September 28, 2018. This
work was supported by EPSRC under Grant EP/M004252/1. (Corresponding
author: Pietro S. Oliveto.)
The authors are with the Rigorous Research Team, Algorithms Group,
Department of Computer Science, University of Sheffield, Sheffield S1 4DP,
U.K. (e-mail: [email protected]; [email protected]).
This paper has supplementary downloadable multimedia material available
at http://ieeexplore.ieee.org provided by the authors.
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TEVC.2017.2745715
random mutations may slightly modify the offspring produced
by crossover. The original motivation behind these mutations
is to avoid that some genetic material may be lost forever, thus
allowing to avoid premature convergence [16], [18]. For these
reasons the GA community traditionally regards crossover as
the main search operator while mutation is considered a “back-
ground operator” [1], [16], [19] or a “secondary mechanism
of genetic adaptation” [18].
Explaining when and why GAs are effective has proved to
be a nontrivial task. Schema theory and its resulting building
block hypothesis [18] were devised to explain such working
principles. However, these theories did not allow to rigor-
ously characterize the behavior and performance of GAs.
The hypothesis was disputed when a class of functions (i.e.,
Royal Road), thought to be ideal for GAs, was designed and
experiments revealed that the simple (1+1) EA was more
efficient [20], [27].
Runtime analysis approaches have provided rigorous proofs
that crossover may indeed speed up the evolutionary process
of GAs in ideal conditions (i.e., if sufficient diversity is avail-
able in the population). The JUMP function was introduced by
Jansen and Wegener [22] as a first example, where crossover
considerably improves the expected runtime compared to
mutation-only evolutionary algorithms (EAs). The proof
required an unrealistically small crossover probability to allow
mutation alone to create the necessary population diversity
for the crossover operator to then escape the local optimum.
Dang et al. [6] recently showed that the sufficient diversity,
and even faster upper bounds on the runtime for not too large
jump gaps, can be achieved also for realistic crossover proba-
bilities by using diversity mechanisms. Further examples that
show the effectiveness of crossover have been given for both
artificially constructed functions and standard combinatorial
optimization problems (see the next section for an overview).
Excellent hillclimbing performance of crossover-based GAs
has been also proved. Doerr et al. [8], [9] proposed a
(1+(λ, λ)) GA which optimizes the ONEMAX function in
(n
√
log log log(n)/ log log(n)) fitness evaluations (i.e., run-
time). Since the unbiased unary black box complexity of
ONEMAX is (n log n) [24], the algorithm is asymptotically
faster than any unbiased mutation-only EA. Furthermore, the
algorithm runs in linear time when the population size is self-
adapted throughout the run [7]. Through this paper, though,
it is hard to derive conclusions on the working principles of
standard GAs because these are very different compared to
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 721
the (1 + (λ, λ)) GA in several aspects. In particular, the (1
+ (λ, λ)) GA was especially designed to use crossover as a
repair mechanism that follows the creation of new solutions
via high mutation rates. This makes the algorithm work in a
considerably different way compared to traditional GAs.
More traditional GAs have been analyzed by Sudholt [39].
Concerning ONEMAX, he shows how (μ+λ) GAs are twice
as fast as their standard bit mutation-only counterparts. As a
consequence, he showed an upper bound of (e/2)n log n(1 +
o(1)) function evaluations for a (2 + 1) GA versus the
en log n(1−o(1)) function evaluations required by any standard
bit mutation-only EA [38], [41]. This bound further reduces
to 1.19n ln n ± O(n log log n) if the optimal mutation rate is
used [i.e., (1+√5)/2 ·1/n ≈ 1.618/n]. However, the analysis
requires that diversity is artificially enforced in the population
by breaking ties always preferring genotypically different indi-
viduals. This mechanism ensures that once diversity is created
on a given fitness level, it will never be lost unless a better fit-
ness level is reached, giving ample opportunities for crossover
to exploit this diversity.
Recently, it has been shown that it is not necessary to
enforce diversity for standard steady state GAs to outper-
form standard bit mutation-only EAs [5]. In particular, the
JUMP function was used as an example to show how the
interplay between crossover and mutation may be sufficient
for the emergence of the necessary diversity to escape from
local optima more quickly. Essentially, a runtime of O(nk−1)
may be achieved for any sublinear jump length k > 2 versus
the (nk) required by standard bit mutation-only EAs.
In this paper, we show that this interplay between mutation
and crossover may also speed-up the hillclimbing capabil-
ities of steady state GAs without the need of enforcing
diversity artificially. In particular, we consider a standard
steady state (μ + 1)GA [16], [34], [35] and prove an upper
bound on the runtime to hillclimb the ONEMAX function of
(3/4)en log n+O(n) for any μ ≥ 3 and μ = o(log n/ log log n)
when the standard 1/n mutation rate is used. Apart from show-
ing that standard (μ + 1) GAs are faster than their standard
bit mutation-only counterparts up to population sizes μ =
o(log n/ log log n), our Markov chain (MC) framework analy-
sis provides two other interesting insights. First, it delivers bet-
ter runtime bounds for mutation rates that are higher than the
standard 1/n rate. The best upper bound of 0.72en log n+O(n)
is achieved for c/n with c = (1/2)(√13 − 1) ≈ 1.3. Second,
the framework provides a larger upper bound, up to lower
order terms, for the (2 + 1) GA compared to that of any μ ≥ 3
and μ = o(log n/ log log n). The reason for the larger constant
in the leading term of the runtime is that, for populations of
size 2, there is always a constant probability that any selected
individual takes over the population in the next generation.
This is not the case for population sizes larger than 2.
To shed light on the exact runtime for population size μ = 2
we present a lower bound analysis for a greedy GA, which we
call (2+1)S GA, that always selects individuals of highest fit-
ness for crossover and always successfully recombines them
if their Hamming distance is greater than 2. This algorithm
is similar to the one analyzed by Sudholt [39] to allow the
derivation of a lower bound, with the exception that we do
not enforce any diversity artificially and that our crossover
operator is slightly less greedy (i.e., in [39] crossover always
recombines correctly individuals also when the Hamming dis-
tance is exactly 2). Our analysis delivers a matching lower
bound for all mutation rates c/n, where c is a constant, for
the greedy (2 + 1)S GA [thus also (3/4)en log n + O(n) and
0.72en log n + O(n), respectively, for mutation rates 1/n and
1.3/n]. This result rigorously proves that, under greedy selec-
tion and semi-greedy crossover, the (2 + 1) GA cannot outper-
form any (μ + 1)GA with μ ≥ 3 and μ = o(log n/ log log n).
We present some experimental investigations to shed light
on the questions that emerge from the theoretical work. In
the experiments we consider the commonly used parent selec-
tion that chooses uniformly at random from the population
with replacement (i.e., our theoretical upper bounds hold for
a larger variety of parent selection operators). We first com-
pare the performance of the standard steady state GAs against
the fastest standard bit mutation-only EA with fixed mutation
rate (i.e., the (1+1) EA [38], [41]) and the GAs that have
been proved to outperform it. The experiments show that the
speedups over the (1+1) EA occur already for small problem
sizes n and that population sizes larger than 2 are faster than
the standard (2 + 1) GA. Furthermore, the greedy (2+1)S GA
indeed appears to be faster than the standard (2+1) GA,1 fur-
ther suggesting that the theoretical lower bound also holds for
the latter algorithm. Finally, experiments confirm that larger
mutation rates than 1/n are more efficient. In particular, better
runtimes are achieved for mutation rates that are even larger
than the ones that minimize our theoretical upper bound (i.e.,
c/n with 1.5 ≤ c ≤ 1.6 versus the c =1.3 we have derived
mathematically; interestingly this experimental rate is similar
to the optimal mutation rate for OneMax of the algorithm ana-
lyzed in [39]). These theoretical and experimental results seem
to be in line with those recently presented for the same steady
state GAs for the JUMP function [5], [6]: higher mutation rates
than 1/n are also more effective on JUMP.
The rest of this paper is structured as follows. In the next
section we briefly review previous related works that consider
algorithms using crossover operators. In Section III we give
precise definitions of the steady state (μ + 1)GA and of the
ONEMAX function. In Section IV we present the MC frame-
work that we will use for the analysis of steady state elitist
GAs. In Section V we apply the framework to analyse the
(μ + 1)GA and present the upper bound on the runtime for
any 3 ≤ μ = o(log n/ log log n) and mutation rate c/n for any
constant c. In Section VI we present the matching lower bound
on the runtime of the greedy (2 + 1)S GA. In Section VII we
present our experimental findings. In the conclusion we present
a discussion and open questions for future work. Separate
supplementary files contain an Appendix of omitted proofs
due to space constraints and a complete version of this paper
including all the proofs.
II. RELATED WORK
The first rigorous groundbreaking proof that crossover can
considerably improve the performance of EAs was given by
1We thank an anonymous reviewer for pointing out that this is not obvious.
722 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018
Jansen and Wegener [22] for the (μ+ 1)GA with an unrealis-
tically low crossover probability. A series of following works
on the analysis of the JUMP function have made the algorithm
characteristics increasingly realistic [6], [23]. Today it has been
rigorously proved that the standard steady state (μ+1)GA with
realistic parameter settings does not require artificial diver-
sity enforcement to outperform its standard bit mutation-only
counterpart to escape the plateau of local optima of the JUMP
function [5].
Proofs that crossover may make a difference between poly-
nomial and exponential time for escaping local optima have
also been available for some time [20], [36]. The authors
devised example functions, where, if sufficient diversity was
enforced by some mechanism, then crossover could effi-
ciently combine different individuals into an optimal solution.
Mutation, on the other hand required a long time because
of the great Hamming distance between the local and global
optima. The authors chose to call the artificially designed
functions Real Royal Road functions because the Royal Road
functions devised to support the building block hypothesis had
failed to do so [28]. The Real Royal Road functions, though,
had no resemblance with the schemata structures required by
the building block hypothesis.
The utility of crossover has also been proved for less artifi-
cial problems, such as coloring problems inspired by the Ising
model from physics [37], computing input–output sequences
in finite state machines [25], shortest path problems [13], ver-
tex cover [29], and multiobjective optimization problems [32].
The above works show that crossover allows to escape from
local optima that have large basins of attraction for the muta-
tion operator. Hence, they establish the usefulness of crossover
as an operator to enchance the exploration capabilities of the
algorithm.
The interplay between crossover and mutation may produce
a speed-up also in the exploitation phase, for instance when
the algorithm is hillclimbing. Research in this direction has
recently appeared. The design of the (1 + (λ, λ)) GA was
theoretically driven to beat the (n ln n) lower bound of all
unary unbiased black box algorithms. Since the dynamics of
the algorithm differ considerably from those of standard GAs,
it is difficult to achieve more general conclusions about the
performance of GAs from the analysis of the (1+ (λ, λ)) GA.
From this point of view the work of Sudholt [39] is more
revealing when he shows that any standard (μ + λ) GA
outperforms its standard bit mutation-only counterpart for hill-
climbing the ONEMAX function. The only caveat is that the
selection stage enforces diversity artificially, similarly to how
Jansen and Wegener had enforced diversity for the Real Royal
Road function analysis. In this paper, we rigorously prove
that it is not necessary to enforce diversity artificially for
standard-steady state GAs to outperform their standard bit
mutation-only counterpart.
III. PRELIMINARIES
We will analyse the runtime (i.e., the expected number of
fitness function evaluations before an optimal search point
is found) of a steady state GA with population size μ and
Algorithm 1: (μ+1) GA [5], [16], [34], [35]
1 P ← μ individuals, uniformly at random from {0, 1}n;
2 repeat
3 Select x, y ∈ P with replacement using an operator
abiding (1);
4 z ← Uniform crossover with probability 1/2 (x, y);
5 Flip each bit in z with probability c/n;
6 P ← P ∪ {z};
7 Choose one element from P with lowest fitness and
remove it from P, breaking ties at random;
8 until termination condition satisfied;
offspring size 1 (Algorithm 1). In steady state GAs the entire
population is not changed at once, but rather a part of it. In this
paper we consider the most common option of creating one
new solution per generation [34], [35]. Rather than restricting
the algorithm to the most commonly used uniform selection of
two parents, we allow more flexibility to the choice of which
parent selection mechanism is used. This approach was also
followed by Sudholt [39] for the analysis of the (μ + 1)GA
with diversity. In each generation the algorithm picks two par-
ents from its population with replacement using a selection
operator that satisfies the following condition:
∀x, y : f (x) ≥ f (y) =⇒ Pr(select x) ≥ Pr(select y). (1)
The condition allows to use most of the popular parent
selection mechanisms with replacement, such as fitness pro-
portional selection, rank selection, or the one commonly used
in steady state GAs, i.e., uniform selection [16]. Afterward,
uniform crossover between the selected parents (i.e., each bit
of the offspring is chosen from each parent with probabil-
ity 1/2) provides an offspring to which standard bit mutation
(i.e., each bit is flipped with probability c/n) is applied. The
best μ among the μ + 1 solutions are carried over to the next
generation and ties are broken uniformly at random.
In this paper we use the standard convention for nam-
ing steady state algorithms: the (μ + 1) EA differs from the
(μ+1)GA by only selecting one individual per generation for
reproduction and applying standard bit mutation to it (i.e., no
crossover). Otherwise the two algorithms are identical.
We will analyse Algorithm 1 for the well-studied ONEMAX
function that is defined on bitstrings x ∈ {0, 1}n of length n
and returns the number of 1-bits in the string: ONEMAX(x) =∑n
i=1 xi. Here, xi is the ith bit of the solution x ∈ {0, 1}n.
The ONEMAX benchmark function is very useful to assess
the hillclimbing capabilities of a search heuristic. It displays
the characteristic function optimization property that find-
ing improving solutions becomes harder as the algorithm
approaches the optimum. The problem is the same as that
of identifying the hidden solution of the Mastermind game,
where we assume for simplicity that the target string is the
one of all 1-bits. Any other target string z ∈ {0, 1}n may also
be used without loss of generality. If a bitstring is used, then
ONEMAX is equivalent to Mastermind with two colors [15].
This can be generalized to many colors if alphabets of greater
size are used [10], [11].
CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 723
Fig. 1. MC for fitness level i.
IV. MARKOV CHAIN FRAMEWORK
The recent analysis of the (μ + 1)GA for the JUMP func-
tion shows that the interplay between crossover and mutation
may create the diversity required for crossover to decrease the
expected time to jump toward the optimum [5]. At the heart
of the proof is the analysis of a random walk on the number
of diverse individuals on the local optima of the function. The
analysis delivers improved asymptotic expected runtimes of
the (μ + 1)GA over mutation-only EAs only for population
sizes μ = ω(1). This happens because, for larger population
sizes, it takes more time to lose diversity once created, hence
crossover has more time to exploit it. For ONEMAX the tech-
nique delivers worse asymptotic bounds for population sizes
μ = ω(1) and an O(n ln n) bound for constant population
size. Hence, the techniques of [5] cannot be directly applied
to show a speed-up of the (μ+1)GA over mutation-only EAs
and a careful analysis of the leading constant in the runtime
is necessary. In this section we present the MC framework
that we will use to obtain the upper bounds on the runtime
of the elitist steady state GAs. We will afterward discuss how
this approach builds upon and generalizes Sudholt’s approach
in [39].
The ONEMAX function has n+1 distinct fitness values. We
divide the search space into the following canonical fitness
levels [21], [31]:
Li =
{
x ∈ {0, 1}n|ONEMAX(x) = i}.
We say that a population is in fitness level i if and only if its
best solution is in level Li.
We use an MC for each fitness level i to represent the dif-
ferent states the population may be in before reaching the next
fitness level. The MC depicted in Fig. 1 distinguishes between
states, where the population has no diversity (i.e., all individ-
uals have the same genotype), hence crossover is ineffective,
and states, where diversity is available to be exploited by the
crossover operator. The MC has one absorbing state and two
transient states. The first transient state S1,i is adopted if the
whole population consists of copies of the same individual at
level i (i.e., all the individuals have the same genotype). The
second state S2,i is reached if the population consists of μ
individuals in fitness level i and at least two individuals x and
y are not identical. The second transient state S2,i differs from
the state S1,i in having diversity which can be exploited by
the crossover operator. S1,i and S2,i are mutually accessible
from each other since the diversity can be introduced at state
S1,i via mutation with some probability pd and can be lost at
state S2,i with some relapse probability pr when copies of a
solution take over the population.
The absorbing state S3,i is reached when a solution at a bet-
ter fitness level is found, an event that happens with probability
pm when the population is at state S1,i and with probability pc
when the population is at state S2,i. We pessimistically assume
that in S2,i there is always only one single individual with a
different genotype (i.e., with more than one distinct individual,
pc would be higher and pr would be zero). Formally when S3,i
is reached the population is no longer in level i because a bet-
ter fitness level has been found. However, we will bound the
expected time to reach the absorbing state for the next level
only when the whole population has reached it (or a higher
level). We do this because we assume that initially all the
population is in level i when calculating the transition proba-
bilities in the MC for each level i. This implies that bounding
the expected times to reach the absorbing states of each fitness
level is not sufficient to achieve an upper bound on the total
expected runtime. When S3,i is reached for the first time, the
population only has one individual at the next fitness level or
in a higher one. Only when all the individuals have reached
level i + 1 (i.e., either in state S1,i+1 or S2,i+1) may we use
the MC to bound the runtime to overcome level i + 1. Then
the MC can be applied, once per fitness level, to bound the
total runtime until the optimum is reached.
The main distinguishing aspect between the analysis
presented herein and that of Sudholt [39] is that we take into
account the possibility to transition back and forth (i.e., resp.
with probability pd and pr) between states S1,i and S2,i as
in standard steady state GAs (see Fig. 1). By enforcing that
different genotypes on the same fitness level are kept in the
population, the GA considered in [39] has a good probability
of exploiting this diversity to recombine the different individ-
uals. In particular, once the diversity is created it will never be
lost, giving many opportunities for crossover to take advantage
of it. A crucial aspect is that the probability of increasing the
number of ones via crossover is much higher than the prob-
ability of doing so via mutation once many 1-bits have been
collected. Hence, by enforcing that once state S2,i is reached
it cannot be left until a higher fitness level is found, Sudholt
could prove that the resulting algorithm is faster compared to
only using standard bit mutation. In the standard steady state
GA, instead, once the diversity is created it may subsequently
be lost before crossover successfully recombines the diverse
individuals. This behavior is modeled in the MC by consider-
ing the relapse probability pr. Hence, the algorithm spends less
time in state S2,i compared to the GA with diversity enforce-
ment. Nevertheless, it will still spend some optimization time
in state S2,i, where it will have a higher probability of improv-
ing its fitness by exploiting the diversity via crossover than
when in state S1,i (i.e., no diversity), where it has to rely on
mutation only. For this reason the algorithm will not be as fast
for ONEMAX as the GA with enforced diversity but will still
be faster than standard bit mutation-only EAs.
An interesting consequence of the possibility of losing
diversity, is that populations of size greater than 2 can be
beneficial. In particular the diversity (i.e., state S2,i) may be
completely lost in the next step when there is only one diverse
individual left in the population. When this is the case, the
relapse probability pr decreases with the population size μ
724 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018
because the probability of selecting the diverse individual for
removal is 1/μ. Furthermore, for population size μ = 2 there
is a positive probability that diversity is lost in every gener-
ation by either of the two individuals taking over, while for
larger population sizes this is not the case. As a result our
MC framework analysis will deliver a better upper bound for
μ > 2 compared to the bound for μ = 2. This interesting
insight into the utility of larger populations could not be seen
in the analysis of [39] because there, once the diversity is
achieved, it cannot be lost.
We first concentrate on the expected absorbing time of the
MC. Afterward we will calculate the takeover time before we
can transition from one MC to the next. Since it is not easy
to derive the exact transition probabilities, a runtime analysis
is considerably simplified by using bounds on these probabil-
ities. The main result of this section is stated in the following
theorem that shows that we can use lower bounds on the tran-
sition probabilities moving in the direction of the absorbing
state (i.e., pm, pd, and pc) and an upper bound on the proba-
bility of moving in the opposite direction to no diversity (i.e.,
pr) to derive an upper bound on the expected absorbing time
of the MC. In particular, we define an MC M′ that uses the
bounds on the exact transition probabilities and show that its
expected absorbing time is greater than the absorbing time of
the original chain. Hereafter, we drop the level index i for
brevity and use E[T1] and E[T2] instead of E[T1,i] and E[T2,i]
(similarly, S1 will denote state S1,i).
Theorem 1: Consider two MCs M and M′ with the topology
in Fig. 1, where the transition probabilities for M are pc, pm,
pd, and pr, and the transition probabilities for M′ are p′c, p′m,
p′d, and p′r. Let the expected absorbing time for M be E[T]
and the expected absorbing time of M′ starting from state S1
be E[T ′1], respectively. If
•pm < pc • p′d ≤ pd • p′r ≥ pr
•p′c ≤ pc • p′m ≤ pm.
Then
E[T] ≤ E[T ′1
] ≤ p
′
c + p′r
p′cp′d + p′cp′m + p′mp′r
+ 1
p′c
.
We first concentrate on the second inequality in the state-
ment of the theorem which will follow immediately from the
next lemma. It allows us to obtain the expected absorbing time
of the MC if the exact values for the transition probabilities
are known. In particular, the lemma establishes the expected
times E[T1] and E[T2] to reach the absorbing state, starting
from the states S1 and S2, respectively.
Lemma 1: The expected times E[T1] and E[T2] to reach the
absorbing state, starting from state S1 and S2, respectively, are
as follows:
E[T1] = pc + pr + pdpcpd + pcpm + pmpr ≤
pc + pr
pcpd + pcpm + pmpr +
1
pc
E[T2] = pm + pr + pdpcpd + pcpm + pmpr .
Before we prove the first inequality in the statement of
Theorem 1, we will derive some helper propositions. We first
show that as long as the transition probability of reaching the
absorbing state from the state S2 (with diversity) is greater
than that of reaching the absorbing state from the state with no
diversity S1 (i.e., pm < pc), then the expected absorbing time
from state S1 is at least as large as the expected time uncon-
ditional of the starting point. This will allow us to achieve
a correct upper bound on the runtime by just bounding the
absorbing time from state S1. In particular, it allows us to pes-
simistically assume that the algorithm starts each new fitness
level in state S1 (i.e., there is no diversity in the population).
Proposition 1: Consider an MC with the topology given in
Fig. 1. Let E[T1] and E[T2] be the expected absorbing times
starting from state S1 and S2, respectively. If pm < pc, then
E[T1] > E[T2] and E[T], the unconditional expected absorbing
time, satisfies E[T] ≤ E[T1].
In the following proposition we show that if we overesti-
mate the probability of losing diversity and underestimate the
probability of increasing it, then we achieve an upper bound
on the expected absorbing time as long as pm < pc. Afterward,
in Proposition 3 we show that an upper bound on the absorb-
ing time is also achieved if the probabilities pc and pm are
underestimated.
Proposition 2: Consider two MCs M and M′ with the topol-
ogy in Fig. 1, where the transition probabilities for M are pc,
pm, pd, and pr, and the transition probabilities for M′ are pc,
pm, p′d, and p′r. Let the expected absorbing times starting from
state S1 for M and M′ be E[T1] and E[T ′1], respectively. If
p′d ≤ pd, p′r ≥ pr, and pm < pc, then E[T1] ≤ E[T ′1].
Proposition 3: Consider two MCs M and M′ with the topol-
ogy in Fig. 1, where the transition probabilities for M are pc,
pm, pd, and pr, and the transition probabilities for M′ are p′c,
p′m, pd, and pr. Let the expected absorbing times starting from
state S1 for M and M′ be E[T1] and E[T ′1], respectively. If
p′c ≤ pc and p′m ≤ pm, then E[T1] ≤ E[T ′1].
The propositions use that by lower bounding pd and upper
bounding pr we overestimate the expected number of gen-
erations the population is in state S1 compared to the time
spent in state S2. Hence, if pc > pm we can safely use a lower
bound for pd and an upper bound for pr and still obtain a valid
upper bound on the runtime E[T1]. This is rigorously shown
by combining together the results of the previous propositions
to prove the main result i.e., Theorem 1.
Proof of Theorem 1: Consider a third MC M∗ whose transi-
tion probabilities are pc, pm, p′r, p′d. Let the absorbing time of
M starting from state S1 be E[T1]. In order to prove the above
statement we will prove the following sequence of inequalities:
E[T] ≤ E[T1] ≤ E[T∗1 ] ≤ E[T ′1].
According to Proposition 1, E[T] ≤ E[T1] since pc > pm.
According to Proposition 2 E[T1] ≤ E[T∗1 ] since p′d ≤ pd,
p′r ≥ pr, and pc > pm. Finally, according to Proposition 3,
p′c ≤ pc and p′m ≤ pm implies E[T∗1 ] ≤ E[T ′1] and our proof is
completed by using Lemma 1 to show that the last inequality
of the statement holds.
The algorithm may skip some levels or a new fitness level
may be found before the whole population has reached the
current fitness level. Hence, by summing up the expected run-
times to leave each of the n + 1 levels and the expected times
for the whole population to takeover each level, we obtain an
upper bound on the expected runtime. The next lemma estab-
lishes an upper bound on the expected time it takes to move
CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 725
from the absorbing state of the previous MC (S3,i) to any tran-
sient state (S1,i+1 or S2,i+1) of the next MC. The lemma uses
standard takeover arguments originally introduced in the first
analysis of the (μ + 1) EA for ONEMAX [40]. To achieve a
tight upper bound Witt had to carefully wait for only a frac-
tion of the population to take over a level before the next
level was discovered. In our case, the calculation of the tran-
sition probabilities of the MC is actually simplified if we wait
for the whole population to take over each level. Hence, in
our analysis the takeover time calculations are more similar to
the first analysis of the (μ+ 1) EA with and without diversity
mechanisms to takeover the local optimum of TWOMAX [17].
Lemma 2: Let the best individual of the current population
be in level i and all individuals be in level at least i−1. Then,
the expected time for the whole population to be in level at
least i is O(μ log μ).
The lemma shows that, once a new fitness level is discov-
ered for the first time, it takes at most O(μ log μ) generations
until the whole population consists of individuals from the
newly discovered fitness level or higher. While the absorption
time of the MC might decrease with the population size, for
too large population sizes, the upper bound on the expected
total take over time will dominate the runtime. As a result the
MC framework will deliver larger upper bounds on the run-
time unless the expected time until the population takes over
the fitness levels is asymptotically smaller than the expected
absorption time of all MCs. For this reason, our results will
require population sizes of μ = o(log n/ log log n), to allow
all fitness levels to be taken over in expected o(n log n) time
such that the latter time does not affect the leading constant
of the total expected runtime.
V. UPPER BOUND
In this section we use the MC framework devised in
Section IV to prove that the (μ+1) GA is faster than any
standard bit mutation-only (μ + λ) EA.
In order to satisfy the requirements of Theorem 1, we first
show in Lemma 3 that pc > pm if the population is at one of
the final n/(4c(1 + ec)) fitness levels. The lemma also shows
that it is easy for the algorithm to reach such a fitness level.
Afterward we bound the transition probabilities of the MC in
Lemma 4. We conclude the section by stating and proving
the main result, essentially by applying Theorem 1 with the
transition probabilities calculated in Lemma 4.
Lemma 3: For the (μ+1) GA with mutation rate c/n for
any constant c, if the population is in any fitness level i >
n − n/(4c(1 + ec)), then pc is always larger than pm. The
expected time for the (μ+1) GA to sample a solution in fitness
level n − n/(4c(1 + ec)) for the first time is O(nμ log μ).
Proof: We consider the probability pc. If two individuals
on the same fitness level with nonzero Hamming distance 2d
are selected as parents with probability p′, then the probability
that the crossover operator yields an improved solution is at
least (see proof of [39, Th. 4])
Pr(X > d) = 1
2
(1 − Pr(X = d))
= 1
2
(
1 − 2−2d
(
2d
d
))
≥ 1/4 (2)
where X is a binomial random variable with parameters 2d
and 1/2 which represents the number of bit positions, where
the parents differ and which are set to 1 in the offspring. With
probability (1 − c/n)n no bits are flipped and the absorbing
state is reached. If any individual is selected twice as parent,
then the improvement can only be achieved by mutation (i.e.,
with probability pm) since crossover is ineffective. So pc >
p′(1/4)(1 − c/n)n + (1 − p′)pm, hence if pm < p′(1/4)(1 −
c/n)n + (1 − p′)pm it follows that pm < pc. The condition can
be simplified to pm < (1/4)(1 − c/n)n with simple algebraic
manipulation. For large enough n, (1−c/n)n ≥ 1/(1+ec) and
the condition reduces to pm < 1/(4(1 + ec)).
Since pm < (n − i)c/n is an upper bound on the transition
probability (i.e., at least one of the zero bits has to flip to
increase the ONEMAX value), the condition is satisfied for
i ≥ n − n/(4c(1 + ec)). For any level i ≤ n − n/(4c(1 + ec)),
after the take over of the level occurs in O(μ log μ) expected
time, the probability of improving is at least (1) due to the
linear number of 0-bits that can be flipped. Hence, we can
upper bound the total number of generations necessary to reach
fitness level i = n − n/(4c(1 + ec)) by O(nμ log μ).
The lemma has shown that pc > pm holds after a linear
number of fitness levels have been traversed. Now, we bound
the transition probabilities of the MC.
Lemma 4: Let μ ≥ 3. Then the transition probabilities pd,
pc, pr, and pm are bounded as follows:
pd ≥ μ
(μ + 1)
i(n − i)c2
n2(ec + O(1/n)) , pc ≥
μ − 1
2μ2(ec + O(1/n))
pr ≤ (μ − 1)
(
2μ − 1 + O(1/n))
2ecμ2(μ + 1) , pm ≥
c(n − i)
n(ec + O(1/n)) .
Proof: We first bound the probability pd of transitioning
from the state S1,i to the state S2,i. In order to introduce a
new solution at level i with different genotype, it is sufficient
that the mutation operator simultaneously flips one of the n− i
0-bits and one of the i 1-bits while not flipping any other bit.
We point out that in S1,i, all individuals are identical, hence
crossover is ineffective. Moreover, when the diverse solution
is created, it should stay in the population, which occurs with
probability μ/(μ+ 1) since one of the μ copies of the major-
ity individual should be removed by selection instead of the
offspring. So pd can be lower bounded as follows:
pd ≥ μ
(μ + 1)
ic
n
(n − i)c
n
(
1 − c
n
)n−2
.
Using the inequality (1−1/x)x−1 ≥ 1/e ≥ (1−1/x)x, we now
bound (1 − (c/n))n−2 as follows:
(
1 − c
n
)n−2 ≥
(
1 − c
n
)n−1 ≥
(
1 − c
n
)n
=
((
1 − c
n
)(n/c)−1(
1 − c
n
))c
≥
(
1
e
(
1 − c
n
))c
≥ 1
ec
(
1 − c
2
n
)
where in the last step we used the Bernoulli’s inequality.
We can further absorb the c2/n in an asymptotic O(1/n)
726 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018
term as follows:
(
1 − c
n
)n−2 ≥ e−c − O(1/n) = 1
ec + O(1/n) . (3)
The bound for pd is then
pd ≥ μ
(μ + 1)
i(n − i)c2
n2(ec + O(1/n)) .
We now consider pc. To transition from state S2,i to S3,i (i.e.,
pc) it is sufficient that two genotypically different individuals
are selected as parents [i.e., with probability at least 2(μ −
1)/μ2], that crossover provides a better solution [i.e., with
probability at least 1/4 according to (2)] and that mutation
does not flip any bits [i.e., probability (1 − c/n)n ≥ 1/(ec +
O(1/n)) according to (3)]. Therefore, the probability is
pc ≥ 2μ − 1
μ2
1
4
(
1 − c
n
)n ≥ μ − 1
2μ2(ec + O(1/n))
For calculating pr we pessimistically assume that the
Hamming distance between the individuals in the population
is 2 and that there is always only one individual with a differ-
ent genotype. A population in state S2,i which has diversity,
goes back to state S1,i when:
1) a majority individual is selected twice as parent [i.e.,
probability (μ− 1)2/μ2], mutation does not flip any bit
[i.e., probability (1−c/n)n], and the minority individual
is discarded [i.e., probability 1/(μ + 1)];
2) two different individuals are selected as parents,
crossover chooses either from the majority individual
in both bit locations, where they differ (i.e., prob. 1/4)
and mutation does not flip any bit [i.e., probability
(1−c/n)n ≤ 1/ec] or mutation must flip at least one spe-
cific bit [i.e., probability O(1/n)]. Finally, the minority
individual is discarded [i.e., probability 1/(μ + 1)];
3) a minority individual is chosen twice as parent and the
mutation operator flips at least two specific bit positions
[i.e., with probability O(1/n2)] and finally the minority
individual is discarded [i.e., probability 1/(μ + 1)].
Hence, the probability of losing diversity is
pr ≤ 1
μ + 1
[
(μ − 1)2
μ2
(
1 − c
n
)n
+ 2 1
μ
μ − 1
μ
(
1
4
(
1 − c
n
)n + O(1/n)
)
+ O(1/n2)
]
≤ 2(μ − 1)
2 + (μ − 1) + 4ec(μ − 1)O(1/n)
2ecμ2(μ + 1) +
O(1/n2)
μ + 1
= (μ − 1)
[
2(μ − 1) + 1 + 4ecO(1/n)] + 2ecμ2O(1/n2)
2ecμ2(μ + 1)
= (μ − 1)
[
2(μ − 1) + 1 + O(1/n)] + O(1/n2)
2ecμ2(μ + 1)
≤ (μ − 1)(2μ − 1 + O(1/n))
2ecμ2(μ + 1) .
In the last inequality we absorbed the O(1/n2) term into the
O(1/n) term.
The transition probability pm from state S1,i to state S3,i
is the probability of improvement by mutation only, because
crossover is ineffective at state S1,i. The number of 1-bits in
the offspring increases if the mutation operator flips one of
the (n − i) 0-bits [i.e., with probability c(n − i)/n] and does
not flip any other bit [i.e., with probability (1 − c/n)n−1 ≥
(ec +O(1/n))−1 according to (3)]. Therefore, the lower bound
on the probability pm is
pm ≥ c(n − i)
n(ec + O(1/n)) .
We are finally ready to state our main result.
Theorem 2: The expected runtime of the (μ+1) GA with
μ ≥ 3 and mutation rate c/n for any constant c on
ONEMAX is
E[T] ≤ 3e
cn log n
c(3 + c) + O(nμ log μ).
For μ = o(log n/ log log n), the bound reduces to
E[T] ≤ 3
c(3 + c)e
cn log n(1 + o(1)).
Proof: We use Theorem 1 to bound E[Ti], the expected
time until the (μ+1) GA creates an offspring at fitness level
i + 1 or above given that all individuals in its initial popula-
tion are at level i. The bounds on the transition probabilities
established in Lemma 4 will be set as the exact transition
probabilities of another MC, M′, with absorbing time larger
than E[Ti] (by Theorem 1). Since Theorem 1 requires that
pc > pm and Lemma 3 establishes that pc > pm holds for
all fitness levels i > n − n/4c(1 + ec), we will only analyse
E[Ti] for n − 1 ≥ i > n − n/(4c(1 + ec)). Recall that, by
Lemma 3, level n − n/(4c(1 + ec)) is reached in expected
O(nμ log μ) time.
Consider the expected absorbing time E[T ′i ], of the MC M′
with transition probabilities
p′d :=
μ
(μ + 1)
i(n − i)c2
n2(ec + O(1/n)) , p
′
c :=
μ − 1
2μ2(ec + O(1/n))
p′r :=
(μ − 1)(2μ − 1 + O(1/n))
2ecμ2(μ + 1) , p
′
m :=
c(n − i)
n(ec + O(1/n)) .
According to Theorem 1
E[Ti] ≤ E[T ′i,1] ≤
p′c + p′r
p′cp′d + p′cp′m + p′mp′r
+ 1
p′c
. (4)
We simplify the numerator and the denominator of the first
term separately. The numerator is
p′c + p′r =
μ − 1
2μ2(ec + O(1/n)) +
(μ − 1)(2μ − 1 + O(1/n))
2ecμ2(μ + 1)
≤ μ − 1
2μ2ec
(
1 + 2μ − 1 + O(1/n)
μ + 1
)
≤ (μ − 1)
[
3μ + O(1/n)]
2μ2ec(μ + 1) . (5)
CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 727
We can also rearrange the denominator D = p′cp′d+p′cp′m+p′mp′r
as follows:
D = p′c(p′d + p′m) + p′mp′r
=
(μ − 1)
(
μi(n−i)c2
(μ+1)n2(ec+O(1/n)) + c(n−i)n(ec+O(1/n))
)
2μ2(ec + O(1/n))
+ c(n − i)(μ − 1)(2μ − 1 + O(1/n))
n(ec + O(1/n))2ecμ2(μ + 1)
≥
(μ − 1)
(
μi(n−i)c2
(μ+1)n2 + c(n−i)n
)
2μ2
(
e2c + O(1/n))
+ c(n − i)(μ − 1)(2μ − 1 + O(1/n))
n
(
e2c + O(1/n))2μ2(μ + 1)
≥ c(n − i)(μ − 1)
2μ2
(
e2c + O(1/n)) ·
(
μic
(μ + 1)n2 +
1
n
+ 2μ − 1 + O(1/n)
n(μ + 1)
)
≥ c(n − i)(μ − 1)
2μ2
(
e2c + O(1/n)) ·
(
μic + (μ + 1)n + n(2μ − 1 + O(1/n))
(μ + 1)n2
)
≥ c(n − i)(μ − 1)
(
μic + n[3μ + O(1/n)])
2μ2
(
e2c + O(1/n))(μ + 1)n2 . (6)
Note that the term in square brackets is the same in both the
numerator [i.e., (5)] and the denominator [i.e., (6)] including
the small order terms in O(1/n) (i.e., they are identical). Let
A = [3μ + c′/n], where c′ > 0 is the smallest constant that
satisfies the O(1/n) in the upper bound on pr in Lemma 4.
We can now put the numerator and denominator together and
simplify the expression
(p′c + p′r)/
(
p′c(p′d + p′m) + p′mp′r
)
≤ (μ − 1)A
2μ2ec(μ + 1) ·
2μ2
(
e2c + O(1/n))(μ + 1)n2
c(n − i)(μ − 1)(μic + nA)
≤ A
(
e2c + O(1/n))n2
ecc(n − i)(μic + nA) .
By using that [(e2c + O(1/n))/(ec)] ≤ ec + O(1/n), we get
≤ A(e
c + O(1/n))n2
c(n − i)(μic + nA)
≤ ec An
2
c(n − i)(μic + nA) + O(1/n)
An2
c(n − i)(μic + nA) .
The facts, n − i ≥ 1, A = (1), and μ, i, c > 0 imply
that, nA+μic = (n) and (An2/c(n − i)(μic + nA)) = O(n).
When multiplied by the O(1/n) term, we get
≤ e
cn
c(n − i)
An
(μic + nA) + O(1).
By adding and subtracting μic to the numerator of
(An/((μic + nA)]), we obtain
≤ e
cn
c(n − i)
(
1 − μic
μic + nA
)
+ O(1).
Note that the multiplier outside the brackets, (ecn)/(c(n−i)),
is in the order of O(n/(n−i)). We now add and subtract μnc to
the numerator of −(μic/μic + nA) to create a positive additive
term in the order of O(μ(n − i)/n)
= e
cn
c(n − i)
(
1 − μnc
μic + nA +
μ(n − i)c
μic + nA
)
+ O(1)
= e
cn
c(n − i)
(
1 − μnc
μic + nA
)
+ e
cn
c(n − i)
μ(n − i)c
μic + nA
+ O(1) = e
cn
c(n − i)
(
1 − μnc
μic + nA
)
+ O(μ).
Since p′c = (1/μ), we can similarly absorb 1/p′c into the
O(μ) term. After the addition of the remaining term 1/p′c
from (4), we obtain a valid upper bound on E[Ti]
E[Ti] ≤ p
′
c + p′r
p′cp′d + p′cp′m + p′mp′r
+ 1
p′c
≤ e
cn
c(n − i)
(
1 − μnc
μic + nA
)
+ O(μ).
In order to bound the negative term, we will rearrange its
denominator (i.e., nA + μic)
n
[
3μ + c′/n] + μ = 3μn + c′ + μic
= 3μn + c′ − (n − i)μc + μnc
< μn(3 + c) + c′
where the second equality is obtained by adding and subtract-
ing μnc. Altogether
E[Ti] ≤ e
cn
c(n − i)
(
1 − μnc
μn(3 + c) + c′
)
+ O(μ)
= e
cn
c(n − i)
(
1 − μnc + c
′ c
3+c − c′ c3+c
μn(3 + c) + c′
)
+ O(μ)
= e
cn
c(n − i)
(
1 − c
3 + c +
c′ c3+c
μn(3 + c) + c′
)
+ O(μ)
= e
cn
c(n − i)
(
1 − c
3 + c + O(1/n)
)
+ O(μ)
= e
cn
c(n − i)
3
3 + c + O(μ).
If we add the expected time to take over each fitness level
from Lemma 2 and sum over all fitness levels the upper bound
on the runtime is
n∑
i=n−n/(4c(1+ec))
(
ecn
c(n − i)
3
3 + c + O(μ) + O(μ log μ)
)
≤
n∑
i=0
(
ecn
c(n − i)
3
3 + c + O(μ log μ)
)
≤ 3e
cn log n
c(3 + c) + O(nμ log μ) ≤
3ecn log n
c(3 + c) (1 + o(1))
where in the last inequality we use μ = o(log n/ log log n) to
prove the second statement of the theorem.
The second statement of the theorem provides an upper
bound of (3/4)en log n for the standard mutation rate 1/n (i.e.,
c = 1) and μ = o(log n/ log log n). The upper bound is mini-
mized for c = (1/2)(√13 − 1). Hence, the best upper bound
728 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018
is delivered for a mutation rate of about 1.3/n. The resulting
leading term of the upper bound is
E[T] ≤ 6e
1
2
(√
13−1
)
n log n
(√
13 − 1
)(
1
2
(√
13 − 1
)
+ 3
) ≈ 1.97n log n.
We point out that Theorem 2 holds for any μ ≥ 3. Our
framework provides a higher upper bound when μ = 2 com-
pared to larger values of μ. The main difference lies in the
probability pr as shown in the following lemma.
Lemma 5: The transition probabilities pm, pr, pc, and pd
for the (2+1) GA, with mutation rate c/n and c constant, are
bounded as follows:
pd ≥ 23
i(n − i)c2
n2(ec + O(1/n)) , pc ≥
1
8(ec + O(1/n))
pr ≤ 524ec + O(1/n), pm ≥
c(n − i)
(ec + O(1/n))n .
The upper bound on pr from Lemma 4 is 1/(8ec), which is
smaller than the bound we have just found. This is due to the
assumptions in the lemma that there can be only one genotype
in the population at a given time which can take over the
population in the next iteration. However, when μ = 2, either
individual can take over the population in the next iteration.
This larger upper bound on pr for μ = 2 leads to a larger upper
bound on the runtime of E[T] ≤ [4/(c + 4)](ecn log n/c)(1 +
o(1)) for the (2+1) GA. The calculations are omitted as they
are the same as those of the proof of Theorem 2, where pr ≥
5/(24ec) + O(1/n) is used and μ is set to 2.
VI. LOWER BOUND
In the previous section we provided a higher upper bound
for the (2+1) GA compared to the (μ+1) GA with population
size greater than 2 and μ = o(log n/ log log n). To rigorously
prove that the (2+1) GA is indeed slower, we require a lower
bound on the runtime of the algorithm that is higher than the
upper bound provided in the previous section for the (μ + 1)
GA (μ ≥ 3).
Since providing lower bounds on the runtime is a noto-
riously hard task, we will follow a strategy previously used
by Sudholt [39] and analyse a version of the (μ+1) GA
with greedy parent selection and greedy crossover (i.e.,
Algorithm 2) in the sense that:
1) parents are selected uniformly at random only among
the solutions from the highest fitness level (greedy
selection);
2) if the offspring has the same fitness as its parents and its
Hamming distance to any individual with equal fitness
in the population is larger than 2, then the algorithm
automatically performs an OR operation between the
offspring and the individual with the largest Hamming
distance and fitness, breaking ties arbitrarily, and adds
the resulting offspring to the population, i.e., we pes-
simistically allow it to skip as many fitness levels as
possible (semi-greedy crossover).
The greedy selection allows us to ignore the improvements
that occur via crossover between solutions from different fit-
ness levels. Thus, the crossover is only beneficial when there
Algorithm 2: (2 + 1)S GA
1 P ← μ individuals, uniformly at random from {0, 1}n;
2 repeat
3 Choose x, y ∈ P uniformly at random among P∗, the
individuals with the current best fitness f ∗;
4 z ← Uniform crossover with probability 1/2 (x, y);
5 Flip each bit in z with probability c/n;
6 If f (z) = f ∗ and max
w∈P∗(HD(w, z)) > 2 then
z ← z ∨ arg max
w∈P∗
(HD(w, z)) ;
7 P ← P ∪ {z};
8 Choose one element from P with lowest fitness and
remove it from P, breaking ties at random;
9 until termination condition satisfied;
are at least two different genotypes in the population at the
highest fitness level discovered so far. The difference with the
algorithm analyzed by Sudholt [39] is that the (2 + 1)S GA
we consider does not use any diversity mechanism and it does
not automatically crossover correctly when the Hamming dis-
tance between parents is exactly 2. As a result, there still is
a nonzero probability of losing diversity before a successful
crossover occurs. The crossover operator of the (2+1)S GA is
less greedy than the one analyzed in [39] (i.e., there crossover
is automatically successful also when the Hamming distance
between the parents is 2). We point out that the upper bounds
on the runtime derived in the previous section also hold for
the greedy (2 + 1)S GA.
The MC structure of Fig. 1 is still representative of the
states that the algorithm can be in. When there is no diversity
in the population, either an improvement via mutation occurs
or diversity is introduced into the population by the muta-
tion operator. When diversity is present, both crossover and
mutation can reach a higher fitness level while there is also
a probability that the population will lose diversity by repli-
cating one of the existing genotypes. With a population size
of two the diversity can be lost by creating a copy of either
solution and removing the other one from the population dur-
ing environmental selection (i.e., line 8 in Algorithm 2). With
population sizes greater than two, the loss of diversity can only
occur when the majority genotype (i.e., the genotype with most
copies in the population) is replicated. Building upon this we
will show that the asymptotic performance of the (2+1)S GA
for ONEMAX cannot be better than that of the (μ + 1) GAs
for μ > 2.
Like in [39] for our analysis we will apply the fitness level
method for lower bounds proposed by Sudholt [38]. Due to
the greedy crossover and the greedy parent selection used
in [39], the population could be represented by the trajec-
tory of a single individual. If an offspring with lower fitness
was added to the population, then the greedy parent selection
never chose it. If instead, a solution with equally high fitness
and different genotype was created, then the algorithm imme-
diately reduced the population to a single individual that is the
best possible outcome from crossing over the two genotypes.
The main difference between the following analysis and that
CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 729
of [39] is that we want to take into account the possibility that
the gained diversity may be lost before crossover exploits it.
To this end, when individuals of equal fitness and Hamming
distance 2 are created, crossover only exploits this success-
fully (i.e., goes to the next fitness level) with the conditional
probability that crossover is successful before the diversity
is lost. Otherwise, the diversity is lost. Only when individu-
als of Hamming distance larger than 2 are created, we allow
crossover to immediately provide the best possible outcome
as in [39].
Now, we can state the main result of this section.
Theorem 3: The expected runtime of the (2 + 1)S GA with
mutation probability p = c/n for any constant c on ONEMAX
is no less than
3ec
c
(
3 + max
1≤k≤n
(
(np)k
(k!)2
))n log n − O(n log log n).
Note that for c ≤ 4, max
1≤k≤n(((np)
k/(k!)2)) ≤ pn = c. Since
E[T] ≥ en log n for c ≥ 3, for the purpose of finding the
mutation rate that minimizes the lower bound, we can reduce
the statement of the theorem to
3ecn log n
c(3 + c) − O(n log log n).
The theorem provides a lower bound of (3/4)en log n −
O(n log log n) for the standard mutation rate 1/n (i.e., c =
1). The lower bound is minimized for c = (1/2)(√13 − 1).
Hence, the smallest lower bound is delivered for a mutation
rate of about 1.3/n. The resulting lower bound is
E[T] ≥ 6e
1
2
(√
13−1
)
n log n
(√
13 − 1
)(
1
2
(√
13 − 1
)
+ 3
) − O(n log log n)
≈ 1.97n log n − O(n log log n).
Since the lower bound for the (2 + 1)S GA matches the
upper bound for the (μ+1) GA with μ > 2, the theorem proves
that, under greedy selection and semi-greedy crossover, popu-
lations of size 2 cannot be faster than larger population sizes
up to μ = o(log n/ log log n). In the following section we give
experimental evidence that the greedy algorithms are faster
than the standard (2+1) GA, thus suggesting that the same
conclusions hold also for the standard non greedy algorithms.
VII. EXPERIMENTS
The theoretical results presented in the previous sections
pose some new interesting questions. On one hand, the the-
ory suggests that population sizes greater than 2 benefit
the (μ+1) GA for hillclimbing the ONEMAX function. On
the other hand, the best runtime bounds are obtained for a
mutation rate of approximately 1.3/n, suggesting that higher
mutation rates than the standard 1/n rate may improve the
performance of the (μ+1) GA. In this section we present the
outcome of some experimental investigations to shed further
light on these questions. In particular, we will investigate the
effects of the population size and mutation rate on the run-
time of the steady-state GA for ONEMAX and compare its
Fig. 2. Average runtime over 1000 independent runs versus problem size n.
runtime against other GAs that have been proved to be faster
than mutation-only EAs in the literature.
We start with an overview of the performance of the algo-
rithms. In Fig. 2, we plot the average runtime over 1000
independent runs of the (μ+1) GA with μ = 2 and μ = 5
(with uniform parent selection and standard 1/n mutation rate)
for exponentially increasing problem sizes and compare it
against the fastest standard bit mutation-only EA with static
mutation rate [i.e., the (1 + 1) EA with 1/n mutation rate].
While the algorithm using μ = 5 outperforms the μ = 2
version, they are both faster than the (1+1) EA already for
small problem sizes. We also compare the algorithms against
the (2+1) GA investigated by Sudholt [39], where diversity
is enforced by the environmental selection always preferring
distinct individuals of equal fitness—the same GA variant that
was first proposed and analyzed in [22]. We run the algorithm
both with standard mutation rate 1/n and with the optimal
mutation rate (1 + √5)/(2n). Obviously, when diversity is
enforced, the algorithms are faster. Finally, we also compare
the algorithms against the (1 + (λ, λ)) GA with self-adjusting
population sizes and Sudholt’s (2 + 1) GA as they were com-
pared previously in [9]. Note that in [9, Fig. 8] Sudholt’s
algorithm was implemented with a very greedy parent selec-
tion operator that always prefers distinct individuals on the
highest fitness level for reproduction.
In order to decompose the effects of the greedy parent selec-
tion, greedy crossover, and the use of diversity, we conducted
further experiments shown in Fig. 3. Here, we see that it
is indeed the enforced diversity that creates the fundamen-
tal performance difference. Moreover, the results show that
the greedy selection/greedy crossover GA is slightly faster
than the greedy parent selection GA and that greedy parent
selection is slightly faster than standard selection. Overall, the
figure suggests that the lower bound presented in Theorem 3
is also valid for the standard (2+1) GA with uniform par-
ent selection (i.e., no greediness). In Fig. 3, it can be noted
that the performance difference between the GA with greedy
crossover and greedy parent selection analyzed in [39] and
the (2+1) GA with enforced diversity and without greedy
730 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018
Fig. 3. Comparison between standard selection, greedy selection, and
greedy selection + greedy crossover GAs. The runtime is averaged over 1000
independent runs.
Fig. 4. Average runtime gain of the (μ+1) GA versus the (2+1) GA for
different population sizes, errorbars show the standard deviation normalized
by the average runtime for μ = 2.
crossover is more pronounced than the performance difference
between the standard (2+1) GA analyzed in Section V and the
(2 + 1)S GA which was analyzed in Section VI. The reason
behind the difference in discrepancies is that the (2 + 1)S GA
does not implement the greedy crossover operator when the
Hamming distance is 2. We speculate that cases, where the
Hamming distance is just enough for the crossover to exploit
it occur much more frequently than the cases, where a larger
Hamming distance is present. As a result, the performance of
the (2 + 1)S GA does not deviate much from the standard
algorithm. Table I in the supplementary material presents the
mean and standard deviation of the runtimes of the algorithms
depicted in Figs. 2 and 3 over 1000 independent runs.
Now, we investigate the effects of the population size on
the (μ+1) GA. We perform 1000 independent runs of the
(μ+1) GA with uniform parent selection and standard muta-
tion rate 1/n for increasing population sizes up to μ = 16.
Fig. 5. Average runtime gain of the (μ+1) GA versus the (1 + 1) EA for
different population sizes, errorbars show the standard deviation normalized
by the average runtime of the (1 + 1) EA.
Fig. 6. Average runtime gain of the (5 + 1) GA for various mutation rates
versus the standard 1/n mutation rate, errorbars show the standard deviation
normalized by the average runtime for 1/n mutation rate.
In Fig. 4 we present average runtimes divided by the runtime
of the (2 + 1) GA and in Fig. 5 normalized against the run-
time of the (1+1) EA. In both figures, we see that the runtime
improves for μ larger than 2 and after reaching its lowest
value increases again with the population size. It is not clear
whether there is a constant optimal static value for μ around
4 or 5. The experiments, however, do not rule out the possi-
bility that the optimal static population size increases slowly
with the problem size (i.e., μ = 3 for n = 256, μ = 4 for
n = 4096, and μ = 5 for n = 16 384). On the other hand, we
clearly see that as the problem size increases we get a larger
improvement on the runtime. This indicates that the harder is
the problem, more useful are the populations. In particular, in
Fig. 5 we see that the theoretical asymptotic gain of 25% with
respect to the runtime of the (1 + 1) EA is approached more
and more closely as n increases. For the considered problem
sizes, the (μ+1) GA is faster than the (1 + 1) EA for all tested
CORUS AND OLIVETO: STANDARD STEADY STATE GAs CAN HILLCLIMB FASTER THAN MUTATION-ONLY EAs 731
values of μ. However, to see the runtime improvement of the
(μ+1) GA against the (2+1) GA for μ > 15 the experiments
(Fig. 4) suggest that greater problem sizes would need to be
used.
Finally, we investigate the effect of the mutation rate on the
runtime. Based on our previous experiments we set the popu-
lation size to the best seen value of μ = 5 and perform 1000
independent runs for each c value ranging from 0.9 to 1.9.
In Fig. 6, we see that even though the mutation rate c ≈ 1.3
minimizes the upper bound we proved on the runtime, setting
a larger mutation rate of 1.6 further decreases the runtime.
VIII. CONCLUSION
The question of whether GAs can hillclimb faster than
mutation-only algorithms is a long standing one. On one hand,
in his pioneering book, Rechenberg had given preliminary
experimental evidence that crossover may speed up the run-
time of population-based EAs for generalized ONEMAX [33].
On the other hand, further experiments suggested that GAs
were slower hillclimbers than the (1+1) EA [20], [28]. In
recent years it has been rigorously shown that crossover and
mutation can outperform algorithms using only mutation. First,
a new theory-driven GA called (1+(λ, λ)) GA has been shown
to be asymptotically faster for hillclimbing the ONEMAX func-
tion than any unbiased mutation-only EA [9]. Second, it has
been shown how standard (μ+λ) GAs are twice as fast as their
standard bit mutation-only counterparts for ONEMAX as long
as diversity is enforced through environmental selection [39].
In this paper, we have rigorously proven that standard
steady-state GAs with μ ≥ 3 and μ = o(log n/ log log n) are
at least 25% faster than all unbiased standard bit mutation-
based EAs with static mutation rate for ONEMAX even if no
diversity is enforced. The MC framework we used to achieve
the upper bounds on the runtimes should be general enough to
allow future analyses of more complicated GAs, for instance
with greater offspring population sizes or more sophisticated
crossover operators. A limitation of the approach is that it
applies to classes of problems that have plateaus of equal fit-
ness. Hence, for functions, where each genotype has a different
fitness value our approach would not apply. An open ques-
tion is whether the limitation is inherent to our framework or
whether it is crossover that would not help steady-state EAs
at all on such fitness landscapes.
Our results also explain that populations are useful not
only in the exploration phase of the optimization, but also to
improve exploitation during the hillclimbing phases. In partic-
ular, larger population sizes increase the probability of creating
and maintaining diversity in the population. This diversity can
then be exploited by the crossover operator. Recent results
had already shown how the interplay between mutation and
crossover may allow the emergence of diversity, which in
turn allows to escape plateaus of local optima more effi-
ciently compared to relying on mutation alone [5]. This paper
sheds further light on the picture by showing that popula-
tions, crossover, and mutation together, not only may escape
optima more efficiently, but may be more effective also in the
exploitation phase.
Another additional insight gained from the analysis is that
the standard mutation rate 1/n may not be optimal for the
(μ+1) GA on ONEMAX. This result is also in line with, and
nicely complements, other recent findings concerning steady
state GAs. For escaping plateaus of local optima it has been
recently shown that increasing the mutation rate above the
standard 1/n rate leads to smaller upper bounds on escaping
times [5]. However, when jumping large low-fitness valleys,
mutation rates of about 2.6/n seem to be optimal static rates
(see the experiment section in [3] and [4]). For ONEMAX
lower mutation rates seem to be optimal static rates, but still
considerably larger than the standard 1/n rate.
New interesting questions for further work have spawned.
Concerning population sizes an open problem is to rigorously
prove whether the optimal size grows with the problem size
and at what rate. Also determining the optimal mutation rate
remains an open problem. While our theoretical analysis deliv-
ers the best upper bound on the runtime with a mutation rate
of about 1.3/n, experiments suggest a larger optimal muta-
tion rate. Interestingly, this experimental rate is very similar to
the optimal mutation rate (i.e., approximately 1.618/n) of the
(μ+ 1) GA with enforced diversity proven in [39]. The bene-
fits of higher than standard mutation rates in elitist algorithms
is a topic that is gaining increasing interest [2], [14], [30].
Further improvements may be achieved by dynamically
adapting the population size and mutation rate during the
run. Advantages, in this sense, have been shown for the
(1 + (λ, λ)) GA by adapting the population size [7] and
for single individual algorithms by adapting the mutation
rate [12], [26]. Generalizing the results to larger classes of hill-
climbing problems is intriguing. In particular, proving whether
speed ups of the (μ+1) GA compared to the (1 + 1) EA are
also achieved for Royal Road functions would give a definitive
answer to a long standing question [28]. Analyses for larger
problem classes, such as linear functions and classical combi-
natorial optimization problems would lead to further insights.
Yet another natural question is how the (μ+1) GA hillclimbing
capabilities compare to (μ + λ) GAs and generational GAs.
REFERENCES
[1] T. Bäck, Evolutionary Algorithms in Theory and Practice. Oxford, U.K.:
Oxford Univ. Press, 1996.
[2] D. Corus, P. S. Oliveto, and D. Yazdani, “On the runtime analysis of the
opt-IA artificial immune system,” in Proc. ACM Genet. Evol. Comput.
Conf. (GECCO), Berlin, Germany, 2017, pp. 83–90.
[3] D.-C. Dang et al., “Escaping local optima using crossover with emergent
or reinforced diversity,” ArXiv:1608.03123, Aug. 2016.
[4] D.-C. Dang et al., “Escaping local optima using crossover with emer-
gent diversity,” IEEE Trans. Evol. Comput., to be published, doi:
10.1109/TEVC.2017.2724201.
[5] D.-C. Dang et al., “Emergence of diversity and its benefits for crossover
in genetic algorithms,” in Proc. Int. Conf. Parallel Problem Solving
Nat. (PPSN XIV), 2016, pp. 890–900.
[6] D.-C. Dang et al., “Escaping local optima with diversity mechanisms
and crossover,” in Proc. Genet. Evol. Comput. Conf. (GECCO), Denver,
CO, USA, 2016, pp. 645–652.
[7] B. Doerr and C. Doerr, “Optimal parameter choices through self-
adjustment: Applying the 1/5-th rule in discrete settings,” in Proc. ACM
Genet. Evol. Comput. Conf. (GECCO), 2015, pp. 1335–1342.
[8] B. Doerr and C. Doerr, “A tight runtime analysis of the (1+(λ, λ))
genetic algorithm on onemax,” in Proc. ACM Genet. Evol. Comput.
Conf. (GECCO), 2015, pp. 1423–1430.
732 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 22, NO. 5, OCTOBER 2018
[9] B. Doerr, C. Doerr, and F. Ebel, “From black-box complexity to design-
ing new genetic algorithms,” Theor. Comput. Sci., vol. 567, pp. 87–104,
Feb. 2015.
[10] B. Doerr, C. Doerr, and T. Köetzing, “The right mutation strength for
multi-valued decision variables,” in Proc. ACM Genet. Evol. Comput.
Conf. (GECCO), Denver, CO, USA, 2016, pp. 1115–1122.
[11] B. Doerr, C. Doerr, R. Spöhel, and H. Thomas, “Playing mastermind
with many colors,” J. ACM, vol. 63, no. 5, pp. 1–23, 2016.
[12] B. Doerr, C. Doerr, and J. Yang, “k-Bit mutation with self-adjusting k
outperforms standard bit mutation,” in Proc. Int. Conf. Parallel Problem
Solving Nat. (PPSN XIV), 2016, pp. 824–834.
[13] B. Doerr, E. Happ, and C. Klein, “Crossover can provably be useful
in evolutionary computation,” Theor. Comput. Sci., vol. 425, pp. 17–33,
Mar. 2012.
[14] B. Doerr, H. P. Le, R. Makhmara, and T. D. Nguyen, “Fast genetic algo-
rithms,” in Proc. ACM Genet. Evol. Comput. Conf. (GECCO), Berlin,
Germany, 2017, pp. 777–784.
[15] B. Doerr and C. Winzen, “Playing mastermind with constant-size
memory,” Theory Comput. Syst., vol. 55, no. 4, pp. 658–684, 2014.
[16] A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing.
Berlin, Germany: Springer-Verlag, 2003.
[17] T. Friedrich, P. S. Oliveto, D. Sudholt, and C. Witt, “Analysis of
diversity-preserving mechanisms for global exploration,” Evol. Comput.,
vol. 17, no. 4, pp. 455–476, 2009.
[18] D. E. Goldberg, Genetic Algorithms in Search, Optimization and
Machine Learning. Reading, MA, USA: Addison-Wesley, 1989.
[19] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor,
MI, USA: Univ. Michigan Press, 1975.
[20] T. Jansen and I. Wegener, “Real royal road functions—Where
crossover provably is essential,” Discr. Appl. Math., vol. 149, nos. 1–3,
pp. 111–125, 2005.
[21] T. Jansen, Analyzing Evolutionary Algorithms: The Computer Science
Perspective. Berlin, Germany: Springer-Verlag, 2013.
[22] T. Jansen and I. Wegener, “The analysis of evolutionary algorithms—
A proof that crossover really can help,” Algorithmica, vol. 34, no. 1,
pp. 47–66, 2002.
[23] T. Kötzing, D. Sudholt, and M. Theile, “How crossover helps in
pseudo-Boolean optimization,” in Proc. ACM Genet. Evol. Comput.
Conf. (GECCO), Dublin, Ireland, 2011, pp. 989–996.
[24] P. K. Lehre and C. Witt, “Black-box search by unbiased variation,”
Algorithmica, vol. 64, no. 4, pp. 623–642, 2012.
[25] P. K. Lehre and X. Yao, “Crossover can be constructive when com-
puting unique input–output sequences,” Soft Comput., vol. 15, no. 9,
pp. 1675–1687, 2011.
[26] A. Lissovoi, P. S. Oliveto, and J. A. Warwicker, “On the runtime analysis
of generalised selection hyper-heuristics for pseudo-Boolean optimi-
sation,” in Proc. ACM Genet. Evol. Comput. Conf. (GECCO), 2017,
pp. 849–856.
[27] M. Mitchell, J. H. Holland, and S. Forrest, “Relative building-block
fitness and the building block hypothesis,” in Proc. Found. Genet.
Algorithms (FOGA), vol. 2. 1993, pp. 109–126.
[28] M. Mitchell, J. Holland, and S. Forrest, “When will a genetic algo-
rithm outperform hill climbing?” in Neural Information Processing
Systems (NIPS 6). San Mateo, CA, USA: Morgan Kaufmann, 1994,
pp. 51–58.
[29] F. Neumann, P. S. Oliveto, G. Rudolph, and D. Sudholt, “On the effec-
tiveness of crossover for migration in parallel evolutionary algorithms,”
in Proc. Genet. Evol. Comput. Conf. (GECCO), 2011, pp. 1587–1594.
[30] P. S. Oliveto, P. K. Lehre, and F. Neumann, “Theoretical analysis
of rank-based mutation—Combining exploration and exploitation,” in
Proc. IEEE Congr. Evol. Comput. (CEC), Trondheim, Norway, 2009,
pp. 1455–1462.
[31] P. S. Oliveto and X. Yao, “Runtime analysis of evolutionary algorithms
for discrete optimization,” in Theory of Randomized Search Heuristics:
Foundations and Recent Developments, B. Doerr and A. Auger, Eds.
Hackensack, NJ, USA: World Sci., 2011, pp. 21–52.
[32] C. Qian, Y. Yu, and Z. Zhou, “An analysis on recombination in
multi-objective evolutionary optimization,” Artif. Intell., vol. 204,
pp. 99–119, Nov. 2013.
[33] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme
nach Prinzipien der biologishen Evolution. Stuttgart, Germany: Friedrich
Frommann Verlag, 1973.
[34] J. E. Rowe, “Genetic algorithms,” in Handbook of Computational
Intelligence, W. Pedrycz and J. Kacprzyk, Eds. Berlin, Germany:
Springer-Verlag, 2011, pp. 825–844.
[35] J. Sarma and K. D. Jong, “Generation gap methods,” in Handbook of
Evolutionary Computation, T. Back, D. B. Fogel, and Z. Michalewicz,
Eds. Bristol, U.K.: Inst. Phys., 1997, ch. C 2.7.
[36] T. Storch and I. Wegener, “Real royal road functions for constant
population size,” Theor. Comput. Sci., vol. 320, no. 1, pp. 123–134,
2004.
[37] D. Sudholt, “Crossover is provably essential for the Ising model on
trees,” in Proc. Genet. Evol. Comput. Conf. (GECCO), Washington, DC,
USA, 2005, pp. 1161–1167.
[38] D. Sudholt, “A new method for lower bounds on the running time of
evolutionary algorithms,” IEEE Trans. Evol. Comput., vol. 17, no. 3,
pp. 418–435, Jun. 2013.
[39] D. Sudholt, “How crossover speeds up building-block assembly in
genetic algorithms,” Evol. Comput., vol. 25, no. 2, pp. 237–274, 2017.
[40] C. Witt, “Runtime analysis of the (μ+1) EA on simple pseudo-Boolean
functions,” Evol. Comput., vol. 14, no. 1, pp. 65–86, 2006.
[41] C. Witt, “Tight bounds on the optimization time of a randomized search
heuristic on linear functions,” Combinatorics Probab. Comput., vol. 22,
no. 2, pp. 294–318, 2013.
Dogan Corus (M’17) received the B.S. and
M.S. degrees in industrial engineering from Koç
University, Istanbul, Turkey and the Ph.D. degree
in computer science from the University of
Nottingham, Nottingham, U.K.
He is a Research Associate with the University
of Sheffield, Sheffield, U.K. His current research
interests include the runtime analysis of bio-inspired
algorithms, in particular, population-based algo-
rithms and genetic algorithms.
Pietro S. Oliveto (S’07–M’09–SM’16) received
the Laurea degree in computer science from the
University of Catania, Catania, Italy in 2005 and the
Ph.D. degree from the University of Birmingham,
Birmingham, U.K., in 2009.
He is a Senior Lecturer and an EPSRC Early
Career Fellow with the University of Sheffield,
Sheffield, U.K. He has been an EPSRC Ph.D.+
Fellow, from 2009 to 2010, and an EPSRC Post-
Doctoral Fellow, from 2010 to 2013, with the
University of Birmingham and a Vice-Chancellor’s
Fellow with the University of Sheffield, from 2013 to 2016. His current
research interests include the performance analysis of bio-inspired compu-
tation techniques including evolutionary algorithms, genetic programming,
artificial immune systems and hyperheuristics.
Dr. Oliveto was a recipient of the best paper awards at the Genetic and
Evolutionary Computation Conference in 2008 and 2014 and the International
Conference on Artificial Immune Systems in 2011. He is a part of the
Steering Committee of the Annual Workshop on Theory of Randomized
Search Heuristics, an Associate Editor of the IEEE TRANSACTIONS ON
EVOLUTIONARY COMPUTATION, the Chair of the IEEE CIS Task Force on
Theoretical Foundations of Bio-Inspired Computation and a member of the
EPSRC Peer Review College.

欢迎咨询51作业君