### Dynamics when there is a full separation of time scales

We begin by assuming a complete separation of time scales, $\nu \ll \mu_\mathrm{out} \ll \mu_\mathrm{in}$. In this setting, the intra-group dynamics are fast compared to the others. As a result, at any point in time there are at most two different strategies present in any group. When a mutation or an out-group imitation introduces a new strategy, intra-group imitation leads to the extinction or fixation of this strategy before the next strategy is introduced. In the following, we describe this dynamics in more detail.

#### Description of the intra-group dynamics

Consider a group in which *i* players adopt the strategy $\mathbf{p}$ and $N-i$ players adopt the strategy $\mathbf{q}$. By Eq. (5), the players’ payoffs are given by (see “Methods” for an alternative formulation)

$$\begin{aligned} \pi_\mathbf{p}(i) &= \frac{i-1}{N-1}\cdot \pi_{\mathbf{p},\mathbf{p}} + \frac{N-i}{N-1}\cdot \pi_{\mathbf{p},\mathbf{q}},\\ \pi_\mathbf{q}(i) &= \frac{i}{N-1}\cdot \pi_{\mathbf{q},\mathbf{p}} + \frac{N-i-1}{N-1}\cdot \pi_{\mathbf{q},\mathbf{q}}. \end{aligned}\tag{8}$$

The probability that intra-group imitation increases (decreases) the number of $\mathbf{p}$-players in a single time step is

$$\begin{aligned} T_i^{+} &= \frac{N-i}{N}\,\frac{i}{N-1}\,\frac{1}{1+\exp\left\{\sigma_\mathrm{in}\left[\pi_\mathbf{q}(i) - \pi_\mathbf{p}(i)\right]\right\}}, \end{aligned}\tag{9}$$

$$\begin{aligned} T_i^{-} &= \frac{N-i}{N}\,\frac{i}{N-1}\,\frac{1}{1+\exp\left\{\sigma_\mathrm{in}\left[\pi_\mathbf{p}(i) - \pi_\mathbf{q}(i)\right]\right\}}. \end{aligned}\tag{10}$$

Here, the first equation corresponds to the case where a $\mathbf{q}$-player is randomly chosen as the updating player, a $\mathbf{p}$-player is chosen as the role model, and the updating player chooses to imitate the role model. The second equation corresponds to the converse case of a $\mathbf{p}$-player imitating a role model with strategy $\mathbf{q}$.
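As an illustration, Eqs. (8)–(10) can be evaluated with a few lines of Python. This is a minimal sketch rather than the authors' implementation; the function names and the example payoff values are assumptions made for illustration only.

```python
import math

def group_payoffs(i, N, pi_pp, pi_pq, pi_qp, pi_qq):
    """Eq. (8): payoffs in a group with i p-players and N - i q-players."""
    pay_p = ((i - 1) * pi_pp + (N - i) * pi_pq) / (N - 1)
    pay_q = (i * pi_qp + (N - i - 1) * pi_qq) / (N - 1)
    return pay_p, pay_q

def transition_probs(i, N, sigma_in, pi_pp, pi_pq, pi_qp, pi_qq):
    """Eqs. (9)-(10): probabilities that the number of p-players rises or falls by one."""
    pay_p, pay_q = group_payoffs(i, N, pi_pp, pi_pq, pi_qp, pi_qq)
    # Probability of drawing one p-player and one q-player as updater and role model.
    pairing = (N - i) / N * i / (N - 1)
    t_plus = pairing / (1.0 + math.exp(sigma_in * (pay_q - pay_p)))
    t_minus = pairing / (1.0 + math.exp(sigma_in * (pay_p - pay_q)))
    return t_plus, t_minus

# Hypothetical payoffs (not taken from the paper's tables).
t_plus, t_minus = transition_probs(3, 10, 1.0, 2.0, -1.0, 3.0, 0.0)
print(t_plus, t_minus, t_minus / t_plus)
```

The ratio `t_minus / t_plus` depends only on the payoff difference, which is why the pairing factor drops out of the fixation probability.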

The fixation probability of a single $\mathbf{p}$-player in a resident group of $\mathbf{q}$-players can be computed explicitly^{40,41}. This probability is given by $\rho_{\mathbf{p},\mathbf{q}} = \big(1 + \sum_{j=1}^{N-1}\prod_{i=1}^{j}\gamma_i\big)^{-1}$, where $\gamma_i \equiv T_i^{-}/T_i^{+} = \exp\left[\sigma_\mathrm{in}\cdot\left(\pi_\mathbf{q}(i) - \pi_\mathbf{p}(i)\right)\right]$. By using the explicit payoff equations (8), this fixation probability becomes^{22}

$$\begin{aligned} \rho_{\mathbf{p},\mathbf{q}} = \left(\sum_{i=0}^{N-1}\exp\left[\sigma_\mathrm{in}\cdot i\,\frac{(2N-i-3)\pi_{\mathbf{q},\mathbf{q}} + (i+1)\pi_{\mathbf{q},\mathbf{p}} - (2N-i-1)\pi_{\mathbf{p},\mathbf{q}} - (i-1)\pi_{\mathbf{p},\mathbf{p}}}{2(N-1)}\right]\right)^{-1}. \end{aligned}\tag{11}$$

Using this formula, we can compute for each resident strategy $\mathbf{q}$ how likely it is that any novel strategy $\mathbf{p}$ is eventually adopted by the entire group. While the use of fixation probabilities has become common practice in evolutionary game theory^{41}, we note that the time it takes for a single strategy to reach fixation may be considerable. The fixation time becomes particularly long when groups are large, and when the strategies $\mathbf{p}$ and $\mathbf{q}$ allow for an equilibrium in which the two strategies stably co-exist^{42}. Nevertheless, this limit has become a useful approximation, as it simplifies computations considerably. Instead of considering arbitrarily many strategies at once, one can make predictions by only considering two strategies at a time^{43,44}. Once a strict separation of time scales no longer applies, the analysis becomes considerably more intricate^{45}.
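The closed form in Eq. (11) can be checked against the general expression in terms of the ratios $\gamma_i$. The following sketch does so in plain Python; the function names and the test payoffs are hypothetical, and the code assumes $N \ge 2$ and moderate selection strengths (no safeguards against floating-point overflow).

```python
import math

def fixation_probability(N, sigma_in, pi_pp, pi_pq, pi_qp, pi_qq):
    """Eq. (11): fixation probability of one p-mutant among N - 1 q-residents."""
    total = 0.0
    for i in range(N):
        numerator = ((2 * N - i - 3) * pi_qq + (i + 1) * pi_qp
                     - (2 * N - i - 1) * pi_pq - (i - 1) * pi_pp)
        total += math.exp(sigma_in * i * numerator / (2 * (N - 1)))
    return 1.0 / total

def fixation_probability_from_gammas(N, sigma_in, pi_pp, pi_pq, pi_qp, pi_qq):
    """Same quantity via rho = 1 / (1 + sum_j prod_{i<=j} gamma_i)."""
    total, product = 1.0, 1.0
    for i in range(1, N):
        pay_p = ((i - 1) * pi_pp + (N - i) * pi_pq) / (N - 1)
        pay_q = (i * pi_qp + (N - i - 1) * pi_qq) / (N - 1)
        product *= math.exp(sigma_in * (pay_q - pay_p))  # gamma_i
        total += product
    return 1.0 / total

# Hypothetical payoffs; both routes should agree up to rounding.
print(fixation_probability(10, 0.5, 2.0, -1.0, 3.0, 0.0))
print(fixation_probability_from_gammas(10, 0.5, 2.0, -1.0, 3.0, 0.0))
```

When all four payoffs coincide, every exponent vanishes and the function returns the neutral value $1/N$.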

#### Description of the inter-group dynamics

To further simplify the analysis of our model, we make the additional assumption that $\nu \ll \mu_\mathrm{out}$. This limit indicates that the time scale for out-group imitations is short compared to the time scale of mutations. This assumption implies that at any point in time, at most two different strategies are present in the entire population. Once a mutation introduces a new strategy, this strategy either fixes in the population (through successive in-group and out-group imitation events), or the strategy goes extinct. To describe this dynamics in more detail, suppose that the two strategies $\mathbf{p}$ and $\mathbf{q}$ are present in the population. Since intra-group imitation is fast, every group is homogeneous. As a consequence, we can speak of $\mathbf{p}$-groups and $\mathbf{q}$-groups, depending on which strategy the group members employ. Once a $\mathbf{q}$-player imitates a player from a $\mathbf{p}$-group, the number of $\mathbf{p}$-groups may increase (if the strategy $\mathbf{p}$ reaches fixation in the $\mathbf{q}$-group). The respective probability that the number *i* of $\mathbf{p}$-groups increases (or decreases) is given by

$$\begin{aligned} Q_i^{+} &= \frac{i}{M}\,\frac{M-i}{M-1}\,\frac{1}{1+\exp\left[\sigma_\mathrm{out}\left(\pi_{\mathbf{q},\mathbf{q}} - \pi_{\mathbf{p},\mathbf{p}}\right)\right]}\cdot \rho_{\mathbf{p},\mathbf{q}}, \end{aligned}\tag{12}$$

$$\begin{aligned} Q_i^{-} &= \frac{i}{M}\,\frac{M-i}{M-1}\,\frac{1}{1+\exp\left[\sigma_\mathrm{out}\left(\pi_{\mathbf{p},\mathbf{p}} - \pi_{\mathbf{q},\mathbf{q}}\right)\right]}\cdot \rho_{\mathbf{q},\mathbf{p}}, \end{aligned}\tag{13}$$

respectively. In both expressions, the first three factors on the right hand side represent the probability of the respective out-group imitation event. The last factor is the probability that the newly introduced strategy reaches fixation. The ratio $\eta$ of these transition probabilities simplifies to

$$\begin{aligned} \eta \equiv \frac{Q_i^{-}}{Q_i^{+}} = \frac{\rho_{\mathbf{q},\mathbf{p}}}{\rho_{\mathbf{p},\mathbf{q}}}\exp\left[\sigma_\mathrm{out}\left(\pi_{\mathbf{q},\mathbf{q}} - \pi_{\mathbf{p},\mathbf{p}}\right)\right]. \end{aligned}\tag{14}$$

From Eq. (11), the ratio of the intra-group fixation probabilities is^{46}

$$\begin{aligned} \frac{\rho_{\mathbf{p},\mathbf{q}}}{\rho_{\mathbf{q},\mathbf{p}}} = \prod_{j=1}^{N-1}\gamma_j^{-1} = \exp\left\{\sigma_\mathrm{in}\left[\Big(\pi_{\mathbf{p},\mathbf{p}} - \pi_{\mathbf{q},\mathbf{q}}\Big)\left(\frac{N}{2}-1\right) + \Big(\pi_{\mathbf{p},\mathbf{q}} - \pi_{\mathbf{q},\mathbf{p}}\Big)\frac{N}{2}\right]\right\}. \end{aligned}\tag{15}$$

Thus, Eq. (14) can be re-written as

$$\begin{aligned} \eta = \exp\left\{-\sigma_\mathrm{in}\left[\Big(\pi_{\mathbf{p},\mathbf{p}} - \pi_{\mathbf{q},\mathbf{q}}\Big)\left(\frac{N}{2}-1\right) + \Big(\pi_{\mathbf{p},\mathbf{q}} - \pi_{\mathbf{q},\mathbf{p}}\Big)\frac{N}{2}\right] + \sigma_\mathrm{out}\Big(\pi_{\mathbf{q},\mathbf{q}} - \pi_{\mathbf{p},\mathbf{p}}\Big)\right\}. \end{aligned}\tag{16}$$

Overall, we obtain the following formula for the probability that a new strategy $\mathbf{p}$ takes over the entire population, given everyone else applies strategy $\mathbf{q}$,

$$\begin{aligned} \Psi_{\mathbf{p},\mathbf{q}} = \rho_{\mathbf{p},\mathbf{q}}\,\frac{1}{1+\sum_{j=1}^{M-1}\eta^{j}} = \begin{cases} \rho_{\mathbf{p},\mathbf{q}}\,\dfrac{1-\eta}{1-\eta^{M}} & \text{when } \eta \ne 1,\\[1ex] \rho_{\mathbf{p},\mathbf{q}}/M & \text{when } \eta = 1. \end{cases} \end{aligned}\tag{17}$$

Here, the first factor $\rho_{\mathbf{p},\mathbf{q}}$ is the probability that the $\mathbf{p}$-mutant takes over the first group. The second factor gives the probability that eventually all other groups adopt $\mathbf{p}$ as well. Similarly, one can calculate the probability that everyone in a $\mathbf{p}$-population eventually imitates a single $\mathbf{q}$-mutant. This probability is

$$\begin{aligned} \Psi_{\mathbf{q},\mathbf{p}} = \rho_{\mathbf{q},\mathbf{p}}\,\frac{1}{1+\sum_{j=1}^{M-1}\eta^{-j}} = \begin{cases} \rho_{\mathbf{q},\mathbf{p}}\,\dfrac{1-\eta^{-1}}{1-\eta^{-M}} & \text{when } \eta \ne 1,\\[1ex] \rho_{\mathbf{q},\mathbf{p}}/M & \text{when } \eta = 1. \end{cases} \end{aligned}\tag{18}$$

These formulas allow us to compute how likely any given mutant strategy is to replace the resident strategy when there is a complete separation of time scales.
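A short Python sketch of Eqs. (11), (16), and (17) may help to make the two-level fixation probability concrete. The function names and the example parameters are assumptions for illustration. The sum form of Eq. (17) is used so that the case $\eta = 1$ needs no special treatment.

```python
import math

def rho(N, sigma_in, pi_pp, pi_pq, pi_qp, pi_qq):
    """Eq. (11): intra-group fixation probability of a single p-mutant."""
    total = 0.0
    for i in range(N):
        numerator = ((2 * N - i - 3) * pi_qq + (i + 1) * pi_qp
                     - (2 * N - i - 1) * pi_pq - (i - 1) * pi_pp)
        total += math.exp(sigma_in * i * numerator / (2 * (N - 1)))
    return 1.0 / total

def eta(N, sigma_in, sigma_out, pi_pp, pi_pq, pi_qp, pi_qq):
    """Eq. (16): ratio of the group-level transition probabilities."""
    intra = (pi_pp - pi_qq) * (N / 2 - 1) + (pi_pq - pi_qp) * N / 2
    return math.exp(-sigma_in * intra + sigma_out * (pi_qq - pi_pp))

def population_fixation(N, M, sigma_in, sigma_out, pi_pp, pi_pq, pi_qp, pi_qq):
    """Eq. (17): probability that a single p-mutant takes over all M groups."""
    e = eta(N, sigma_in, sigma_out, pi_pp, pi_pq, pi_qp, pi_qq)
    denominator = sum(e ** j for j in range(M))  # 1 + eta + ... + eta^(M-1)
    return rho(N, sigma_in, pi_pp, pi_pq, pi_qp, pi_qq) / denominator

# Hypothetical parameters and payoffs.
print(population_fixation(6, 5, 0.3, 0.7, 2.0, -1.0, 3.0, 0.0))
```

The probability $\Psi_{\mathbf{q},\mathbf{p}}$ of Eq. (18) follows from the same function by exchanging the roles of the two strategies in the payoff arguments.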

#### Strategies favored by the evolutionary process

Using these formulas, we can analyze which strategies are particularly likely to spread. To this end, we say strategy $\mathbf{p}$ is favored over $\mathbf{q}$ if a single $\mathbf{p}$-mutant is more likely to fix in a $\mathbf{q}$-population than vice versa. By Eqs. (17) and (18), the respective condition $\Psi_{\mathbf{p},\mathbf{q}} > \Psi_{\mathbf{q},\mathbf{p}}$ simplifies to

$$\begin{aligned} \frac{\rho_{\mathbf{p},\mathbf{q}}}{\rho_{\mathbf{q},\mathbf{p}}} > \exp\left[\frac{M-1}{M}\,\sigma_\mathrm{out}\Big(\pi_{\mathbf{q},\mathbf{q}} - \pi_{\mathbf{p},\mathbf{p}}\Big)\right]. \end{aligned}\tag{19}$$

The left hand side reflects the effect of in-group imitation, whereas the right hand side captures the effect of out-group imitation. In the special case of a single group, $M=1$, this condition reproduces the respective condition for well-mixed populations, $\rho_{\mathbf{p},\mathbf{q}} > \rho_{\mathbf{q},\mathbf{p}}$. Plugging Eq. (15) into Eq. (19) yields

$$\begin{aligned} \sigma_\mathrm{in}\frac{N}{2}\,\pi_{\mathbf{p},\mathbf{q}} + \left[\sigma_\mathrm{in}\left(\frac{N}{2}-1\right) + \sigma_\mathrm{out}\frac{M-1}{M}\right]\pi_{\mathbf{p},\mathbf{p}} > \sigma_\mathrm{in}\frac{N}{2}\,\pi_{\mathbf{q},\mathbf{p}} + \left[\sigma_\mathrm{in}\left(\frac{N}{2}-1\right) + \sigma_\mathrm{out}\frac{M-1}{M}\right]\pi_{\mathbf{q},\mathbf{q}}. \end{aligned}\tag{20}$$

By collecting like terms, this expression can be further simplified to

$$\begin{aligned} \sigma_\mathrm{in}\left[\frac{N}{2}\big(\pi_{\mathbf{p},\mathbf{q}} - \pi_{\mathbf{q},\mathbf{p}}\big) + \left(\frac{N}{2}-1\right)\big(\pi_{\mathbf{p},\mathbf{p}} - \pi_{\mathbf{q},\mathbf{q}}\big)\right] + \sigma_\mathrm{out}\frac{M-1}{M}\big(\pi_{\mathbf{p},\mathbf{p}} - \pi_{\mathbf{q},\mathbf{q}}\big) > 0. \end{aligned}\tag{21}$$

The first and the second terms of this inequality correspond to the dynamics within and between groups, respectively. The intra-group dynamics is decisive if either $\sigma_\mathrm{in}N \gg \sigma_\mathrm{out}$ or if the number of groups is small ($M\approx 1$). In that case, and if groups are additionally assumed to be small ($N\rightarrow 2$), the condition for $\mathbf{p}$ to be favored simplifies to

$$\begin{aligned} \pi_{\mathbf{p},\mathbf{q}} > \pi_{\mathbf{q},\mathbf{p}}. \end{aligned}\tag{22}$$

This condition is closely related to the notion of rival strategies^{14}. Strategy $\mathbf{p}$ is a rival strategy if and only if it enforces the condition $\pi_{\mathbf{p},\mathbf{q}} \ge \pi_{\mathbf{q},\mathbf{p}}$ against all strategies $\mathbf{q}$. In Table 1, the second-to-last column indicates all memory-1 rival strategies. There are four of them: TFT, Grim, AllD, and the strategy $\mathbf{p}=(0,0,1,0)$. The above observations suggest that these rival strategies should be particularly strong when there is only a single group with two group members ($M=1$ and $N=2$).

In the other extreme, when $\sigma_\mathrm{in}N \ll \sigma_\mathrm{out}$ and *M* is sufficiently large, it is the inter-group dynamics that is decisive. In that case, the relative strength of a strategy is determined by its efficiency. Strategy $\mathbf{p}$ is favored over $\mathbf{q}$ if and only if it yields the larger payoff against itself,

$$\begin{aligned} \pi_{\mathbf{p},\mathbf{p}} > \pi_{\mathbf{q},\mathbf{q}}. \end{aligned}\tag{23}$$

A final interesting case arises when the two selection strength parameters are equal, $\sigma_\mathrm{in} = \sigma_\mathrm{out}$. In that case, condition (21) simplifies to

$$\begin{aligned} MN\big(\pi_{\mathbf{p},\mathbf{q}} - \pi_{\mathbf{q},\mathbf{p}}\big) + (MN-2)\big(\pi_{\mathbf{p},\mathbf{p}} - \pi_{\mathbf{q},\mathbf{q}}\big) > 0. \end{aligned}\tag{24}$$

In particular, if the total population becomes large, $MN\rightarrow \infty$, strategy $\mathbf{p}$ is favored if and only if

$$\begin{aligned} \pi_{\mathbf{p},\mathbf{p}} + \pi_{\mathbf{p},\mathbf{q}} - \pi_{\mathbf{q},\mathbf{p}} - \pi_{\mathbf{q},\mathbf{q}} > 0. \end{aligned}\tag{25}$$

That is, $\mathbf{p}$ is favored if and only if it is risk-dominant^{47}, independent of the exact values of *M* and *N*.

Equation (24) indicates that the preference between two strategies is independent of *N* and *M* as long as the total population size *MN* is fixed. We note, however, that the extent to which $\mathbf{p}$ is preferred over $\mathbf{q}$ does depend on *M* and *N*. Thus, the relative abundance of the strategies changes according to the group structure when there are more than two strategies, as we will see in the following numerical simulations.
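The statement about Eq. (24) can be illustrated numerically. The sketch below evaluates the left-hand side of condition (21) for several ways of splitting a population of fixed size $MN=120$, using equal selection strengths. The function name and the payoff values are hypothetical.

```python
def favored_margin(N, M, sigma_in, sigma_out, pi_pp, pi_pq, pi_qp, pi_qq):
    """Left-hand side of condition (21); p is favored over q if the value is positive."""
    within = N / 2 * (pi_pq - pi_qp) + (N / 2 - 1) * (pi_pp - pi_qq)
    between = (M - 1) / M * (pi_pp - pi_qq)
    return sigma_in * within + sigma_out * between

# Hypothetical payoffs; sigma_in = sigma_out = 1 and M * N = 120 throughout.
payoffs = (2.0, -1.0, 3.0, 0.0)
for N, M in [(2, 60), (4, 30), (12, 10), (120, 1)]:
    margin = favored_margin(N, M, 1.0, 1.0, *payoffs)
    # Multiplying by 2M recovers the left-hand side of Eq. (24).
    print(N, M, margin, 2 * M * margin)
```

In this example, the rescaled value `2 * M * margin` is the same for every split, so the sign of the condition depends only on *MN*, whereas the margin itself varies with *M* and *N*.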

**Numerical simulations.** The above arguments are valid only when players choose among two strategies. In the following, we explore evolution among all 16 deterministic memory-1 strategies by implementing the evolutionary process numerically. To this end, we use Monte Carlo simulations. Mutant strategies are repeatedly introduced into the current resident population. The mutant strategy either takes over or goes extinct. We report how much cooperation we observe on average (see “Methods”).
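The procedure just described can be sketched as a Markov chain over resident strategies. The code below is a minimal illustration, not the simulation code used for the figures. It assumes a precomputed matrix `psi`, where `psi[p][q]` is the probability that mutant `p` takes over resident `q`, and it assumes that mutants are drawn uniformly among the non-resident strategies. The function names, the toy matrix, and the cooperation rates are all hypothetical.

```python
import random

def simulate_rare_mutations(psi, steps, seed=0):
    """Return the fraction of steps in which each strategy is the resident."""
    rng = random.Random(seed)
    n = len(psi)
    resident = rng.randrange(n)
    visits = [0] * n
    for _ in range(steps):
        # Draw a mutant uniformly among the n - 1 non-resident strategies.
        mutant = rng.randrange(n - 1)
        if mutant >= resident:
            mutant += 1
        # The mutant replaces the resident with its fixation probability.
        if rng.random() < psi[mutant][resident]:
            resident = mutant
        visits[resident] += 1
    return [v / steps for v in visits]

def average_cooperation(frequencies, cooperation_rates):
    """Weight each strategy's self-cooperation rate by its abundance."""
    return sum(f * c for f, c in zip(frequencies, cooperation_rates))

# Toy example with two strategies (values chosen for illustration only).
psi = [[0.0, 0.2],
       [0.1, 0.0]]
freqs = simulate_rare_mutations(psi, 100000, seed=1)
print(freqs, average_cooperation(freqs, [1.0, 0.0]))
```

For the full model, `psi` would be filled with the values of $\Psi_{\mathbf{p},\mathbf{q}}$ from Eq. (17) for all 16 strategies.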

Figure 2 shows how the evolving cooperation level depends on the number of groups *M*, either for small groups ($N=2$) or for relatively large groups ($N=32$). When the group size is small, we observe very little cooperation if there is only a single group ($M=1$), as predicted by our earlier analysis. As we increase the number of groups, the cooperation level also increases. However, it does not increase indefinitely. Rather, the improvement saturates as we increase *M*, which is consistent with the factor $(M-1)/M$ in Eq. (20). The limiting cooperation level depends on the benefit of cooperation, reproducing the standard result that larger benefits are more conducive to cooperation^{7}. In general, we thus observe that cooperation tends to be favored when *M*, *N*, and *b* are large, corresponding to many groups of substantial size, and a considerable benefit to cooperation.

After exploring the effect of group size and number of groups in isolation, we next ask to what extent group structure facilitates cooperation. To this end, we keep the total population size fixed at $MN=120$, and vary the group size *N*. The number of groups is then automatically determined as $M=120/N$. In one extreme case, there is only a single group of maximum size, $M=1$ and $N=120$. We refer to this scenario as the case of a (single) well-mixed population. In the other extreme case, groups take the minimum non-trivial size, $N=2$, which implies that the resulting number of groups is $M=60$. We refer to this second scenario as the case of a (fully) group-structured population. Figure 3a shows how the cooperation level changes as we vary the group size *N*. Interestingly, the effect of group structure depends on the benefit *b* of cooperation. If the benefit is small, group-structured populations achieve more cooperation than well-mixed populations. For intermediate benefits, we observe the opposite trend. Here, well-mixed populations are more conducive to cooperation. Finally, once benefits are very large, full cooperation evolves in all considered cases, independent of the exact values of *M* and *N*.

To further investigate these non-trivial effects of group structure, we analyze the abundance of each of the 16 strategies in the stationary state, see Fig. 3b–e. We first consider the case that the benefit of cooperation is intermediate, $b=3$. Here, well-mixed populations lead to much more cooperation. In particular, we observe that populations learn to adopt the cooperative Win-Stay Lose-Shift (WSLS) strategy almost all of the time (Fig. 3c). In group-structured populations, on the other hand, no single strategy is predominant. The most abundant strategies are the non-cooperative strategies AllD and Grim. The next most abundant strategies are WSLS and TFT, respectively. Overall, we thus observe that well-mixed populations are more favorable for cooperation because they make it more likely that the cooperative strategy WSLS evolves. In a second step, we consider the case of low cooperation benefits, see Fig. 3d,e. Group-structured populations again lead to the evolution of the strategies AllD, Grim, TFT, and WSLS (and related strategies). In contrast, well-mixed populations consist of the non-cooperative strategies AllD and Grim almost entirely.

To better understand why cooperative strategies are abundant in one scenario but not in another, we investigate the transition probabilities for a reduced strategy space. The reduced strategy space contains the representative strategies AllC, WSLS, TFT, AllD and the strategy $S_7$ with $\mathbf{p}=(0,0,0,1)$. Strategy $S_7$ is included because it has the highest ability to invade WSLS among the memory-1 strategies: according to Eq. (25), it is preferred over WSLS for the broadest range of *b*. The payoffs and the win-lose relationships of these strategies are summarized in Tables 2 and 3. In addition, Fig. 4 illustrates the average abundance of each of these five strategies and the transition probabilities between them. We confirmed that the overall cooperation levels for this reduced strategy space are comparable to the corresponding results for the full strategy space. Hence, the reduced strategy space can serve as a useful proxy to gain insights into the overall dynamics.

We first consider the case of an intermediate benefit, $b=3$. Here, well-mixed populations yield more cooperation, as they promote the evolution of WSLS. Figure 4b shows why. There is an evolutionary path from every other strategy towards WSLS; once the entire population adopts WSLS, every other mutant strategy is at a disadvantage. This picture is in line with previous research on direct reciprocity in well-mixed populations^{31}. The picture changes, however, in group-structured populations, see Fig. 4a depicting the case of groups of size $N=2$. Here, a homogeneous WSLS population can be invaded by $S_7$. To see why, consider a group that contains both strategies. By Table 2, the payoff of WSLS is $b/3-2/3$, which is below the payoff of $S_7$, $2b/3-1/3$. Hence, $S_7$ is favored in each mixed group. On the other hand, with respect to out-group imitation, it is WSLS that is favored over $S_7$, because the payoff of WSLS against itself is $b-1$, which exceeds the self-payoff of $S_7$, $(b-1)/2$. To compute which of the two opposing effects dominates, Eq. (21) suggests that we need to compute the sign of $\pi_{\mathbf{p},\mathbf{p}} + \pi_{\mathbf{p},\mathbf{q}} - \pi_{\mathbf{q},\mathbf{p}} - \pi_{\mathbf{q},\mathbf{q}}$. For $b<5$, this criterion suggests that $S_7$ is favored (as also indicated in Table 3). These observations explain why in group-structured populations, WSLS is susceptible to invasion by $S_7$, which in turn can be invaded by AllD.
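The threshold $b<5$ can be reproduced from the four payoffs quoted in the preceding paragraph. The snippet below is a small check using exact fractions. It assumes that the mixed-group payoffs are $b/3-2/3$ for WSLS and $2b/3-1/3$ for $S_7$, with the cost of cooperation normalized to one; the function name is hypothetical.

```python
from fractions import Fraction

def s7_vs_wsls_criterion(b):
    """pi_pp + pi_pq - pi_qp - pi_qq with p = S7 and q = WSLS."""
    b = Fraction(b)
    pi_pp = (b - 1) / 2                          # S7 against itself
    pi_pq = Fraction(2, 3) * b - Fraction(1, 3)  # S7 against WSLS
    pi_qp = Fraction(1, 3) * b - Fraction(2, 3)  # WSLS against S7
    pi_qq = b - 1                                # WSLS against itself
    return pi_pp + pi_pq - pi_qp - pi_qq

for b in (3, 5, 6):
    print(b, s7_vs_wsls_criterion(b))
```

Under these assumptions the criterion reduces to $(5-b)/6$, which is positive exactly when $b<5$.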

In a next step, we explore the case of a small benefit of cooperation, $b=1.5$. Here, group-structured populations are more cooperative. The respective transition graphs for group-structured and well-mixed populations are depicted in Fig. 4c, d. In both cases, we observe that there is no single strategy that resists invasion by all other strategies. Instead, AllD populations are susceptible to TFT, which in turn is susceptible to AllC and WSLS, which can be invaded by AllD again. The main difference between group-structured and well-mixed populations is the relative performance of TFT. TFT is better able to invade AllD populations in group-structured populations than in well-mixed populations. To see why, we first consider the within-group dynamics when $N=2$. Because TFT gets the same payoff as the opponent in any pairwise encounter^{15}, the fixation probability of TFT in a group with AllD is exactly 1/2. In addition, TFT is favored by the between-group dynamics, because the payoff of TFT-groups is $(b-c)/2$, which is larger than the payoff of zero in AllD-groups. It follows that a single TFT mutant can replace an AllD population with a probability that is approximately 1/2. In contrast, in well-mixed populations, this fixation probability is much smaller, 0.18 for the parameters in Fig. 4d.
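To make the value 1/2 explicit, one can set $N=2$ in Eq. (11), with $\mathbf{p}$ denoting TFT and $\mathbf{q}$ denoting AllD. The sum then has only two terms,

$$\rho_{\mathbf{p},\mathbf{q}}\Big|_{N=2} = \Big(1+\exp\big[\sigma_\mathrm{in}\,(\pi_{\mathbf{q},\mathbf{p}}-\pi_{\mathbf{p},\mathbf{q}})\big]\Big)^{-1} = \frac{1}{2} \quad \text{if } \pi_{\mathbf{p},\mathbf{q}}=\pi_{\mathbf{q},\mathbf{p}}.$$

By the same argument $\rho_{\mathbf{q},\mathbf{p}}=1/2$, so Eq. (14) with $\pi_{\mathbf{q},\mathbf{q}}=0$ and $\pi_{\mathbf{p},\mathbf{p}}=(b-c)/2$ gives $\eta=\exp\left[-\sigma_\mathrm{out}(b-c)/2\right]<1$, and Eq. (17) becomes

$$\Psi_{\mathbf{p},\mathbf{q}} = \frac{1}{2}\,\frac{1-\eta}{1-\eta^{M}} \;\longrightarrow\; \frac{1}{2}\,(1-\eta) \quad \text{as } M\rightarrow\infty.$$

This is close to 1/2 whenever $\sigma_\mathrm{out}(b-c)/2$ is large enough for $\eta$ to be negligible; that last condition is an assumption about the parameter values rather than a general property.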

The above results suggest that overall, there are two competing effects when splitting a population into smaller groups. On the one hand, smaller group sizes favor the evolution of rival strategies because small groups generally select for spite^{40}. On the other hand, group structure can favor the evolution of cooperation because individuals in highly cooperative groups are more likely to be imitated. Our results suggest that the overall outcome of these two opposing effects depends on the benefit of cooperation. When this benefit is comparably small, group-structured populations allow for more cooperation than well-mixed populations. In contrast, when this benefit is intermediate, cooperation in well-mixed populations is more robust.

### Dynamics when there is a partial separation of time scales

Throughout our analysis so far we have assumed a complete separation of time scales. When a player was randomly chosen to update its strategy, we assumed that this player is most likely to engage in intra-group imitation, far less likely to engage in out-group imitation, and again far less likely to engage in random exploration (mutation). In the following, we instead assume that intra-group comparisons are still most likely; however, mutations and out-group comparisons now occur on a similar time scale. In this limit, all groups can be assumed to be homogeneous because intra-group imitation is fast. However, different groups might employ different strategies, because mutations might introduce novel strategies faster than out-group imitation can result in the fixation of any given strategy in the population.

#### A differential equation in the limit of large populations

To obtain analytical results, in the following we assume that the number of groups is large, $M\rightarrow \infty$. Let $x_\mathbf{p}$ be the fraction of groups that employ strategy $\mathbf{p}$. Over time, these fractions can change, either because new strategies are introduced into groups by out-group imitation (and reach fixation), or because they are introduced by mutations (and reach fixation). This dynamics may be described by the following differential equation,

$$\begin{aligned} \dot{x}_\mathbf{p} = \left(1-r\right)\sum_{\mathbf{q}\ne \mathbf{p}} \alpha_{\mathbf{p},\mathbf{q}}\, x_\mathbf{p} x_\mathbf{q} + r\sum_{\mathbf{q}\ne \mathbf{p}}\frac{x_\mathbf{q}\cdot \rho_{\mathbf{p},\mathbf{q}} - x_\mathbf{p}\cdot \rho_{\mathbf{q},\mathbf{p}}}{|\mathcal{M}|}. \end{aligned}\tag{26}$$

Here, $r=\nu/(\nu+\mu_\mathrm{out})$ is the relative mutation probability (compared to out-group imitation events). The right hand side of Eq. (26) consists of two parts. The first sum describes the changes triggered by out-group imitation. Here, the parameter

$$\begin{aligned} \alpha_{\mathbf{p},\mathbf{q}} \equiv \frac{\rho_{\mathbf{p},\mathbf{q}}}{1+\exp\left[\sigma_\mathrm{out}\left(\pi_{\mathbf{q},\mathbf{q}} - \pi_{\mathbf{p},\mathbf{p}}\right)\right]} - \frac{\rho_{\mathbf{q},\mathbf{p}}}{1+\exp\left[\sigma_\mathrm{out}\left(\pi_{\mathbf{p},\mathbf{p}} - \pi_{\mathbf{q},\mathbf{q}}\right)\right]} \end{aligned}\tag{27}$$

describes the flow from strategy $\mathbf{q}$ to strategy $\mathbf{p}$. For example, the denominator of the first term on the right hand side describes the likelihood that a $\mathbf{q}$-player switches to $\mathbf{p}$ due to out-group imitation. The numerator describes the likelihood that subsequently, $\mathbf{p}$ reaches fixation due to in-group imitation. The interpretation of the second term on the right hand side is similar, by considering the possibility that a $\mathbf{p}$-group makes the converse transition towards $\mathbf{q}$. The second sum in Eq. (26) describes the changes triggered by mutation events. Here, the denominator of $\rho_{\mathbf{p},\mathbf{q}}/|\mathcal{M}|$ describes the probability that the mutating player adopts strategy $\mathbf{p}$. The numerator gives the probability that this strategy is then adopted by all other group members, due to intra-group imitation. We note that $\sum_\mathbf{p} x_\mathbf{p}=1$ by definition, and hence the equation is defined on the unit simplex spanned by the 16 strategies. Moreover, since $\sum_\mathbf{p}\dot{x}_\mathbf{p} = 0$, the unit simplex is invariant under the dynamics. One may interpret Eq. (26) as a variant of the replicator–mutator equation^{48}, where the first part represents selection and the second part represents mutations.
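To illustrate how Eqs. (26) and (27) can be evaluated, the sketch below implements the right-hand side and a simple forward-Euler integration in plain Python. It is not the authors' numerical method. The function names, the three-strategy example, and the matrix `rho` are hypothetical, and $|\mathcal{M}|$ is assumed to be the number of strategies.

```python
import math

def alpha(rho_pq, rho_qp, sigma_out, pi_pp, pi_qq):
    """Eq. (27): net flow from q-groups to p-groups."""
    gain = rho_pq / (1.0 + math.exp(sigma_out * (pi_qq - pi_pp)))
    loss = rho_qp / (1.0 + math.exp(sigma_out * (pi_pp - pi_qq)))
    return gain - loss

def x_dot(x, rho, self_payoff, sigma_out, r):
    """Right-hand side of Eq. (26); rho[p][q] is the fixation probability of p in a q-group."""
    n = len(x)  # assumed to equal |M|, the number of strategies
    derivative = []
    for p in range(n):
        imitation, mutation = 0.0, 0.0
        for q in range(n):
            if q == p:
                continue
            a = alpha(rho[p][q], rho[q][p], sigma_out, self_payoff[p], self_payoff[q])
            imitation += a * x[p] * x[q]
            mutation += (x[q] * rho[p][q] - x[p] * rho[q][p]) / n
        derivative.append((1.0 - r) * imitation + r * mutation)
    return derivative

def euler(x, rho, self_payoff, sigma_out, r, dt=0.01, steps=2000):
    """Forward-Euler integration (a simple choice; any ODE solver would do)."""
    for _ in range(steps):
        dx = x_dot(x, rho, self_payoff, sigma_out, r)
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x

# Hypothetical three-strategy example.
rho = [[0.0, 0.30, 0.05], [0.20, 0.0, 0.40], [0.50, 0.10, 0.0]]
self_payoff = [2.0, 1.0, 0.0]
print(euler([0.2, 0.3, 0.5], rho, self_payoff, 1.0, 0.1))
```

Because $\alpha_{\mathbf{p},\mathbf{q}} = -\alpha_{\mathbf{q},\mathbf{p}}$ and the mutation terms cancel in pairs, the components of `x_dot` sum to zero, so the integration keeps the frequencies on the simplex up to rounding.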

Further below, we explore the solutions of Eq. (26) numerically, for various parameter combinations. For all parameters we considered, the dynamics converges to a stable fixed point. Such a fixed point satisfies the equation

$$\begin{aligned} x_\mathbf{p}^{*} = \frac{r\cdot \sum_{\mathbf{q}\ne \mathbf{p}}\rho_{\mathbf{p},\mathbf{q}}\, x_\mathbf{q}^{*}/|\mathcal{M}|}{r\sum_{\mathbf{q}\ne \mathbf{p}}\rho_{\mathbf{q},\mathbf{p}}/|\mathcal{M}| - (1-r)\sum_{\mathbf{q}\ne \mathbf{p}} \alpha_{\mathbf{p},\mathbf{q}}\, x_\mathbf{q}^{*}}. \end{aligned}\tag{28}$$

We would like to emphasize that Eq. (26) does not need to recover the qualitative dynamics that we obtained in the previous section, even when $r\rightarrow 0$ (in which case mutations are again rare compared to out-group imitation events). In other words, the order in which limits are taken affects the solution that is predicted. As we show further below, however, the solutions predicted by Eq. (26) are in excellent agreement with explicit simulations of the evolutionary process for all values of *r* we considered.

#### Numerical results

Figure 5a shows the evolving cooperation levels for a well-mixed population ($N=200$, $M=1$) and a group-structured population ($N=2$, $M=100$). We observe a striking difference between the two settings. In the well-mixed population, the cooperation level strongly depends on the benefit of cooperation, as one may expect. For small benefit values, hardly any cooperation evolves. For intermediate and large benefit values, almost the entire population cooperates eventually. In contrast, in the group-structured population, cooperation levels are around 1/2 when *r* is low, largely independent of the benefit *b*. For $r \gtrsim O(10^{-1})$, the cooperation levels drop as *r* increases, as shown in Fig. 5b.

To explore these results for group-structured populations in more detail, Fig. 6a shows the abundance of strategies in the selection-mutation equilibrium for $b=3$. According to this figure, the most abundant strategy is WSLS, followed by $S_7$, Grim, AllD, and $S_{13}$ (the latter four strategies are exactly the strategies that have an advantage when directly competing with WSLS, see right-most column of Table 1). The underlying evolutionary dynamics are schematically depicted in Fig. 6b. Individuals in groups with non-cooperative strategies (such as Grim and AllD) tend to adopt more cooperative strategies like TFT by out-group imitation. Once such groups contain a TFT-player, TFT may reach fixation by intra-group imitation (TFT is neutral when there is only a single TFT player in the group, and it is selectively favored when there are two TFT players or more). TFT-groups in turn are easily replaced by strategies that are more cooperative in the presence of errors, such as AllC and WSLS. WSLS groups are comparably stable; as they reach the maximum payoff against themselves, individuals in these groups are unlikely to learn non-cooperative strategies by out-group imitation. However, strategies like AllD and Grim may invade a group of WSLS players once they are introduced by mutations. Assuming that the group is small (the figure depicts the case of $N=2$), AllD and Grim are both likely to take over, thereby closing the evolutionary cycle. Importantly, the above arguments do not depend on the precise value of *b*; they only depend on the win-lose relationships between strategies. This argument can thus explain why we observe a coexistence between WSLS and non-cooperative strategies for a wide range of benefit values.

The above argument also explains the dependency of cooperation levels on *r*. When *r* is sufficiently small, the abundance of TFT also becomes as small as *O*(*r*): the transitions between WSLS and the defectors thus balance, and their abundance as well as the cooperation levels do not change significantly as *r* varies. However, because the flow from WSLS to the defectors is mainly driven by mutation events, the abundance of WSLS players drops as mutation events become more frequent.

When *r* or *M* is even smaller, evolutionary results become closer to the results observed when there is a complete separation of time scales. The crossover point depends on the strength of selection. When selection is sufficiently weak, the evolutionary dynamics get closer to neutral selection, where fixation times are relatively short. Here, either WSLS or AllD may happen to take over the entire population with a non-negligible frequency, and the evolutionary dynamics are better described by the complete separation of time scales.