A quantitative meditation on eclipse attacks

Abstract

This article investigates the susceptibility of the Bitcoin P2P network to passive eclipse attacks, in which unsuspecting nodes end up with outbound connections exclusively to peers controlled by an attacker. Using empirical data on reachable clearnet nodes, statistical analysis reveals the shortcomings of netgroup bucketing: although more than 90% of nodes are estimated to be connected to eight or more unique ASN, less than 40% of nodes are connected to ten of them. After proposing and validating an analytic method for modeling eclipse probabilities for both the netgroup bucketing and asmap policies, the benefits of asmap in terms of increased attacker costs are estimated at between 25% and 180%. The key finding, however, is that peer selection policy impact is mostly theoretical in the context of passive attacks because of the enormous number of attacker nodes required to carry out such attacks: even without netgroup bucketing or asmap, eclipsing just 10% of the network would require 36,000 nodes, more than four times the number of currently reachable clearnet nodes. To eclipse more than half of the network, node numbers required by an attacker are estimated to range in the hundreds of thousands to millions.

Don’t trust, verify: Reviewing the methodology and independently reproducing the results presented here is strongly encouraged. To this end, all Jupyter notebooks used to carry out this research have been published on GitHub along with all relevant input data. Feel free to reuse the code for any open source follow-up research.

Empirical Nuggets

According to data collected early May 2023, the Bitcoin network then comprised roughly 19,000 reachable addresses, of which 6,688 were IPv4 and 1,726 were IPv6 addresses. The roughly 10,500 Onion and 500 I2P addresses are neglected in all following analysis because they cannot be assigned into netgroups or ASN.

Bitcoin Core’s current netgroup policy assigns the 8,414 IP addresses into 4,102 netgroup buckets (3,680 IPv4 buckets and 422 IPv6 ones). An asmap generated with Kartograf in May 2023 results in 1,171 ASN buckets (1,062 for IPv4 and 361 for IPv6 addresses; the total bucket count is less than the sum of the IPv4 and IPv6 buckets because an ASN may comprise both IPv4 and IPv6 addresses). Already at this high level, the netgroup-to-asmap bucket ratio of roughly 3.5 (4,102 to 1,171) gives a hint of the inadequacy of netgroup bucketing.
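As a rough illustration of what netgroup bucketing does, the sketch below groups addresses by their /16 prefix for IPv4 and /32 prefix for IPv6. These prefix lengths are assumed here as the default case; Bitcoin Core handles further special cases (tunneled addresses, for example) that this sketch ignores. The function name `netgroup` and the sample addresses (taken from documentation ranges) are illustrative only.

```python
import ipaddress
from collections import Counter


def netgroup(address: str) -> str:
    """Return a simplified netgroup key: the /16 (IPv4) or /32 (IPv6) network."""
    ip = ipaddress.ip_address(address)
    prefix = 16 if ip.version == 4 else 32  # assumed default prefixes
    # strict=False lets us pass a host address and get its enclosing network
    return str(ipaddress.ip_network(f"{address}/{prefix}", strict=False))


# Hypothetical sample addresses, not real Bitcoin nodes
addresses = [
    "203.0.113.7",
    "203.0.200.9",       # same /16 as the address above
    "198.51.100.1",
    "2001:db8::1",
    "2001:db8:ffff::2",  # same /32 as the address above
]

# Number of addresses per netgroup bucket
buckets = Counter(netgroup(a) for a in addresses)
for group, count in buckets.items():
    print(group, count)
```

Applied to a full list of reachable addresses, the length of `buckets` would correspond to the netgroup bucket count discussed above.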

The shortcoming’s cause becomes apparent when examining the data in more detail. The figure below compares the number of addresses in the thirty largest buckets for both bucketing strategies. Not unexpectedly, considering the cloud computing winner-take-all economics at play, both distributions follow a power law. Overall, the data paint a picture of few ASN (cloud providers) dominating the node landscape.

Fat Heads

Assuming netgroup policy with the default number of ten outbound connections, a quick glance at the data to compare the top ASN (1,000 nodes) to the top ten netgroups (fewer than 900 nodes combined) is sufficient to prove that a node connected to peers in any ten netgroups might in fact be connected to peers from a single ASN. This possibility notwithstanding, its likelihood is practically zero, as will be shown below.

However, the high kurtosis of the overall ASN distribution, highlighted in the figure below, increases the likelihood of randomly selected netgroups coming from a limited number of ASN. Some numbers: just three ASN provide more than 25% of nodes and the top 1% of ASN provide more than 50% of nodes, whereas more than 50% of ASN provide only a single node, making for a combined contribution of less than 8%. So, while it may be unlikely for any ten netgroups to map to a single ASN, it seems likely that ten randomly selected netgroups map to fewer than ten ASN due to the distribution’s fat head and long tail.

Long Tails

So far, the analysis has been qualitative, but a quantitative analysis of the shortcomings of netgroup bucketing is possible using Monte Carlo simulation, in which a node randomly selects ten peers in line with Bitcoin Core’s current netgroup bucketing policy. Afterwards, the peers’ addresses are mapped to ASN, and the resulting number of unique ASN is recorded. The outcome of such a simulation with 100,000 iterations is shown in the figure below, which presents the resulting ASN distribution for nodes with different network types.
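The simulation loop can be sketched in a few lines. The following is a minimal illustration under simplifying assumptions, not the code from the published notebooks: peers are drawn uniformly from a list of `(netgroup, asn)` pairs, a peer is discarded if its netgroup is already in use, and the number of distinct ASN among the ten selected peers is tallied. The toy network and the name `simulate_unique_asn` are hypothetical.

```python
import random
from collections import Counter


def simulate_unique_asn(nodes, n_peers=10, iterations=10_000, seed=1):
    """nodes: list of (netgroup, asn) tuples, one entry per reachable node.

    Returns a dict mapping 'number of unique ASN' to its relative frequency.
    The network must contain at least n_peers distinct netgroups.
    """
    rng = random.Random(seed)
    histogram = Counter()
    for _ in range(iterations):
        used_netgroups, asns = set(), set()
        while len(used_netgroups) < n_peers:
            group, asn = rng.choice(nodes)
            if group in used_netgroups:
                continue  # netgroup bucketing: skip peers from a netgroup already in use
            used_netgroups.add(group)
            asns.add(asn)
        histogram[len(asns)] += 1
    return {k: v / iterations for k, v in sorted(histogram.items())}


# Toy network (hypothetical): one large ASN spanning 20 netgroups of 10 nodes,
# plus 30 small ASN with a single netgroup of 2 nodes each.
nodes = [(f"big-{g}", "AS-big") for g in range(20) for _ in range(10)]
nodes += [(f"small-{a}", f"AS-{a}") for a in range(30) for _ in range(2)]

shares = simulate_unique_asn(nodes)
for unique_asn, share in shares.items():
    print(unique_asn, round(share, 3))
```

With real data, `nodes` would be built from the reachable addresses, their netgroups and an asmap lookup.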

Netgroup-ASN Security Reduction

The simulation suggests less than 40% of IPv4-only and IPv4/IPv6 nodes and less than 30% of IPv6-only nodes have outbound connections to ten different ASN. More than 90% of nodes, however, are connected to at least eight unique ASN. Interpretation as to whether this situation is acceptable is deferred to experts on the matter. Nonetheless, the data provide a better understanding of the lay of Bitcoin land in terms of nodes, netgroups and ASN. Equipped with this empirical data, the next section tackles the question of how to derive meaningful estimates for eclipse probabilities from it.

Modeling eclipse probabilities

First insight: Netgroup selection can be modeled using a multivariate noncentral Wallenius hypergeometric distribution. To see why this is the case, consider the following line of thought, beginning with the standard urn model that gets adjusted as needed along the way: The standard urn model distinguishes balls of only two colors, which is insufficient to differentiate between thousands of nodes, netgroups or ASN. The multivariate urn model, however, caters to cases requiring an arbitrary number of colors. With such a model, one could imagine mapping the process of a Bitcoin node randomly selecting peers for outbound connections to drawing from an urn where balls correspond to nodes: each peer corresponds to a ball of a different color and there is exactly one ball per color. Taking netgroup bucketing into account, where nodes discard randomly selected peers in subnets to which they are already connected, makes things more complicated. Modeling this process presents the difficulty of having to remove all balls from the urn that are in the same netgroup as the node corresponding to a ball that was drawn. Fortunately, the issue can be resolved by making balls represent netgroups instead of nodes. This, however, has the unintended side effect of skewing probabilities: in the standard urn model, balls have equal probabilities of getting drawn but in the peer selection context, the probability of selecting a netgroup depends on the number of nodes in it. Opportunely, noncentral models allow specifying a weight for each ball, setting the drawing probability for each ball to the ratio of its weight to the combined weight of all balls. To summarize, the peer selection process can be mapped onto a variant of the urn model in which balls of different colors represent netgroups whose weights correspond to the number of nodes in the netgroup they represent. Such an urn model is described by the multivariate noncentral Wallenius hypergeometric distribution.

Second insight: Eclipse probabilities can also be modeled using a multivariate noncentral Wallenius hypergeometric distribution. Assuming no eclipse attack is currently underway on the network, the eclipse probability of an attacker controlling a particular number of nodes and netgroups is determined as follows. To maximize the probability of an eclipse attack, attacker nodes are evenly distributed across netgroups, implying identical node counts across netgroups. Through the lens of the urn model, the balls corresponding to attacker netgroups become indistinguishable because they have identical weights. This is crucial because it allows assigning a single color, say red, to the attacker’s balls. As before, the other balls in the urn correspond to the netgroups of existing reachable Bitcoin nodes, each with a different weight and color. In such a setup, the probability of a successful eclipse attack corresponds to the drawing of ten red balls.

A demonstration of the proposed approach is given in the figure below, which shows eclipse attack probabilities depending on the number of netgroups controlled by an attacker for the two scenarios of an attacker controlling 20,000 and 30,000 nodes. The solid lines correspond to probabilities obtained with the proposed method. The cross marks, representing probabilities obtained by Monte Carlo simulation, lend credibility to the correctness of the proposed method.

Model validation

Third insight: eclipse probabilities do not depend on the distribution of benign nodes. This becomes evident upon closer examination. The probability of a successful eclipse attack depends on the probabilities of all successive random samples corresponding to attacker netgroups. Because no ball corresponding to a benign netgroup is drawn, the weight contributed by benign nodes to the overall weight used in the ratio determining the balls’ probabilities is constant throughout sampling. If only attacker netgroups are drawn, all balls’ probabilities are recomputed by removing the drawn attacker netgroup’s weight from the overall weight after each draw. Put formally: If, to begin with, there are \(n_\mathrm{b}\) benign netgroups with individual weights of \(w_i\) nodes, and \(n_\mathrm{a}\) attacker netgroups with identical weights \(w_\mathrm{a} = n_\mathrm{n} / n_\mathrm{a}\), where \(n_\mathrm{n}\) is the number of nodes controlled by the attacker, the probability of initially drawing a particular attacker netgroup is \(\frac{w_\mathrm{a}}{c + n_\mathrm{a} w_\mathrm{a}}\), with \(c = \sum_{i=1}^{n_\mathrm{b}} w_i\) the sum of the weights contributed by benign netgroups. Because each attacker netgroup has the same weight, the probability of initially drawing any attacker netgroup is \(\frac{n_\mathrm{a} w_\mathrm{a}}{c + n_\mathrm{a} w_\mathrm{a}}\). After sampling an attacker netgroup, ball probabilities are adjusted by removing the weight of the drawn netgroup, \(w_\mathrm{a}\), from the overall weight. Thus, after drawing \(i\) attacker netgroups, \(n_\mathrm{a} - i\) attacker netgroups are left and the overall weight has been reduced by \(i w_\mathrm{a}\), putting the probability of drawing an attacker netgroup after the \(i\)th round at \(\frac{(n_\mathrm{a} - i) w_\mathrm{a}}{c + (n_\mathrm{a} - i) w_\mathrm{a}}\). This makes plain the fact that \(c\) is constant, and all that matters is the overall number of nodes contributed by benign netgroups, not their distribution.
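Multiplying the per-draw probabilities for \(i = 0, \dots, 9\) gives the probability that all ten outbound connections land in attacker netgroups. The sketch below does exactly that; the function and parameter names are illustrative, and \(c\) is passed in as the total number of benign nodes. Plugging in figures quoted elsewhere in this article (8,414 benign nodes, 36,000 attacker nodes, 50 attacker buckets) yields a value of roughly 10%.

```python
def eclipse_probability(attacker_nodes, attacker_buckets, benign_nodes, n_outbound=10):
    """Probability that n_outbound successive draws all hit attacker buckets.

    attacker_nodes   -- n_n, nodes controlled by the attacker
    attacker_buckets -- n_a, attacker netgroups (or ASN), each with equal weight
    benign_nodes     -- c, total weight of all benign buckets
    """
    if attacker_buckets < n_outbound:
        return 0.0  # not enough distinct attacker buckets to fill every slot
    w_a = attacker_nodes / attacker_buckets
    probability = 1.0
    for i in range(n_outbound):
        remaining = (attacker_buckets - i) * w_a  # attacker weight left after i draws
        probability *= remaining / (benign_nodes + remaining)
    return probability


p = eclipse_probability(attacker_nodes=36_000, attacker_buckets=50, benign_nodes=8_414)
print(f"{p:.3f}")
```

Because only the sum \(c\) enters the computation, the function needs no information about how benign nodes are spread across buckets.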

Corollary (engineering side note): The fact that eclipse probabilities are independent of benign node distribution allows modifying the distribution as long as the number of nodes remains constant. Taking the distribution to its limit by assigning all benign nodes to a single bucket significantly reduces complexity for the Wallenius solver and its runtime, making large-scale parameter studies of eclipse probabilities feasible.
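The independence claim can be spot-checked numerically. The sketch below is a toy experiment with made-up numbers, not the validation from the published notebooks: it estimates the eclipse probability by weighted sampling without replacement, once with benign nodes spread over many buckets and once with all of them collapsed into a single bucket. The name `simulate_eclipse` and all weights are hypothetical.

```python
import random


def simulate_eclipse(benign_weights, attacker_nodes, attacker_buckets,
                     n_outbound=10, iterations=5_000, seed=7):
    """Estimate the eclipse probability by weighted sampling without replacement."""
    rng = random.Random(seed)
    n_benign = len(benign_weights)
    # Benign buckets come first, followed by equally weighted attacker buckets.
    base = list(benign_weights) + [attacker_nodes / attacker_buckets] * attacker_buckets
    indices = range(len(base))
    eclipsed = 0
    for _ in range(iterations):
        weights = base.copy()
        for _ in range(n_outbound):
            idx = rng.choices(indices, weights=weights)[0]
            if idx < n_benign:
                break  # a benign bucket was drawn, so this node is not eclipsed
            weights[idx] = 0.0  # each bucket can be selected only once
        else:
            eclipsed += 1  # all draws hit attacker buckets
    return eclipsed / iterations


# Two benign distributions with the same total of 1,000 nodes (toy numbers)
spread = [400, 200, 100] + [10] * 30
collapsed = [1_000]

p_spread = simulate_eclipse(spread, attacker_nodes=4_000, attacker_buckets=20)
p_collapsed = simulate_eclipse(collapsed, attacker_nodes=4_000, attacker_buckets=20)
print(p_spread, p_collapsed)
```

Both estimates should agree up to sampling noise, while the collapsed variant has far fewer buckets to handle per draw.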

Final insight: All bucketing-based node selection policies can be modeled with the same multivariate Wallenius noncentral hypergeometric distribution. In netgroup bucketing, nodes are assigned to different buckets based on their subnets; with asmap, based on the ASN the node belongs to. Although the distribution of nodes into buckets is entirely different, the number of nodes getting distributed in both instances is the same. Consequently, the probability of an attacker controlling \(n_\mathrm{a}\) netgroups and \(n_\mathrm{n}\) nodes successfully eclipsing a node using the netgroup policy is the same as the probability of eclipsing an asmap-policy node when attacking with \(n_\mathrm{a}\) ASN and \(n_\mathrm{n}\) nodes.

Eclipse attack analysis

Integrating empirical data and modeling insights, the figure below visualizes the resources required by an attacker to attain certain eclipse probabilities. The number of nodes controlled by an attacker is shown on the x axis. The number of buckets, which corresponds to netgroups or ASN controlled by an attacker, is shown on the y axis. The different graphs correspond to different probabilities of eclipsing a node; note that if attacker nodes are set up to accept a large number of incoming connections, the probability also corresponds to the share of the Bitcoin network eclipsed by the attacker.
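One way to obtain such curves is to invert the eclipse probability numerically: for a given bucket count, search for the smallest attacker node count that reaches the target probability. The sketch below uses a plain binary search on top of the product of per-draw probabilities derived earlier. It is an assumption-laden illustration rather than the notebooks' implementation, and the names `eclipse_probability` and `nodes_required` are hypothetical.

```python
def eclipse_probability(attacker_nodes, attacker_buckets, benign_nodes, n_outbound=10):
    """Product of per-draw probabilities of hitting an attacker bucket."""
    if attacker_buckets < n_outbound:
        return 0.0
    w_a = attacker_nodes / attacker_buckets
    probability = 1.0
    for i in range(n_outbound):
        remaining = (attacker_buckets - i) * w_a
        probability *= remaining / (benign_nodes + remaining)
    return probability


def nodes_required(target, attacker_buckets, benign_nodes, n_outbound=10):
    """Smallest attacker node count whose eclipse probability reaches target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if attacker_buckets < n_outbound:
        raise ValueError("attacker needs at least n_outbound buckets")
    low, high = 0, 1
    # Grow the upper bound until it satisfies the target ...
    while eclipse_probability(high, attacker_buckets, benign_nodes, n_outbound) < target:
        high *= 2
    # ... then bisect; the probability increases monotonically with node count.
    while high - low > 1:
        mid = (low + high) // 2
        if eclipse_probability(mid, attacker_buckets, benign_nodes, n_outbound) >= target:
            high = mid
        else:
            low = mid
    return high


n_50 = nodes_required(target=0.10, attacker_buckets=50, benign_nodes=8_414)
n_10 = nodes_required(target=0.10, attacker_buckets=10, benign_nodes=8_414)
print(n_50, n_10)
```

Sweeping `target` and `attacker_buckets` over a grid gives one point per combination, which is the kind of data a contour-style figure needs.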

IPv4/IPv6 Eclipse Attack

It turns out that eclipsing just 10% of the network already requires 36,000 nodes (more than four times the current number of reachable IPv4 and IPv6 nodes), as well as access to 50 netgroups. The former is a question of money; the latter is not a problem either, considering popular cloud providers such as Hetzner (AS24940), Amazon (AS16509) and Google (AS396982) each control several hundred netgroups in their autonomous systems. Access to 50 distinct ASN, on the other hand, seems more difficult. Unless the attacker has access to a large botnet, an offhand estimate could be to assume access to between ten and twenty ASN (corresponding to distributing the attack across ten to twenty cloud providers). If an attacker had access to twenty ASN, he would need to launch an extra 10,000 nodes for a total of 46,000 nodes, which comes with a cost increase of around 30%. Assuming instead access to only ten ASN drives up the total number of required nodes to 91,000, a cost increase of about 150%. The data paint a similar picture for higher eclipse attack probabilities, although the cost penalty in these cases appears strictly theoretical, considering the absurdly large attacker node count required to carry out these attacks.
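The percentages follow directly from the node counts, assuming cost scales linearly with the number of attacker nodes. A short check, with an illustrative helper name:

```python
def cost_increase(baseline_nodes, required_nodes):
    """Relative cost increase, assuming cost is proportional to node count."""
    return (required_nodes - baseline_nodes) / baseline_nodes


baseline = 36_000  # nodes needed for a 10% eclipse probability with 50 buckets

increase_20_asn = cost_increase(baseline, 46_000)  # 10,000 extra nodes
increase_10_asn = cost_increase(baseline, 91_000)  # 55,000 extra nodes

print(f"20 ASN: {increase_20_asn:.0%}")
print(f"10 ASN: {increase_10_asn:.0%}")
```

The results, about 28% and 153%, match the rounded figures of 30% and 150% quoted above.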

Instead of going through cost increase numbers one by one in the chart above, consider the figure below, which attempts to identify the trend instituted by asmap. It shows the extra cost (in terms of additional attacker nodes) incurred by asmap depending on the number of ASN an attacker has access to and the eclipse probability targeted by the attacker.

IPv4/IPv6 Eclipse Attack Cost Increase

For each ASN count, the data suggest asymptotic but qualitatively similar cost increases. Even in disruptive scenarios of an attacker eclipsing 75% or 90% of the network, the asmap-induced cost penalty is only around 30% and 50% if the attacker has access to 20 or 15 ASN, respectively. In the case of 10 ASN, the cost increase is around 180%.

Summary

Empirical data suggest that as of May 2023 the Bitcoin P2P network is reachable via around 8,500 IP addresses. Viewed through netgroup bucketing or asmap lenses, the vast majority of addresses are concentrated in only a handful of subnets or ASN.

Although shortcomings of netgroup bucketing to accurately represent the ASN-based network reality were discovered, statistical analysis assuming the default of ten outbound connections revealed that more than 90% of IPv4/IPv6 nodes are connected to more than seven unique ASN.

The multivariate noncentral Wallenius hypergeometric distribution was established as a means to efficiently carry out parameter studies of eclipse probabilities for both netgroup bucketing and asmap peer policies. Analysis based on this approach yielded several unexpected insights.

First and foremost, even without a peer selection policy, the large number of reachable Bitcoin nodes practically precludes any large-scale eclipse attacks: eclipsing 10% of the network would require 36,000 nodes; for 30%, 71k nodes would be needed; and to eclipse more than 90% of the network takes more than 800k nodes.

Second, the additional requirement imposed by netgroup bucketing does not seem to translate into a real-world cost increase because the number of netgroups an attacker needs to control to carry out a successful eclipse attack is well below the number of netgroups controlled by individual cloud hosters.

Third, the cost increase of eclipse attacks under an asmap regime was quantified as a function of the number of attacker-controlled ASN. If the attacker has access to ten ASN, cost increases by 180%. The cost increase quickly tapers off, though, if the attacker has access to more ASN: spreading an attack across 15 ASN, the cost increase is only around 50%; assuming access to 20 ASN, costs increase only 30%.

Conclusion

It is hard to misread the data: the 8,500 honest Bitcoin nodes reachable via IPv4 and IPv6 alone impose a requirement of hundreds of thousands to millions of attacker nodes. Any additional requirement imposed on top of that by a peer selection policy could justifiably be construed as strictly hypothetical: an attacker marshaling resources to field a six-digit number of nodes can surely spread them across a couple of netgroups; or, in case of asmap, use a botnet or spread them across several cloud hosters and create some 30% extra nodes. So… game, set, match: nodes. At least when it comes to network-wide eclipse resistance.

However, it is too soon to hand out the premature optimization award to the runner-up: asmap has demonstrated the ability to increase attacker cost beyond what is possible with netgroup bucketing; it would be naive to dismiss it based on a one-sided analysis focusing on passive network-wide attacks, when asmap might turn out to be a boon for mitigating active eclipse attacks targeting individual nodes.

Ack

Supported by , thanks guys!