Reimagining Ethereum staking node architecture to improve performance and reliability

June 20, 2024

TL;DR: Coinbase has implemented shared beacon nodes in its non-custodial Ethereum staking architecture. The change eliminates unnecessary redundancy, simplifies operations, and reduces validator downtime, which ultimately benefits the entire Ethereum ecosystem.

Setting the stage: The anatomy of an Ethereum staking node

Running an Ethereum staking node requires three pieces of software: a beacon node (i.e., consensus client), an execution client, and a validator client.

Beacon node: The beacon node plays a key role in the Ethereum network’s proof-of-stake consensus mechanism. Its job is to reach consensus with the other nodes on critical events, such as which validator is proposing the next block, which validator got slashed, etc. When a validator client is selected to propose a new block, it’s the beacon node that prepares and ultimately submits the block that the validator signs.
Execution client: The execution client is primarily responsible for managing and storing the "state" of the Ethereum blockchain, which includes smart contract data and code, the state of each account, and the result of each transaction. It validates the integrity of this state by processing transactions (including smart contract executions) and it provides the execution layer data used in block proposals when requested.
Validator client: The validator client is responsible for taking actions associated with the validators’ staked ETH. It queries the beacon node to determine when it needs to attest to a block and whether it needs to propose a block or participate in a sync committee. When it does, it retrieves the information it needs from the beacon node before performing its duties and submitting them back to the beacon node. The validator client can use any synced beacon node it trusts (over a secure connection) to retrieve and submit its duties. It can even connect to multiple beacon nodes for redundancy (keep this in mind — we’ll come back to this point in more detail later).

In a staking node, the beacon node and execution client come as a pair. They peer with other nodes on the network, stay synced, and provide the validator client a trusted interface to the beacon chain. For this reason, references to a “beacon node” throughout this post are inclusive of its paired execution client.
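A validator client can only rely on a beacon node that is actually synced. As a minimal sketch of what that check might look like, the helper below inspects the JSON body returned by the standard Beacon API endpoint `GET /eth/v1/node/syncing`. The function name and the `max_sync_distance` threshold are our own illustrative assumptions, not part of any client.

```python
def is_usable_beacon(syncing_response: dict, max_sync_distance: int = 1) -> bool:
    """Decide whether a beacon node looks synced enough to serve a validator client.

    `syncing_response` is the parsed JSON body of GET /eth/v1/node/syncing.
    The Beacon API encodes integers as decimal strings.
    """
    data = syncing_response["data"]
    if data["is_syncing"]:
        return False
    # These two flags are only reported by newer API versions;
    # a missing field is treated here as "not a problem" (an assumption).
    if data.get("is_optimistic", False) or data.get("el_offline", False):
        return False
    return int(data["sync_distance"]) <= max_sync_distance


# Hypothetical response from a healthy node.
sample = {"data": {"head_slot": "9300000", "sync_distance": "0",
                   "is_syncing": False, "is_optimistic": False, "el_offline": False}}
print(is_usable_beacon(sample))  # True
```

Because the `el_offline` flag reflects the paired execution client, a check like this covers the beacon node and execution client as a unit, which matches how this post uses the term "beacon node."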

Traditionally, staking nodes have been self-contained; that is, each comprised a single validator client with a dedicated beacon node-execution client pairing. We’ll call this a 1:1:1 configuration. It looks like this:

Example of a region with 33 staking nodes using the 1:1:1 configuration

Scale advantage: Overhauling staking node architecture for peak performance

A single staking node can support multiple validators, each of which represents a stake of 32 ETH. Coinbase made a strategic decision to limit the number of validators we load onto any single node. While this increases infrastructure costs, it limits the potential impact of a hardware failure and reduces overall validator downtime whenever the software needs to restart, such as when adding a new validator key.

To many staking providers, the 1:1:1 configuration makes sense. But at our scale, we realized the traditional architecture had become unnecessarily redundant, requiring a large amount of infrastructure that’s expensive to maintain.

We set out to design a more efficient and reliable way to organize our staking infrastructure.

Transitioning to a one-to-many configuration

Remember when we said a validator client can connect to multiple beacon nodes for redundancy? Well, conversely, a beacon node can also simultaneously serve multiple validator clients.

As such, a 1:1:1 configuration is unnecessary. Instead of having each beacon node serve a single validator client, we paired dozens of validator clients with a single, souped-up beacon node. This change has allowed us to maintain our scale while reducing the number of beacon node-execution client pairs we need to maintain, significantly reducing our operational overhead.

Our new one-to-many configuration also increases the resiliency and reliability of our staking operations. By freeing up beacon nodes that no longer need to be paired with a single validator client, we’re now able to provide each validator client with a backup beacon node — a strategic benefit that’s cost-prohibitive when using the traditional node architecture. This redundancy is good for the entire Ethereum ecosystem because it reduces total validator downtime. If a primary beacon node goes offline, the validator clients it serves will seamlessly switch to the backups.

Comparison of a region with 15 nodes deployed in the conventional manner (left) and Coinbase’s new one-to-many configuration (right), which requires less infrastructure while increasing each validator client’s resiliency by giving it a backup beacon node.
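The node counts behind this kind of comparison reduce to simple arithmetic. The sketch below is illustrative only: the function name is ours, and the figure of 24 validator clients per shared beacon node is a hypothetical stand-in for "dozens," not a number Coinbase has published.

```python
import math


def beacon_pairs_needed(validator_clients: int, clients_per_beacon: int,
                        with_backup: bool = True) -> int:
    """Count the beacon node-execution client pairs needed for a fleet of validator clients."""
    if validator_clients < 0 or clients_per_beacon < 1:
        raise ValueError("need a non-negative fleet size and at least one client per beacon")
    primaries = math.ceil(validator_clients / clients_per_beacon)
    # One identical backup per shared beacon node doubles the count.
    return primaries * 2 if with_backup else primaries


# 1:1:1 is the special case of one validator client per beacon node and no backup.
print(beacon_pairs_needed(33, 1, with_backup=False))  # 33
# Hypothetical one-to-many layout: 24 clients per beacon, every beacon backed up.
print(beacon_pairs_needed(33, 24))  # 4
```

Even with every shared beacon node duplicated, the total stays far below one pair per validator client once each beacon serves more than a handful of clients.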

Assessing the risks of running shared beacon nodes

In considering this change, we asked a lot of questions. Some of the most important ones were:

Can beacon nodes handle connections from many validator clients without degrading performance?

In our performance tests, we found that a beacon node whose resources are scaled up appropriately can handle connections from many validator clients with no impact on performance.

Does the increased cost of running a more powerful beacon node, as well as providing a backup node, offset the savings accrued from sharing beacon nodes among validator clients?

The resources a shared beacon node needs do not grow in proportion to the number of connected validator clients; they grow more slowly, so the new configuration does reduce overall costs.

Fostering client diversity, geographical distribution, and redundancy still requires maintaining a significant number of shared beacons. Sharing beacons while satisfying these requirements only makes sense at a large scale.
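To see why scale matters, consider a rough break-even sketch. Every number below is hypothetical: we assume a "pool" is one client implementation in one region, that each pool needs at least one primary and one backup no matter how few validator clients use it, and that a shared beacon node costs three times as much as a dedicated one.

```python
import math


def one_to_one_cost(validator_clients: int, small_node_cost: float) -> float:
    """Cost of the 1:1:1 layout: one dedicated beacon node per validator client."""
    return validator_clients * small_node_cost


def shared_cost(validator_clients: int, pools: int, clients_per_beacon: int,
                large_node_cost: float) -> float:
    """Cost of shared beacons when the fleet is spread evenly over `pools`."""
    clients_per_pool = math.ceil(validator_clients / pools)
    # Every pool keeps at least one primary, even if it is nearly empty.
    primaries_per_pool = max(1, math.ceil(clients_per_pool / clients_per_beacon))
    return pools * primaries_per_pool * 2 * large_node_cost  # x2 for backups


for fleet in (10, 1000):
    print(fleet, one_to_one_cost(fleet, 1.0),
          shared_cost(fleet, pools=6, clients_per_beacon=24, large_node_cost=3.0))
# 10 10.0 36.0
# 1000 1000.0 252.0
```

Under these assumptions a small fleet pays more for sharing, because the fixed floor of primaries and backups per pool dominates, while a large fleet pays far less.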

Does connecting a validator client to two beacon nodes simultaneously for redundancy introduce any new problems while switching, such as missed attestations or block proposals?

The validator-client implementations we use are designed to seamlessly switch beacon nodes without missing anything. If a duty, such as a block proposal or sync-committee participation, requires notifying the beacon node ahead of time, our validator clients notify each connected beacon node. This ensures both are ready in case something were to happen to the primary node. Since the signing keys are on the validator client and the beacon node is only used to broadcast the message once it has been signed, this does not introduce any additional risk of double-signing.
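The notify-everyone, publish-once pattern described above can be sketched with in-memory stand-ins. This is not how any particular validator client is written; `FakeBeacon` and both helper functions are hypothetical names, and a real client would call the beacon node's HTTP API instead.

```python
class BeaconUnavailable(Exception):
    """Raised when a beacon node cannot be reached."""


class FakeBeacon:
    """In-memory stand-in for a beacon node."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.prepared = []
        self.published = []

    def prepare(self, duty):
        if not self.online:
            raise BeaconUnavailable(self.name)
        self.prepared.append(duty)

    def publish(self, signed_message):
        if not self.online:
            raise BeaconUnavailable(self.name)
        self.published.append(signed_message)


def prepare_on_all(beacons, duty):
    """Announce an upcoming duty to every connected beacon node."""
    ready = []
    for beacon in beacons:
        try:
            beacon.prepare(duty)
            ready.append(beacon.name)
        except BeaconUnavailable:
            continue  # an unreachable node simply misses the announcement
    return ready


def publish_with_fallback(beacons, signed_message):
    """Hand the already-signed message to the first beacon node that accepts it."""
    for beacon in beacons:
        try:
            beacon.publish(signed_message)
            return beacon.name
        except BeaconUnavailable:
            continue
    raise BeaconUnavailable("no beacon node accepted the message")


primary, backup = FakeBeacon("primary"), FakeBeacon("backup")
prepare_on_all([primary, backup], "proposal@slot-100")
primary.online = False  # the primary fails between preparation and publication
print(publish_with_fallback([primary, backup], "signed-block@slot-100"))  # backup
```

Note that the message is signed once, before any beacon node is involved, and the same signed bytes go to whichever node is reachable. Failing over changes the route, not what was signed.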

Tactical considerations for implementation

Sharing beacon nodes in production requires operational changes in order to meet our team’s high standard for security, reliability, and performance.

Redundancy

Sharing a beacon node among multiple validator clients multiplies the number of affected validators in the event of an individual beacon node failing. To mitigate this risk, we deploy a backup for every shared beacon node and utilize client-side load balancing to fall back to the backup node in the event of a failure.

We take several measures to reduce the likelihood that both beacon nodes will fail at the same time, such as preventing them from being scheduled on the same machine and performing maintenance on them at different times.

Our backup beacon nodes are identical to our primary beacon nodes, and can handle the same number of connected validator clients. In addition to reducing risk, this reduces downtime and simplifies operations by allowing validator clients to keep running when we perform maintenance on our beacon nodes.
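The two placement rules mentioned above (separate machines, separate maintenance times) are easy to express as an automated check. The following is a sketch under our own assumptions: the `BeaconDeployment` record and the representation of a maintenance window as a half-open range of hours since the start of the week are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BeaconDeployment:
    name: str
    machine: str
    # (start_hour, end_hour) since the start of the week, end exclusive.
    maintenance_window: Tuple[int, int]


def pair_violations(primary: BeaconDeployment, backup: BeaconDeployment) -> List[str]:
    """List the ways a primary/backup pair could fail at the same time."""
    problems = []
    if primary.machine == backup.machine:
        problems.append("same machine")
    p_start, p_end = primary.maintenance_window
    b_start, b_end = backup.maintenance_window
    if p_start < b_end and b_start < p_end:
        problems.append("overlapping maintenance")
    return problems


good = pair_violations(BeaconDeployment("beacon-a", "host-1", (2, 4)),
                       BeaconDeployment("beacon-a-backup", "host-2", (26, 28)))
bad = pair_violations(BeaconDeployment("beacon-b", "host-3", (2, 4)),
                      BeaconDeployment("beacon-b-backup", "host-3", (3, 5)))
print(good)  # []
print(bad)   # ['same machine', 'overlapping maintenance']
```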

Observability

Arguably the most important elements of any production system are its metrics, dashboards, alerts, and on-call procedures. Given the increased impact of an individual beacon node failure, our engineering team dedicated substantial effort to ensuring incidents are identified and handled as quickly as possible. The key addition to our existing observability and incident handling stack is tooling that allows us to easily identify when an issue originates from a shared beacon node, and just as importantly, which shared beacon node is responsible.
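One simple way to attribute an incident to a shared beacon node is to group failing validator clients by the beacon they are attached to. The sketch below illustrates the idea only; the function name, the input shapes, and the 80% threshold are assumptions rather than a description of our tooling.

```python
from collections import defaultdict


def suspect_beacons(client_to_beacon: dict, failing_clients: set,
                    threshold: float = 0.8) -> list:
    """Return beacon nodes where at least `threshold` of attached clients are failing."""
    attached = defaultdict(int)
    failing = defaultdict(int)
    for client, beacon in client_to_beacon.items():
        attached[beacon] += 1
        if client in failing_clients:
            failing[beacon] += 1
    return sorted(beacon for beacon, total in attached.items()
                  if failing[beacon] / total >= threshold)


# Hypothetical topology: three validator clients per shared beacon node.
topology = {"vc-1": "beacon-a", "vc-2": "beacon-a", "vc-3": "beacon-a",
            "vc-4": "beacon-b", "vc-5": "beacon-b", "vc-6": "beacon-b"}
print(suspect_beacons(topology, {"vc-1", "vc-2", "vc-3", "vc-5"}))  # ['beacon-a']
```

A lone failing client (`vc-5` here) points at the client itself, whereas every client on one beacon failing together points at the beacon.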

Multiple configurations

We give our customers a choice of validator client implementation, whether or not to use maximal extractable value (MEV), and whether the validator participates in mainnet or testnet consensus. We run a shared beacon node with each implementation, and only connect validator clients with the same implementation to it. By doing so we foster client diversity for beacon nodes and validator clients, we honor the customer's request for a particular implementation, and we avoid issues that arise from mixing implementations. As a result, we operate several beacon nodes and their backups in each region.
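The matching rule can be sketched as a lookup on a configuration profile. This is illustrative only: the `Profile` fields mirror the three customer choices above, the implementation and beacon names are placeholders, and treating the MEV choice as part of the match is our assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    implementation: str  # placeholder names below, not a statement of what we run
    network: str         # "mainnet" or "testnet"
    mev: bool


def pick_beacons(client_profile: Profile, shared_beacons: dict) -> list:
    """Return the shared beacon nodes whose profile is identical to the client's."""
    return sorted(name for name, profile in shared_beacons.items()
                  if profile == client_profile)


shared = {
    "beacon-1": Profile("client-x", "mainnet", True),
    "beacon-1-backup": Profile("client-x", "mainnet", True),
    "beacon-2": Profile("client-y", "mainnet", True),
    "beacon-3": Profile("client-x", "testnet", False),
}
print(pick_beacons(Profile("client-x", "mainnet", True), shared))
# ['beacon-1', 'beacon-1-backup']
```

Each distinct profile needs its own primary and backup, which is why the number of shared beacon nodes per region grows with the number of supported configurations.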

Locality

As a security measure, we do not expose our beacon API endpoints to the internet, and we do not route beacon-validator connections through the internet. Keeping our beacon API traffic inside our networks also minimizes latency, which is critical for maximizing rewards. This design choice, which results in needing to deploy beacon nodes in each region, also limits the impact of a cloud provider outage or the failure of an individual beacon node.
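A configuration check can help enforce the first rule. As a minimal sketch, the helper below accepts a beacon API URL only when its host is a literal private IP address. The function name and the example URLs are hypothetical, and because the sketch does not resolve DNS names, it conservatively rejects them.

```python
import ipaddress
from urllib.parse import urlsplit


def is_private_endpoint(url: str) -> bool:
    """Return True only if the URL's host is a literal private IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # A DNS name: this sketch does not resolve it, so it cannot vouch for it.
        return False


print(is_private_endpoint("http://10.0.12.7:5052"))  # True
print(is_private_endpoint("http://8.8.8.8:5052"))    # False
```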

Securing our future

By rethinking the architecture of our Ethereum staking nodes, pooling our beacon nodes, and shifting to a one-to-many configuration, our staking solutions are now more reliable, more efficient, and more secure.

While we don’t expect all staking providers to have the scale or internal resources to successfully move away from the 1:1:1 configuration, we’re proud to pioneer this new architecture, evaluate its performance, and share the results with the industry at large.

Beyond the benefits to Coinbase and its staking customers, these innovations, more importantly, contribute to Ethereum’s increased security, liveness, and growth.

This document and the information contained herein is not a recommendation or endorsement of any digital asset, protocol, network, or project. However, Coinbase may have, or may in the future have, a significant financial interest in, and may receive compensation for services related to one or more of the digital assets, protocols, networks, entities, projects, and/or ventures discussed herein. The risk of loss in cryptocurrency, including staking, can be substantial and nothing herein is intended to be a guarantee against the possibility of loss. Reward rates listed herein are estimates, are not guaranteed and are set by the protocol and remain subject to change. Actual rate of rewards earned may vary significantly and may be zero. This document and the content contained herein are based on information which is believed to be reliable and has been obtained from sources believed to be reliable, but Coinbase makes no representation or warranty, express or implied, as to the fairness, accuracy, adequacy, reasonableness, or completeness of such information, and, without limiting the foregoing or anything else in this disclaimer, all information provided herein is subject to modification by the underlying protocol network. Any use of Coinbase’s services may be contingent on completion of Coinbase’s onboarding process and is at Coinbase’s sole discretion, including entrance into applicable legal documentation and will be, at all times, subject to and governed by Coinbase’s policies, including without limitation, its terms of service and privacy policy, as may be amended from time to time.
