Pruning

It is possible to configure a VIA Node to periodically prune all data from L1 batches older than a configurable threshold. Data is pruned both from Postgres and from the Merkle tree (RocksDB). Pruning happens continuously in the background during normal node operation (i.e., it does not require stopping the node) and is designed not to significantly impact node performance.

Types of pruned data in Postgres include:

  • Block and L1 batch headers

  • Transactions

  • EVM logs (a.k.a. events)

  • Overwritten storage logs

  • Transaction traces

Pruned data is no longer available via the node's Web3 API. The relevant Web3 methods, such as eth_getBlockByNumber, will return an error mentioning the first retained block or L1 batch if pruned data is queried.
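
As a quick check, you can query an old block over JSON-RPC and observe this error. The sketch below is illustrative only: the RPC port (3060) and the block number are assumptions and depend on your deployment.

```bash
# Query an old block over the node's HTTP JSON-RPC API (port is illustrative).
# If the block has been pruned, the node returns an error that mentions the
# first retained block instead of the block data.
curl -s http://localhost:3060/ \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByNumber","params":["0x1",false]}'
```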

Interaction with snapshot recovery

Pruning and snapshot recovery are independent features. Pruning works both for archival nodes restored from a Postgres dump and for nodes recovered from a snapshot. Conversely, a node recovered from a snapshot may have pruning disabled; it then retains all data starting from the snapshot indefinitely (but not earlier data; see snapshot recovery limitations).

A rough guide to choosing snapshot recovery and/or pruning is as follows:

  • If you need a node with data retention period of up to a few days, set up a node from a snapshot with pruning enabled and wait for it to have enough data.

  • If you need a node with the entire rollup history, using a Postgres dump is the only option, and pruning should be disabled.

  • If you need a node with significant data retention (on the order of months), the best option right now is using a Postgres dump. You may enable pruning for such a node, but beware that full pruning may take a significant amount of time (on the order of weeks or months). In the future, we intend to offer pre-pruned Postgres dumps with a few months of data.

Configuration

You can enable pruning by setting an environment variable:
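
The sketch below assumes the upstream zksync-era external node naming; verify the exact variable name against your node's configuration reference.

```bash
# Enable pruning (variable name assumed from the upstream external node convention).
EN_PRUNING_ENABLED=true
```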

By default, the node keeps L1 batch data for 7 days, as determined by the batch timestamp (which is always equal to the timestamp of the first block in the batch). You can configure the retention period using:
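
For example, assuming the upstream zksync-era variable name (again, verify against your node's configuration reference), a 3-day retention period is expressed in seconds:

```bash
# Retain 3 days of L1 batch data: 3 * 24 * 3600 = 259200 seconds.
# Variable name assumed from the upstream external node convention.
EN_PRUNING_DATA_RETENTION_SEC=259200
```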

The retention period can be set to any value, but for mainnet, values under 21h will be ignored because a batch can only be pruned after it has been executed on Bitcoin.

Pruning can be enabled or disabled and the data retention period can be changed freely during the node's lifetime.

Storage requirements for pruned nodes

The storage requirements depend on the configured data retention period, but are roughly:

  • 20 GB + ~80 MB/day of retained data of disk space on the machine that runs the node

  • 100 GB + ~1 MB/day of retained data of disk space for Postgres
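
For example, with a 30-day retention period this works out to roughly 20 GB + 30 × 80 MB ≈ 22.4 GB of disk on the node machine and 100 GB + 30 × 1 MB ≈ 100 GB for Postgres.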

Note: When pruning an existing archival node, Postgres will be unable to reclaim disk space automatically. To reclaim disk space, you need to manually run VACUUM FULL, which requires an ACCESS EXCLUSIVE lock. You can read more about it in the Postgres docs.
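
A minimal sketch of reclaiming the space, assuming you connect with psql and that DATABASE_URL points at the node's Postgres database (both are illustrative):

```bash
# Rewrite all tables to return space freed by pruning to the operating system.
# VACUUM FULL takes an ACCESS EXCLUSIVE lock, so run it during a maintenance window.
psql "$DATABASE_URL" -c "VACUUM FULL;"
```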

Monitoring pruning

Pruning information is logged with the following targets:

  • Postgres pruning: zksync_node_db_pruner

  • Merkle tree pruning: zksync_metadata_calculator::pruning, zksync_merkle_tree::pruning.
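
As a sketch, assuming the node honors a standard RUST_LOG-style filter (your deployment may configure logging differently), you can raise verbosity for just the pruning targets:

```bash
# Surface pruning-related logs while keeping everything else at the info level.
# Assumes the node reads a RUST_LOG-style filter.
RUST_LOG=info,zksync_node_db_pruner=debug,zksync_metadata_calculator::pruning=debug,zksync_merkle_tree::pruning=debug
```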

To check whether Postgres pruning works as intended, you should look for logs like this:

(Obviously, timestamps and numbers in the logs will differ.)

Pruning logic also exports some metrics; the main ones are as follows:

| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| db_pruner_not_pruned_l1_batches_count | Gauge | - | Number of retained L1 batches |
| db_pruner_pruning_chunk_duration_seconds | Histogram | prune_type | Latency of a single pruning iteration |
| merkle_tree_pruning_deleted_stale_key_versions | Gauge | bound | Versions (= L1 batches) pruned from the Merkle tree |

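A quick way to confirm these metrics are being updated is to scrape the node's Prometheus endpoint; the port below is illustrative and depends on how metrics are exposed in your setup.

```bash
# Check how many L1 batches are currently retained (metrics port is an assumption).
curl -s http://localhost:3322/metrics | grep db_pruner_not_pruned_l1_batches_count
```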