• Decentralized Ledger:

A blockchain is a distributed ledger maintained by multiple servers. These servers share and manage the same transaction history through consensus algorithms. The fact that everyone possesses the same ledger gives rise to blockchain's various advantages, making the distributed ledger the core of blockchain technology. However, managing a distributed ledger is far more challenging compared to a centralized server. Although the size of distributed ledgers may increase rapidly, it doesn't pose a significant problem in the early stages of a network. However, if this situation persists, when the distributed ledger size becomes extremely large in the future, the entry barrier for new nodes joining the network will become very high, potentially leading to increased transaction fees for future users. Thus, the incentive systems of most existing blockchains focus solely on computation rather than storage.

Sui’s Unique Onchain Data Storage:

Sui has strived to solve storage issues through incentives from its inception. Sui aims to achieve sustainable data storage through an economic mechanism called the Storage Fund, which operates as follows: The fees users submit to Sui validators are divided into

1) gas fees related to computation and

2) storage costs for data retention.

Sui collects storage costs upfront from users for permanent data storage and pools this into the Storage Fund. The Storage Fund continuously redistributes the accumulated amount to validators while the data remains stored on-chain. Moreover, users can receive a refund of the corresponding storage costs if they delete their data.

Sui's unique on-chain data storage system produces two effects:

✓ Since users can receive a storage cost rebate when deleting on-chain data, it can incentivize reducing the distributed ledger's capacity through economic incentives.

✓ The unique incentive structure of collecting storage fees upfront and allocating them as incentives for future validators can address sustainability issues related to storage.

• What About... Blob?

While Sui's Storage Fund is a powerful tool, it's not an efficient method for all types of data. This is because on-chain data includes both small-volume essential data for state calculations, such as transaction data, and enormous data like blobs that include media files. Walrus is a new storage protocol to deal with blob.

• Full replication; IPFS, Arweave:

The most intuitive and simple method to store blobs in a distributed environment is for nodes to replicate and store the entire B uploaded by W. For example, if you upload an image file, instead of nodes splitting and dividing the image file, they replicate and store the image file as is. For examples, IPFS and Arweave. Of course, this doesn't mean that storage nodes participating in IPFS and Arweave download all files stored across the entire network; rather, they store some of the files in their complete form.

W propagates B and the corresponding commitment (binding commitment, e.g., hash) to the nodes and waits until receiving f+1 signatures.

(Assumption: n = 3f+1; At least f+1 signatures are required to ensure that at least one honest node, excluding f Byzantine nodes, has stored B.)

Once f+1 signatures arrive, W can generate an availability certificate, guaranteeing that anyone can fully access the file on that network. However, full replication has many drawbacks, primarily inefficiency:

✓ Write: When W propagates files to nodes to store data on the network, it has a complexity of O(nB).

✓ Read: In an asynchronous environment, reading data incurs a cost of O(nB) because requests must be sent to f+1 nodes.

✓ Recovery: To recover data in an asynchronous environment, a storage node must send requests to f+1 other nodes, costing O(nB) for each storage node. When applied to the entire network, this results in a cost of O(n^2 B). Note that this is the worst cast cost.

• Encode and Share: Storj, Sia

As I mentioned above, the full replication method had advantages of being simple, intuitive, and easy to access complete files, but its drawback was the high storage cost for the entire network. To address this, decentralized storage networks using Reed Solomon (RS) encoding emerged. Notable examples are Storj and Sia. RS encoding is a prominent method for implementing erasure coding, widely used in real life for CDs, DVDs, QR codes, etc. Erasure coding divides original data into multiple pieces and generates additional recovery data from these pieces. Even if some data is lost, the original data can be restored to a certain extent using the recovery data. Let's look at an example of the Encode and Share method. W divides B into f+1 data chunks and additionally encodes 2f extra repair chunks. Thanks to RS encoding's characteristics, the original B can be recovered with just f+1 chunks out of a total of 3f+1 chunks. Here, each chunk's size is O(B/n). W uses binding commitments like Merkle trees to commit to all chunks and sends each node a chunk along with a proof of inclusion showing that the chunk is included in the Merkle tree. Storage nodes receiving chunks verify their validity using the commitment and send a signature back to W. W generates an availability certificate when it receives at least 2f+1 signatures from nodes (as B can be recovered via f+1 even if f out of 2f+1 are Byzantine nodes). The Encode and Share method is more efficient in data writing and reading processes compared to Full replication, but still has the drawback of inefficient data recovery.

Write: It costs a total of O(B) as W propagates chunks of size O(B/n) to the network for data storage.

Read: R has a complexity of O(B/n) * O(n) = O(B) as it must request chunks from nodes and receive f+1 valid responses.

Recovery: In an asynchronous environment, to recover data, each storage node incurs a cost of O(B) as it needs to receive chunks from 2f+1 nodes. Applied to the entire network, this results in a total cost of O(nB), which is still inefficient.

Increasing the efficiency of the data recovery process is crucial because, being a decentralized protocol, nodes can freely join and leave the network. In cases where the node set changes with each epoch, new nodes need to receive and recover chunks from previous nodes. Recognizing this issue, Walrus utilizes a new encoding method.

• Red Stuff: Walrus

Walrus can achieve a recovery cost of O(B/n) for each node and a total recovery cost of O(B) for the entire network through Red Stuff encoding.

Hence, Walrus is the next gen storage protocol.

@Walrus 🦭/acc #Walrus $WAL

WALSui
WAL
0.1414
-8.71%