Data Availability FAQ on Celestia Network
What is data availability?
Data availability is the ability to access and retrieve data from a database or system. In the context of blockchain technology, data availability refers to the ability of nodes (computers on the network) to verify the availability of data within a block on the blockchain. When a new block is added to the chain, nodes will attempt to download and verify all the transaction data within that block to ensure that it has been published and is available for inspection. Data availability is important for the security and trustworthiness of a blockchain, as it allows anyone to verify the accuracy and integrity of the ledger of transactions. However, as blockchains scale and the size of blocks increases, it can become difficult for users to download and verify all the data, which can affect data availability.
What is the data availability problem?
The data availability problem refers to a situation where the transaction data for a new block on a blockchain cannot be downloaded and verified by nodes on the network. This can occur due to an attack known as a data withholding attack, in which the block producer withholds the transaction data for the new block. This can have serious consequences for the blockchain, such as halting its operations or allowing funds to be stolen. The data availability problem is more likely to occur in layer 2 scaling solutions like rollups and validiums. It is a significant issue that must be addressed in order to ensure the security and trustworthiness of a blockchain.
How do nodes verify data availability in Celestia?
Celestia is a modular blockchain in which nodes verify data availability using a process called data availability sampling. With it, light nodes (computers on the network that do not store the entire blockchain) can verify the availability of data within a block without downloading all the transaction data, which keeps verification efficient while maintaining the security and trustworthiness of the blockchain. Data availability is critical to the security of any blockchain, as it ensures that anyone can inspect the ledger of transactions and verify its accuracy and integrity. By using data availability sampling, Celestia can improve the scalability and performance of the blockchain while still maintaining data availability.
What is data availability sampling?
Data availability sampling (DAS) is a method used by light nodes (computers on a blockchain network that do not store the entire blockchain) to verify the availability of data within a block without needing to download all the data. The light node conducts multiple rounds of random sampling, each requesting a small portion of the block data. Every round that succeeds increases the light node’s confidence that the data is available. Once it reaches a predetermined confidence level, it considers the data available. DAS is used to improve the scalability and performance of a blockchain by allowing light nodes to verify data availability more efficiently, without sacrificing security and trustworthiness.
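The way confidence grows with each sampling round can be illustrated with a short calculation. The sketch below is not Celestia's implementation. It assumes that a block producer must withhold at least 25% of the erasure-coded shares to make a block unrecoverable (a figure commonly quoted for two-dimensional Reed-Solomon coding, not stated in this FAQ) and that samples are independent and uniformly random. The function names are hypothetical.

```python
import math

# Assumption (not from the FAQ): making a block unrecoverable requires
# withholding at least this fraction of its erasure-coded shares, so every
# random sample of an unavailable block fails with at least this probability.
MIN_WITHHELD_FRACTION = 0.25


def confidence_after(samples: int, withheld: float = MIN_WITHHELD_FRACTION) -> float:
    """Lower bound on the chance that at least one of `samples` independent
    uniform samples hits a withheld share, given the block is unavailable."""
    return 1.0 - (1.0 - withheld) ** samples


def samples_needed(target: float, withheld: float = MIN_WITHHELD_FRACTION) -> int:
    """Smallest number of samples whose confidence bound reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - withheld))


if __name__ == "__main__":
    for s in (1, 5, 10, 20):
        print(f"{s:2d} samples -> confidence >= {confidence_after(s):.4f}")
    print("samples for 99% confidence:", samples_needed(0.99))
```

Under these assumptions the number of samples depends only on the target confidence, not on the block size, which is why a light node can check large blocks with a small, fixed amount of work.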
What are some of the security assumptions that Celestia makes for data availability sampling?
In Celestia, data availability sampling (DAS) is used by light nodes to verify the availability of data within a block without needing to download all the data. To ensure the security of DAS, Celestia makes several assumptions:
- A minimum number of light nodes are conducting DAS for a given block size: This is necessary so that a full node (a computer on the network that stores the entire blockchain) can reconstruct the entire block from the data sampled and stored by light nodes. The number of light nodes needed will depend on the size of the block.
- Light nodes are connected to at least one honest full node: This ensures that light nodes can receive fraud proofs for incorrectly erasure coded blocks. If a light node is not connected to an honest full node, it has no way to learn that a block has been improperly constructed.
Overall, these assumptions are necessary to maintain the security and trustworthiness of the blockchain while still allowing for more efficient data availability verification through DAS.
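The first assumption, that enough light nodes must be sampling for a given block size, can be made concrete with a rough estimate. The sketch below is a heuristic, not Celestia's actual analysis: it computes the expected fraction of distinct shares covered when every light node samples uniformly at random, and finds the smallest number of nodes whose expected coverage reaches a chosen threshold. The 75% threshold, the share counts, and the function names are all assumptions made for illustration.

```python
import math


def expected_coverage(total_shares: int, light_nodes: int, samples_per_node: int) -> float:
    """Expected fraction of distinct shares hit when every light node draws
    `samples_per_node` shares uniformly at random (with replacement)."""
    draws = light_nodes * samples_per_node
    return 1.0 - (1.0 - 1.0 / total_shares) ** draws


def min_light_nodes(total_shares: int, samples_per_node: int,
                    required_fraction: float = 0.75) -> int:
    """Smallest node count whose expected coverage reaches `required_fraction`.

    `required_fraction` is an assumed reconstruction threshold, not a value
    taken from the FAQ."""
    if total_shares < 2 or samples_per_node < 1:
        raise ValueError("need at least 2 shares and 1 sample per node")
    if not 0.0 < required_fraction < 1.0:
        raise ValueError("required_fraction must be strictly between 0 and 1")
    draws = math.log(1.0 - required_fraction) / math.log(1.0 - 1.0 / total_shares)
    return math.ceil(draws / samples_per_node)


if __name__ == "__main__":
    # Hypothetical block sizes, expressed as the number of erasure-coded shares.
    for shares in (64 * 64, 128 * 128, 256 * 256):
        nodes = min_light_nodes(shares, samples_per_node=16)
        print(f"{shares:6d} shares -> about {nodes} light nodes")
```

The estimate grows roughly linearly with the number of shares, which matches the point made in the list: larger blocks need more light nodes sampling. A real security analysis would bound the probability of reconstruction rather than the expected coverage.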
Why is block reconstruction necessary for security?
In Celestia, block reconstruction is necessary for the security of the blockchain because it allows full nodes (computers on the network that store the entire blockchain) to verify the accuracy and integrity of the data within a block. Blocks in Celestia are erasure coded, which means they contain redundant data to aid the data availability sampling process. However, there is a risk that the data could be encoded incorrectly. To detect this, Celestia uses fraud proofs, which demonstrate that a block’s erasure coding is incorrect. Generating a fraud proof requires the full block data.
If full nodes were not able to reconstruct the full block from the data stored by light nodes (computers on the network that do not store the entire blockchain), they would not be able to generate a fraud proof. This could compromise the security of the blockchain by allowing invalid data to go undetected. Therefore, block reconstruction is an important security measure that helps ensure the trustworthiness of the Celestia blockchain.
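To make the link between erasure coding, reconstruction, and detecting a bad encoding more tangible, here is a deliberately simple sketch. It uses a single XOR parity chunk as a stand-in for the far more capable coding scheme a real network would use, so it can repair only one missing chunk. All function names are hypothetical, and the check at the end is only an analogy for the role a fraud proof plays, not a fraud proof format.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunks: List[bytes]) -> List[bytes]:
    """Append one parity chunk: the XOR of all equal-length data chunks."""
    if not chunks or len({len(c) for c in chunks}) != 1:
        raise ValueError("need at least one chunk, all of equal length")
    return list(chunks) + [reduce(xor_bytes, chunks)]


def reconstruct(coded: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Repair at most one missing chunk (marked None) by XORing the others."""
    missing = [i for i, c in enumerate(coded) if c is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can repair only one missing chunk")
    repaired = list(coded)
    if missing:
        present = [c for c in coded if c is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired


def is_correctly_encoded(coded: List[bytes]) -> bool:
    """Re-encode the data chunks and compare with the published parity chunk.
    This needs every chunk, mirroring why the full block data is required."""
    if len(coded) < 2:
        raise ValueError("need at least one data chunk and one parity chunk")
    return encode(coded[:-1])[-1] == coded[-1]


if __name__ == "__main__":
    coded = encode([b"ab", b"cd", b"ef"])
    damaged = [coded[0], None, coded[2], coded[3]]
    print(reconstruct(damaged) == coded)          # missing chunk is recovered
    print(is_correctly_encoded(coded[:-1] + [b"zz"]))  # bad parity is detected
```

Note that the correctness check cannot run on a partial block: it has to re-encode all of the data, which is the reason reconstruction has to succeed before an incorrect encoding can be demonstrated.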
What is data storage?
Data storage refers to the ability to save and access data in a database or system. In the context of blockchain technology, data storage refers to the ability to store and retrieve past transaction data. This data is used for a variety of purposes, including reading the information of previous transactions, syncing nodes on the network, indexing and serving transaction data, and retrieving information about non-fungible tokens (NFTs). Data storage is an important aspect of blockchain technology, as it allows users to access and verify the accuracy and integrity of the transaction history on the ledger.
What is the problem around data storage?
The data storage problem is the challenge of storing past transaction data and successfully retrieving it at a later time. If historical transaction data is not accessible, users may be unable to access information about their past transactions, and nodes may be unable to sync from the beginning of the blockchain. However, the security assumptions around data storage in a blockchain are relatively weak. As long as at least one accessible copy of the blockchain’s history exists, users should be able to access historical transaction data. This means that data storage security relies on the assumption that at least one node on the network stores the history and serves it honestly, which is known as a 1 of N honesty assumption.
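A small sketch can show why one honest copy is enough. It assumes the user already holds a trusted commitment to the historical data; here that is a plain SHA-256 digest, used as a stand-in for whatever commitment a real chain records in its block headers. The `Provider` interface and `fetch_history` function are hypothetical names introduced for illustration.

```python
import hashlib
from typing import Callable, Iterable, Optional

# Hypothetical interface: a provider is any callable that returns the bytes it
# claims to hold for a block, or None if it has nothing to offer.
Provider = Callable[[], Optional[bytes]]


def fetch_history(expected_sha256: str, providers: Iterable[Provider]) -> bytes:
    """Return the first response whose SHA-256 digest matches the commitment."""
    for provider in providers:
        try:
            data = provider()
        except Exception:
            continue  # an offline or faulty provider is simply skipped
        if data is not None and hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
    raise LookupError("no provider returned data matching the commitment")


if __name__ == "__main__":
    block = b"historical block data"
    commitment = hashlib.sha256(block).hexdigest()
    providers = [lambda: None, lambda: b"forged data", lambda: block]
    print(fetch_history(commitment, providers) == block)
```

Because every response is checked against the commitment, dishonest or empty providers cannot make the user accept wrong data. They can only fail to help, so a single honest provider among the N is sufficient.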
What is the difference between data availability and data storage?
Data availability refers to the ability to verify that the transaction data for a new block on a blockchain is publicly available and can be accessed by nodes on the network. Data storage, on the other hand, involves saving and accessing past transaction data from old blocks. These two concepts are related, but they have some important differences:
- Data availability is focused on verifying the availability of new data, while data storage involves accessing historical data.
- Data availability requires an honest majority assumption, meaning that a majority of the nodes on the network must be honest in order for data availability to be verified. Data storage, on the other hand, only requires a 1 of N assumption, meaning that at least one node on the network must be honest.
- Data availability is critical to the security of a blockchain, as it ensures that anyone can verify the accuracy and integrity of the ledger of transactions. Data storage is less critical to security, although it is still important for allowing users to access and verify historical transaction data.
Overall, it is important to solve the data availability problem in order to maintain the security and trustworthiness of a blockchain, while the issue of data storage is less critical but still important for the functionality and usability of the blockchain.
Where does blockchain state fit into this?
Blockchain state refers to the current snapshot of the network, which includes information about account balances, smart contract balances, and validator set information. It is distinct from transaction data, which is the record of transactions that have occurred on the blockchain. The size of the state can be a concern because it affects the scalability and performance of the blockchain, but this issue is different in nature from those related to data availability and data storage. Data availability is about verifying that new transaction data is publicly available and can be accessed by nodes on the network, and data storage is about saving and accessing past transaction data. State size, by contrast, is about the amount of data held in the current snapshot of the network.
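The distinction between state and transaction data can be sketched with a toy ledger. The example below is an illustration only: the `Transfer` type and `apply_transfers` function are hypothetical, and real chains track far more than balances.

```python
from typing import Dict, List, NamedTuple


class Transfer(NamedTuple):
    sender: str
    recipient: str
    amount: int


def apply_transfers(genesis: Dict[str, int], history: List[Transfer]) -> Dict[str, int]:
    """Replay the transaction history on top of the genesis balances and
    return the resulting state (the current snapshot of balances)."""
    state = dict(genesis)
    for tx in history:
        if tx.amount < 0 or state.get(tx.sender, 0) < tx.amount:
            raise ValueError(f"invalid transfer: {tx}")
        state[tx.sender] = state.get(tx.sender, 0) - tx.amount
        state[tx.recipient] = state.get(tx.recipient, 0) + tx.amount
    return state


if __name__ == "__main__":
    history = [
        Transfer("alice", "bob", 4),
        Transfer("bob", "carol", 1),
        Transfer("alice", "bob", 2),
    ]
    print(apply_transfers({"alice": 10}, history))
```

In this toy model the history gains an entry with every transaction, while the state holds only one entry per account. The two therefore grow at different rates and raise different concerns.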
Why doesn’t Celestia incentivize storage of historical data?
Celestia, like many other blockchains, does not incentivize the storage of historical data because it is not the responsibility of the blockchain to guarantee that past data will always be retrievable. Data storage has relatively weak security requirements: only a single party needs to store and provide the data for users. Celestia’s primary focus is on providing a secure and scalable way to verify the availability of data. Once data has been verified as available, it is up to other entities to store and retrieve the data as needed. There are natural incentives for outside parties to store and serve historical data to users, such as the desire to provide a valuable service or the opportunity to earn revenue through data storage and retrieval.
Who may store historical data if there is no reward?
There are several types of actors that may be interested in storing historical data on a blockchain, even if there is no reward specifically for doing so. These actors may include:
- Block explorers: These are websites or tools that provide users with access to past transaction data on the blockchain.
- Indexers: These are entities that provide API queries for past data on the blockchain, allowing users to search and access historical data.
- Applications or rollups: These are software programs or protocols that may require historical data for certain processes or operations.
- Users: Individual users may want to store their own transaction history in order to have access to it in the future.
Overall, there are a variety of actors that may be motivated to store historical data on a blockchain, either for the purpose of providing a valuable service or for their own personal use.
What are some things blockchains can do to provide stronger assurances of data retrievability?
There are several things that blockchains can do to provide stronger assurances of data retrievability:
- Reward nodes based on the amount of transaction data they store and requests for data they serve: Some blockchains, such as Filecoin, incentivize nodes to store and serve data by rewarding them based on the amount of data they store and the number of requests they serve.
- Publish transaction data onto a data storage blockchain: Another option is to publish transaction data onto a separate data storage blockchain that specifically incentivizes the storage and retrieval of historical data. This could provide stronger assurances that the data will be preserved and made available to users over time.
Overall, these options can help to ensure that historical data is stored and made available to users, which can enhance the security and trustworthiness of the blockchain.
Source: https://docs.celestia.org/concepts/data-availability-faq/