Review on Decentralized Storage Services

Published by Nomana Majeed on

Data stored at a centralized location, such as a data centre, is vulnerable to a single point of failure. Users must trust the storage provider not to tamper with their data. Such systems are also exposed to denial-of-service attacks, filtration attacks, data breaches, and similar threats.

Moving on to decentralization, the blockchain is often considered as a way to store data on-chain. However, on-chain storage can be very expensive, because every participating node needs to store every file, and files such as the media behind NFTs can be very large.

Decentralized file storage not only solves the single-point-of-failure problem by replicating files across multiple off-chain storage nodes, but also reduces storage cost by keeping only the hashes and proofs of these files on the blockchain. Decentralized storage is more resistant to censorship, more robust, and provides greater redundancy. It also does not require the encrypted data to be stored on all participating nodes (individuals or organizations), as is the case with the Ethereum blockchain. In almost all decentralized storage systems, files are encrypted, sharded, and distributed among multiple storage providers.

| Centralized Storage | Decentralized Storage |
|---|---|
| Single point of failure | Trust is distributed among multiple nodes |
| Censorship attack | Censorship resistant |
| Low redundancy | High redundancy |
| Price is decided by centralized storage provider | Price is decided by market and democratic protocol |

Comparison of centralized and decentralized storage

Crucial Properties of Decentralized Storage Services:

  1. Decentralization: the storage network should be fully decentralized, with no single point of failure.
  2. Content immutability: content should not be manipulated or deleted once it is stored. This can be achieved by storing cryptographic hashes of the files on the blockchain. Content addressing of this kind provides data integrity, verification, and authenticity.
  3. Usability: the system should be user-friendly and deliver all its benefits without exposing users to complexity that gets them into trouble. Setting up a wallet, for example, can be very challenging for a beginner.
  4. Scalability: requests should be sent to the nearest node in order to avoid bandwidth bottlenecks.
  5. Equality: all participating nodes should have equal rights.
  6. Monetization: the system should use cryptocurrencies to provide monetary incentives. Many existing decentralized file storage (DFS) systems already use cryptocurrencies for this purpose.
  7. Unlimited resources: there should be enough participating nodes offering their storage for rent.
  8. Censorship resistance: data availability and network performance depend on many nodes rather than one, which leads to censorship resistance.
  9. Attack resilience: a high number of participating nodes helps to withstand attacks such as DoS and provides a robust platform.
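
The content-immutability property in the list above can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and uses a plain Python variable as a stand-in for the on-chain record; the function names are hypothetical, and real systems use their own hash and address formats.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the SHA-256 digest of the data as hex; this acts as the content address."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_address: str) -> bool:
    """Check that retrieved data still matches the address recorded earlier."""
    return content_address(data) == expected_address


# Hypothetical stand-in for a blockchain record: only the digest is kept "on chain",
# while the file itself lives on off-chain storage nodes.
original = b"example file contents"
on_chain_record = content_address(original)

print(verify(original, on_chain_record))                   # True
print(verify(b"example file contents!", on_chain_record))  # False
```

Any change to the file, even a single byte, produces a different digest, so a storage node cannot silently alter the content without the mismatch being detectable.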

Decentralized storage systems:

  1. Filecoin:

Filecoin was created by Protocol Labs as a complementary protocol for IPFS. Because IPFS does not guarantee that nodes keep participating, Filecoin was introduced in 2017 to address this by rewarding participating nodes with the “filecoin” cryptocurrency token, which helps keep both data and nodes available. IPFS addresses and moves content, while Filecoin creates incentives to persist data. It uses both erasure coding and replication as data storage formats. Proof-of-Replication, based on zk-SNARKs, is used to prove that data is stored correctly, and Proof-of-Spacetime is used on a daily basis to prove that the storage provider is not manipulating or corrupting the original data over time. Filecoin is one of the prominent storage providers in the market in terms of usage and capacity, with a market cap of ~$3.799M, and 1 GB of storage costs only $0.0000009 per month.

Filecoin uses its blockchain to track storage nodes and record their proofs, so anyone can verify them. Moreover, miners have to put up a pledge before serving as storage providers. A miner who submits a fake proof, or no proof at all, loses that money. Data stored on Filecoin can be end-to-end encrypted by the user, but the protocol itself does not provide data confidentiality. The whole process involves multiple proofs based on zk-SNARKs, which are expensive to generate and involve trust in multiple parties. Also, there are no entry criteria for users or providers, so anyone can join without conditions.

  2. Storj:

Storj, a contract-based decentralized storage platform, is the second leading platform, after Filecoin, in terms of storage usage. Storj uses erasure coding: it breaks files into 256 MB encrypted pieces and requires only 29 out of 80 pieces to retrieve a file. It uses the STORJ cryptocurrency token to incentivise its participating nodes. Files are encrypted with AES-256-GCM symmetric encryption, which provides data confidentiality, and a cryptographic hash function is used to verify data integrity.
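
To make the 29-out-of-80 figure concrete, the following sketch computes what a k-out-of-n erasure code costs and tolerates. The function name and the returned fields are hypothetical and chosen only for illustration; the arithmetic itself follows directly from the definition of a k-out-of-n code.

```python
def erasure_profile(k: int, n: int) -> dict:
    """Summarize a k-out-of-n erasure code: any k of the n pieces rebuild the file."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return {
        # Bytes stored on the network per byte of original data.
        "expansion_factor": n / k,
        # Number of pieces that can be lost without losing the file.
        "tolerated_losses": n - k,
        # Full copies that plain replication would need to survive as many losses.
        "replicas_for_same_tolerance": n - k + 1,
    }


storj = erasure_profile(29, 80)
print(round(storj["expansion_factor"], 2))   # 2.76
print(storj["tolerated_losses"])             # 51
print(storj["replicas_for_same_tolerance"])  # 52
```

In other words, the scheme stores roughly 2.76 times the original data and survives the loss of 51 pieces, whereas plain replication would need 52 full copies to tolerate the same number of failed nodes.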

Because Storj uses the blockchain only to record wallet addresses for its token, all coordination is handled by six satellites. These satellites act as coordinators between users and storage providers: they manage where data is stored, how it is retrieved, and other management tasks. Data availability is checked at regular intervals by selecting random fragments of a file, and the resulting report is submitted to the satellites. By relying on satellites in this way, Storj sacrifices some decentralization.
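
The idea of checking availability by sampling random fragments can be sketched as follows. This is a simplified illustration and not Storj's actual audit protocol: it assumes the auditor keeps a SHA-256 hash of every piece, and the `StorageNode` class and `audit` function are hypothetical names.

```python
import hashlib
import random


def piece_hash(piece: bytes) -> str:
    return hashlib.sha256(piece).hexdigest()


class StorageNode:
    """Hypothetical storage provider that holds pieces keyed by index."""

    def __init__(self, pieces):
        self.pieces = dict(enumerate(pieces))

    def fetch(self, index):
        return self.pieces.get(index)  # None if the piece has been lost


def audit(node, expected_hashes, sample_size, rng):
    """Request a random sample of pieces and return the indices that failed the check."""
    indices = rng.sample(range(len(expected_hashes)), sample_size)
    failed = []
    for i in indices:
        piece = node.fetch(i)
        if piece is None or piece_hash(piece) != expected_hashes[i]:
            failed.append(i)
    return sorted(failed)


pieces = [f"piece-{i}".encode() for i in range(10)]
expected = [piece_hash(p) for p in pieces]

honest = StorageNode(pieces)
faulty = StorageNode(pieces)
faulty.pieces[3] = b"corrupted"  # silently altered piece
del faulty.pieces[7]             # lost piece

rng = random.Random(0)
print(audit(honest, expected, 5, rng))   # []
print(audit(faulty, expected, 10, rng))  # sampling every piece finds [3, 7]
```

A small sample may miss a damaged piece in any single round, which is why such checks are repeated at regular intervals.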

  3. Sia:

Sia is a contract-based decentralized storage platform that uses smart contracts to record an agreement between the user and the storage provider about where the data will be stored and at what price. Sia stores the root hash of a Merkle tree on the blockchain, which helps to provide data integrity. The smart contract automatically pays the provider once the proof is verified. Sia uses erasure coding, and only 10 out of 100 pieces are required to retrieve the data, which raises questions about security. Sia is the best choice if privacy is important, as storage nodes do not know which data fragments they are storing or to whom those fragments belong. However, the user must log in on a daily basis to check whether any fragment of their data needs to be retrieved. Sia uses the Threefish algorithm to encrypt files. Storage providers also have the option to reject illegal files, but they are not protected from denial-of-service attacks, which can stop a provider from submitting proofs and transferring files. Sia has a network capacity of 7.77 PB, stores over 2.62 PB of data, and has over 730 active nodes distributed around the world.
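
The role of the Merkle root can be shown with a small sketch: only the root needs to be recorded, and a provider can later prove possession of one fragment with a short proof. This is a generic construction and not Sia's exact tree layout. It assumes SHA-256, a non-empty list of fragments, and the common convention of pairing an unpaired node with itself; all function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every tree level, from the leaf hashes up to the single root hash."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # assumption: pair an odd node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels, index):
    """Collect (sibling_hash, sibling_is_right) pairs from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # unpaired node was hashed with itself
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify_proof(leaf, proof, root):
    """Recompute the path to the root and compare it with the recorded root."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


fragments = [f"fragment-{i}".encode() for i in range(5)]
levels = build_levels(fragments)
root = levels[-1][0]  # the only value that would need to go on the blockchain

proof = merkle_proof(levels, 4)
print(verify_proof(fragments[4], proof, root))  # True
print(verify_proof(b"tampered", proof, root))   # False
```

The proof grows with the logarithm of the number of fragments, so verification stays cheap even for large files.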

  4. SWARM:

Swarm is a contract-based decentralized storage platform that runs over Ethereum and extends the blockchain with P2P storage to create an environment for decentralized storage and communication. Swarm uses erasure coding. The address of the data is embedded directly within the data chunks, which are stored in a Merkle tree, and the root hash of the tree is the proof that the file was chunked correctly. Node entry is permissionless. Swarm also uses a Kademlia distributed hash table, which requires multiple network round trips for many operations and makes it difficult to achieve millisecond-level response times.
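
The round-trip cost of a Kademlia lookup follows from how it routes by XOR distance, which the toy sketch below illustrates. It uses 4-bit node IDs and a hand-written routing table, both invented for the example, and a greedy single-path lookup; real Kademlia implementations query several peers in parallel and keep a list of the closest nodes.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia measures closeness between two IDs as their bitwise XOR."""
    return a ^ b


def closest_nodes(target, nodes, count):
    """Return the `count` known nodes closest to the target ID."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:count]


def iterative_lookup(start, target, routing):
    """Greedy lookup: keep asking the closest known node for an even closer one."""
    current = start
    path = [current]
    while True:
        peers = routing.get(current, [])
        best = min(peers + [current], key=lambda n: xor_distance(n, target))
        if best == current:
            return path  # nobody closer is known, so the lookup stops here
        current = best
        path.append(current)  # each step costs one network round trip


print(closest_nodes(0b1011, [0b1000, 0b0011, 0b1110, 0b1010], 2))  # [10, 8]

# Hypothetical routing tables: node ID -> peers that node knows about.
routing = {
    0b0000: [0b1000, 0b0100],
    0b1000: [0b1100, 0b1010],
    0b1010: [0b1011],
}
path = iterative_lookup(0b0000, 0b1011, routing)
print(path)           # [0, 8, 10, 11]
print(len(path) - 1)  # 3 round trips
```

Each hop moves closer to the target but requires a separate request over the network, which is where the latency mentioned above comes from.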

| Platform | Cryptographic primitive | Native blockchain | Token used | Sharding method | Access control | Data confidentiality | Data integrity | Use of blockchain |
|---|---|---|---|---|---|---|---|---|
| Filecoin | zk-SNARKs (Proof-of-Replication, Proof-of-Spacetime). A better primitive is needed that does not require SNARKs; a better trusted setup is required for SNARKs. | Filecoin | Filecoin (FIL) | Erasure code and replication (user can set the replication factor) | Anyone can join the system | Does not provide any method to encrypt the data, but you can use Filecoin to store your encrypted data | Cryptographic hash | For storage of proofs and payments |
| Storj | AES-256-GCM symmetric encryption | Ethereum | Storj (STORJ), ERC20 on Ethereum | Erasure code (29-out-of-80 pieces) | No protection against compromised clients | Through encryption | Cryptographic hash | Only stores the wallet address for the Storj token |
| Sia | Merkle tree, Threefish encryption algorithm | Sia | SiaCoin (SC) | Erasure code (10-out-of-80) | Anyone can join the system | Yes. You never know if you are storing legal or illegal data. | Hash of Merkle tree | Merkle tree root hash on blockchain, tracks storage orders |
| SWARM | Merkle tree, Bee client for encryption | Ethereum | BZZ token | Erasure code | Anyone can join the system | Through encryption | Cryptographic signature | Incentive system, enforced through smart contracts on the Ethereum blockchain |

Comparison between existing decentralized solutions

Conclusion:

The choice of a decentralized storage system depends on the requirements of the users. These projects claim to provide data availability, content immutability, decentralization, scalability, access control, and attack resistance, but not all of them achieve every characteristic needed for privacy, security, and data safety. Storj is not designed to protect against compromised clients, and the system uses only six satellites for communication, which leads to centralization. Sia and Filecoin do not provide data confidentiality, and the low thresholds for retrieving data also raise questions about data security. The use of insecure cryptographic primitives is another privacy issue with many existing decentralized storage systems. Some of the open questions concerning decentralized storage are:

  1. Should we use erasure codes or replication?
  2. What should the threshold for retrieving the data be? What is a good “number” of storage providers?
  3. Will traditional encryption schemes remain secure in the near future, as we step into the post-quantum era?
  4. How can we ensure that data can be retrieved at any time in the future, even when some storage providers are corrupted?
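
One way to start reasoning about the first two questions is to compare availability under a simple failure model. The sketch below assumes each storage node is online independently with the same probability `p`, which is a strong simplification, and compares three full replicas with the 29-out-of-80 erasure code mentioned earlier at a similar storage overhead. The function names and the chosen values of `p` are illustrative assumptions.

```python
from math import comb


def availability_erasure(k: int, n: int, p: float) -> float:
    """Probability that at least k of n pieces are reachable (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def availability_replication(copies: int, p: float) -> float:
    """Probability that at least one of the full copies is reachable."""
    return 1 - (1 - p) ** copies


# Three replicas cost 3x storage; 29-of-80 costs about 2.76x.
for p in (0.5, 0.7, 0.9):
    rep = availability_replication(3, p)
    era = availability_erasure(29, 80, p)
    print(f"p={p}: 3 replicas -> {rep:.6f}, 29-of-80 -> {era:.6f}")
```

Under this model the erasure code gives higher availability at a comparable storage cost, but the result depends heavily on the independence assumption and on the threshold chosen, which is exactly why these remain open questions.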

References:

  1. Bauer, D. P. (2022). Filecoin. In Getting Started with Ethereum: A Step-by-Step Guide to Becoming a Blockchain Developer (pp. 97-101). Berkeley, CA: Apress.
  2. Storj Labs, Inc. (2018). Storj: A Decentralized Cloud Storage Network Framework.
  3. Vorick, D., & Champine, L. (2014). Sia: Simple decentralized storage. Retrieved May 8, 2018.
  4. Swarm Team. (2021). SWARM-storage and communication infrastructure for a self-sovereign digital society.