Why Does Random Data on Filecoin Matter?
By Jinbin Xie
What is rational is actual, and what is actual is rational. — G. W. F. Hegel
Background
Many current and prospective participants in the Filecoin ecosystem notice that when they mine Filecoin day to day through Committed Capacity (CC) sectors, the sealed files are generated from random data. For historical reasons, committing storage to Filecoin generates random data files by default, which has led to the misconception that Filecoin is a public chain full of useless random data.
Opinion
Users can seal any data they find useful in a CC-type sector on Filecoin; the software simply defaults to generating random data. In addition, the quality-adjusted power of a CC sector is one tenth that of a Filecoin Plus deal. Storing useful data requires additional time, negotiation, and business costs. Before Filecoin Plus became operational (where storing data for Filecoin Plus clients earns miners a 10x multiple in storage power), it might have been rational for some miners to seal sectors with randomly generated data instead of searching for useful data. However, sealing generated data can still be meaningful in the following ways:
1. Establish network consensus security quickly.
2. Expand the scale of storage quickly and become a leader in distributed cloud storage.
3. Prove storage reliability for a brand-new storage network.
With the rise of developer tools like Powergate and the rollout of Filecoin Plus, more and more applications and use cases will emerge on the Filecoin network, uploading and downloading files in a decentralized way.
Analysis
To begin with, we need to understand the principles of Filecoin’s core proof system. Under Proof-of-Spacetime (PoSt) in the Filecoin protocol, files are sealed into sectors; after intensive computation, a zk-SNARK produces a succinct proof that is submitted to the blockchain. A miner’s storage power equals the quality multiplier multiplied by the sector size. The greater the storage power, the higher the likelihood of winning block rewards. Filecoin’s consensus security therefore rests on the proofs generated from stored files. But where does the stored data come from?
1. Users upload files through APIs and store via Fil-Market.
2. Transfer data offline, then import it.
3. Generate random data for sealing. You can also seal files you are interested in into CC-type sectors per https://github.com/filecoin-project/FIPs/issues/57 .
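The storage power rule described above can be sketched in a few lines of Python. This is only an illustration: the 1x and 10x multipliers follow the figures quoted in this article, the 32 GiB sector size is an assumed example, and treating a miner's chance of winning as its plain share of total power is a simplifying assumption rather than the protocol's exact election rule. All function names are hypothetical.

```python
# Illustrative sketch only: the multipliers come from the article's "10x" figure,
# and the proportional power-share model is a simplifying assumption.
GIB = 1024 ** 3
SECTOR_SIZE = 32 * GIB          # assumed example sector size

CC_MULTIPLIER = 1               # committed capacity (random data)
FIL_PLUS_MULTIPLIER = 10        # Filecoin Plus deal, per the article


def storage_power(sector_count: int, quality_multiplier: int,
                  sector_size: int = SECTOR_SIZE) -> int:
    """Storage power = quality multiplier x sector size, summed over sectors."""
    return sector_count * quality_multiplier * sector_size


def power_share(miner_power: int, network_power: int) -> float:
    """Fraction of total network power held by one miner."""
    return miner_power / network_power


# Two hypothetical miners sealing the same raw capacity (100 sectors each).
cc_miner = storage_power(100, CC_MULTIPLIER)
plus_miner = storage_power(100, FIL_PLUS_MULTIPLIER)
network = cc_miner + plus_miner

print(f"CC miner share:            {power_share(cc_miner, network):.1%}")
print(f"Filecoin Plus miner share: {power_share(plus_miner, network):.1%}")
```

With identical hardware, the miner holding Filecoin Plus data ends up with ten times the power of the CC miner in this toy network, which is the incentive the rest of this article relies on.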
Most successful business models start from a demand and grow the market from there. The Filecoin network starts from decentralized storage, but the demand for decentralized storage is still a hypothesis that needs to be validated. This is the classic chicken-and-egg problem for most platform businesses. In addition, the security and consensus of the Filecoin blockchain depend on the amount of storage proven on the network. Sealing random data on the Filecoin network therefore has practical value: it establishes network consensus security while demonstrating the reliability and viability of decentralized storage. As of writing, about 2EB of storage on Filecoin provides the network’s security guarantee, alongside another 4.2PB of real data from Slingshot.
What if files are all stored by on-chain orders instead of random data sealed in the CC sector?
According to the public data generated from Slingshot, there are 4.2PB of useful data on Filecoin. Slingshot is currently in phase two, which will span about 4 months. Currently, the total storage scale of Filecoin’s entire network is about 2.5 EiB (2,560 PiB), roughly 610 times the scale of useful data; using the 2EB figure cited earlier, the ratio is about 487.
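The ratio between total network storage and useful data depends on which network size is used, since this article mentions both about 2 EiB and about 2.5 EiB. A minimal sketch of the arithmetic follows, assuming binary units throughout (1 EiB = 1024 PiB) and treating the 4.2PB Slingshot figure as PiB; the function name is hypothetical.

```python
PIB_PER_EIB = 1024  # binary units, matching the 1024-based arithmetic in this article


def scale_ratio(network_eib: float, useful_pib: float) -> float:
    """How many times larger the whole network is than the useful data."""
    return network_eib * PIB_PER_EIB / useful_pib


for network_eib in (2.0, 2.5):
    ratio = scale_ratio(network_eib, useful_pib=4.2)
    print(f"{network_eib} EiB network / 4.2 PiB useful data = {ratio:.1f}x")
```

Either way, sealed capacity exceeds deal data by a factor in the hundreds.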
Now, you may wonder why there is such a huge gap between the two types of sector content. Consider a simple calculation of data transmission time. To submit 1TB of valid data over a 100MB/s connection theoretically takes t = ((1*1024*1024)/100)/(60*60) ≈ 2.9 hours. At that rate, transmitting the 4.2PB of Slingshot data alone takes about 12,526 hours over a single connection, or roughly 522 days (about 17 months).
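The transmission estimate above can be reproduced with a short Python sketch under the same assumptions: binary units (1 TB = 1024 * 1024 MB, 1 PB = 1024 TB) and a sustained 100 MB/s link. The `parallel_links` parameter is a hypothetical addition that models splitting the data across several simultaneous transfers.

```python
MB_PER_TB = 1024 * 1024
TB_PER_PB = 1024


def transfer_hours(data_tb: float, mb_per_s: float = 100.0,
                   parallel_links: int = 1) -> float:
    """Idealized transfer time in hours, ignoring protocol overhead and retries."""
    seconds = data_tb * MB_PER_TB / (mb_per_s * parallel_links)
    return seconds / 3600


one_tb = transfer_hours(1)                      # about 2.9 hours
slingshot = transfer_hours(4.2 * TB_PER_PB)     # about 12,527 hours on one link

print(f"1 TB:   {one_tb:.1f} hours")
print(f"4.2 PB: {slingshot:,.0f} hours = {slingshot / 24:.0f} days on one link")
print(f"4.2 PB: {transfer_hours(4.2 * TB_PER_PB, parallel_links=10) / 24:.0f} days on ten links")
```

The result scales inversely with bandwidth and with the number of parallel links, so the single-link figure is an upper bound rather than a prediction.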
The theoretical calculation differs from real conditions. Contributing factors include:
1. Offline orders are supported
2. Data can be split and submitted as multiple orders at the same time
3. Different miners have different methods to speed up the transfer
4. Other optimizing methods
As a result, we can see that the overall storage scale grows slowly when uploading data on-chain.
Effects of the small scale of on-chain orders:
Filecoin mining machines are oversold. IDC’s server market data for the third quarter of 2020 illustrates this:
According to the International Data Corporation (IDC) Worldwide Quarterly Server Tracker, vendor revenue in the worldwide server market grew 2.2% year over year to $22.6 billion during the third quarter of 2020 (3Q20). Worldwide server shipments declined 0.2% year over year to nearly 3.1 million units in 3Q20. Volume server revenue was up 5.8% to $19.0 billion, while midrange server revenue declined 13.9% to $2.6 billion, and high-end server revenue declined 12.6% to $937 million.
“Global demand for enterprise servers was a bit muted during the third quarter of 2020 although we did see areas of strong demand,” said Paul Maguranis, senior research analyst, Infrastructure Platforms and Technologies at IDC. “From a regional perspective, server revenue within China grew 14.2% on a year-over-year basis. And worldwide revenues for servers running AMD CPUs were up 112.4% while ARM-based servers grew revenues 430.5% on a year-over-year basis, albeit on a very small base of revenue.” On a geographic basis, China was the fastest growing region at 14.2% year over year. Asia/Pacific (excluding China and Japan) grew 3.0%, while Latin America grew 1.5% and North America grew 1.8% (Canada at 13.1% and the United States at 1.5%). Both EMEA and Japan declined during the quarter, at rates of 4.9% and 21.4%, respectively.
AMD servers in China are mainly used for Filecoin mining, while traditional Internet companies generally choose Intel servers. In a single quarter alone, sales can reach the hundreds of millions.
According to on-chain data analysis, a single miner can seal more than 1PB of data per day, which means that, setting network transmission aside, the 4.2PB produced in Slingshot could theoretically be sealed in about 4 days by one miner.
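To make the oversupply concrete, the sketch below compares one miner's sealing capacity with the rate at which Slingshot data arrives. It assumes a sealing rate of exactly 1 PB per day (the article says "more than 1PB") and assumes the 4.2PB arrives evenly over a 120-day window standing in for "about 4 months"; both numbers and the function names are illustrative assumptions.

```python
def sealing_days(data_pb: float, sealing_rate_pb_per_day: float = 1.0) -> float:
    """Days one miner needs to seal the data, ignoring network transfer."""
    return data_pb / sealing_rate_pb_per_day


def capacity_utilization(data_pb: float, period_days: float,
                         sealing_rate_pb_per_day: float = 1.0) -> float:
    """Share of one miner's sealing capacity used if data arrives evenly."""
    demand_pb_per_day = data_pb / period_days
    return demand_pb_per_day / sealing_rate_pb_per_day


print(f"Days to seal 4.2 PB: {sealing_days(4.2):.1f}")
print(f"Utilization over 120 days: {capacity_utilization(4.2, 120):.1%}")
```

Under these assumptions, the whole Slingshot data set would keep a single miner's sealing pipeline only a few percent busy, which is why idle capacity tends to be filled with generated data.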
With oversold sealing capacity, the usage rate of mining machines can be low. However, all of them have to be put into operation since data centers and inventory incur a daily cost.
Many miners have to fulfill promises to their customers, but they also face the lack of useful data for starting their machines. Some of them had no choice but to mine Filecoin’s forks to power on more machines and generate more revenue for their customers.
In the past four months since mainnet launch, the Filecoin network has onboarded about 2.5 EiB of data, positioning Filecoin as the leader in the decentralized storage network market while most forks are in a dysfunctional situation.
Because the network serves several different use cases, it cannot rely on the on-chain order mode alone. Miners get to decide what to put in their sectors, and, driven by the factors above, many naturally choose to generate random data files to increase their storage power.
Positive Changes Brought by Filecoin Plus
Among all data sources, many miners naturally choose the most cost effective way to fill their sectors — randomly generated data. If there are no other incentives or programs, the entire network will be dominated by random data.
Filecoin Plus aims to maximize the amount of useful storage by adding a layer of social trust in the Filecoin network.
Storage clients (https://github.com/filecoin-project/filecoin-plus-client-onboarding#client) can ask a notary (https://github.com/filecoin-project/filecoin-plus-client-onboarding#notary) for DataCap (https://github.com/filecoin-project/filecoin-plus-client-onboarding#datacap), which can be used to incentivize miners to accept storage deals. Miners who accept Filecoin Plus deals receive 10 times the storage power for that data, and with it a correspondingly larger expected share of block rewards. Filecoin Plus truly puts power in the hands of users and incentivizes miners to support real use cases on the Filecoin network.
Filecoin community members use social algorithms and governance processes to improve the utility of the Filecoin network. With incentives for more participants, more and more useful data will be stored on the Filecoin network.
Maturity of Applications
More and more applications are blooming in the Filecoin ecosystem.
Applications and tools such as Textile and Powergate help developers quickly build support for uploading and downloading files on the Filecoin network. Software libraries also help developers use queue tools to submit storage orders in bulk.
Additionally, the combination of Livepeer and Powergate has enabled a video content sharing website.
Moreover, there are teams working on fast file retrieval and distribution through a caching layer. Each of these applications is contributing to a more mature, resilient, and vibrant storage network.
Future Outlook
Bitcoin’s PoW consensus incentivizes miners to add more hash power to the network. The greater the hash power, the stronger the security. Similarly, for Filecoin’s PoSt and expected consensus, the more storage there is on Filecoin, the stronger the consensus security. This directly affects the choices of Web3 developers, because the larger the scale, the stronger the network effect and the greater the robustness. This in turn attracts more applications, so the scale continues to expand and the impact continues to increase. This is a positive flywheel.
Filecoin starts with storage, but it is much more than storage. In the future, when a Turing-complete virtual machine exists on the network, data can be composed with logic in a much more powerful way. In today’s Web2.0 world, various information products use crawler technology to collect data from the public network and then combine it. Web3.0 will be a new era.
As more smart contract public chains integrate with Filecoin, and as Filecoin itself eventually supports smart contracts, developers can combine smart contract logic with file CIDs and end-to-end encryption technology to build a brand new universe on data. In the information world, data is an incredibly important asset, and seamless transfer of assets is what blockchain excels at.
Nothing is black or white. The real world is nuanced.