I’m not the only one with the idea of using desktop machines en masse to provide distributed file storage.
Microsoft Research have a project called Farsite, which has many similarities to my idea.
They’ve also got a bunch of cartoons which I’ll uhh… leave to stand on their own.
The key differences, though, are:
They don’t keep multiple copies of identical files, even if stored by different users.
This is… interesting, but it has privacy implications (i.e. anyone can find out whether someone else has a specific file simply by querying the network for that file’s hash). It is definitely more space-efficient, though.
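To make the privacy point concrete, here is a minimal Python sketch of whole-file deduplication, assuming files are keyed by their SHA-256 hash. `DedupStore` and its methods are hypothetical names for illustration, not Farsite’s actual design.

```python
import hashlib


class DedupStore:
    """Hypothetical content-addressed store: identical files are kept only once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}      # hash -> the single stored copy
        self.owners: dict[str, set[str]] = {}  # hash -> users who stored it

    def put(self, user: str, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)       # a second identical copy is never stored
        self.owners.setdefault(key, set()).add(user)
        return key

    def has(self, key: str) -> bool:
        # The privacy issue: anyone who can compute a file's hash can ask
        # whether someone on the network already holds that file.
        return key in self.blobs


store = DedupStore()
k1 = store.put("alice", b"holiday.jpg contents")
k2 = store.put("bob", b"holiday.jpg contents")
print(k1 == k2, len(store.blobs))  # True 1

probe = hashlib.sha256(b"holiday.jpg contents").hexdigest()
print(store.has(probe))  # True
```

The space saving and the leak come from the same property: the key depends only on the file’s contents, so two users (or a curious outsider) always arrive at the same key.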
My idea uses a BitTorrent-like approach to store data, whereas theirs appears to store whole, individual files across the network.
I think my approach would be slightly better for ease of distribution.
For example, putting a 500 MB file on the virtual drive might cause problems, whereas splitting it into 2 MB pieces makes it much easier for individual nodes to digest.
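A minimal sketch of the splitting step, assuming fixed 2 MB pieces that are each labelled with a SHA-256 hash so nodes can verify what they receive. The hash choice and the `split_into_pieces`/`reassemble` helpers are assumptions for illustration.

```python
import hashlib
import math

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB pieces, as in the example above


def split_into_pieces(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Split a file into fixed-size pieces, each labelled with its SHA-256 hash."""
    pieces = []
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        pieces.append((hashlib.sha256(piece).hexdigest(), piece))
    return pieces


def reassemble(pieces: list[tuple[str, bytes]]) -> bytes:
    """Join pieces back in order, rejecting any piece whose hash doesn't match."""
    out = bytearray()
    for digest, piece in pieces:
        if hashlib.sha256(piece).hexdigest() != digest:
            raise ValueError(f"corrupt piece {digest[:8]}")
        out += piece
    return bytes(out)


# A 500 MB file becomes 250 pieces that can be spread across many nodes.
print(math.ceil(500 * 1024 * 1024 / CHUNK_SIZE))  # 250

data = bytes(5 * 1024 * 1024)  # a small 5 MB stand-in file
pieces = split_into_pieces(data)
print(len(pieces))                 # 3 (2 MB + 2 MB + 1 MB)
print(reassemble(pieces) == data)  # True
```

Each piece is small enough for any single node to accept, and a corrupted piece can be detected and re-fetched on its own rather than re-sending the whole file.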
They have a totally serverless environment, where you negotiate contracts with other peers individually (or the program does this on your behalf).
Anyone with the right client can participate in their network: if you want 500 MB of storage on the network, you need to provide 500 MB for other people to use.
This leaves the system open to abuse by someone who claims to offer 10 GB but really offers only a fraction of that. Additional verification steps should help mitigate this, but may not be able to eliminate it entirely.
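One possible verification step, sketched in Python: before handing a piece to a peer, the owner precomputes a few random nonces and the expected hash of nonce + piece, then later asks the peer to answer one of those challenges. This is a generic spot-check technique, not taken from Farsite’s design; `Peer`, `make_challenges` and `spot_check` are hypothetical names.

```python
import hashlib
import os
from typing import Optional


def make_challenges(piece: bytes, count: int = 3) -> list[tuple[bytes, str]]:
    """Owner side: precompute (nonce, expected answer) pairs before uploading."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        challenges.append((nonce, hashlib.sha256(nonce + piece).hexdigest()))
    return challenges


class Peer:
    """Hypothetical storage peer; honest=False simulates one that discards data."""

    def __init__(self, honest: bool = True) -> None:
        self.honest = honest
        self.stored: Optional[bytes] = None

    def accept(self, piece: bytes) -> None:
        # A dishonest peer claims to store the piece but throws it away.
        self.stored = piece if self.honest else None

    def respond(self, nonce: bytes) -> str:
        data = self.stored if self.stored is not None else b""
        return hashlib.sha256(nonce + data).hexdigest()


def spot_check(peer: Peer, challenges: list[tuple[bytes, str]]) -> bool:
    """Spend one unused challenge; only a peer still holding the piece can pass."""
    nonce, expected = challenges.pop()
    return peer.respond(nonce) == expected


piece = os.urandom(4096)
honest, cheat = Peer(), Peer(honest=False)
honest.accept(piece)
cheat.accept(piece)
honest_checks = make_challenges(piece)
cheat_checks = make_challenges(piece)
print(spot_check(honest, honest_checks))  # True
print(spot_check(cheat, cheat_checks))    # False
```

The owner only needs to keep the nonces and hashes, not the piece itself. Each challenge can be used once and only covers the pieces actually checked, which is why a scheme like this mitigates over-claiming rather than eliminating it.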
My approach requires all users to set aside a minimum amount of space and to accept transfer requests, either from other nodes or as directed by the server. A node that refuses to accept data should be locked. There’s no need to negotiate with other peers, since the network provides everyone with a set amount of space.
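A minimal sketch of the server-driven placement, assuming a central coordinator that sends each piece to the emptiest unlocked nodes and locks any node that refuses a piece it has room for. The `Node`/`Coordinator` classes and the default of two copies per piece are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int          # space the user agreed to set aside, in bytes
    willing: bool = True   # simulates whether the node honours transfer requests
    used: int = 0
    locked: bool = False

    def has_room(self, size: int) -> bool:
        return self.used + size <= self.capacity

    def accept(self, size: int) -> bool:
        if not self.willing:
            return False
        self.used += size
        return True


@dataclass
class Coordinator:
    """Hypothetical central server: assigns pieces, no peer-to-peer negotiation."""

    nodes: list[Node]
    placements: dict[str, list[str]] = field(default_factory=dict)

    def place(self, piece_id: str, size: int, copies: int = 2) -> list[str]:
        chosen: list[str] = []
        # Try the emptiest nodes first.
        for node in sorted(self.nodes, key=lambda n: n.used):
            if len(chosen) == copies:
                break
            if node.locked or not node.has_room(size):
                continue  # full nodes are skipped, not punished
            if node.accept(size):
                chosen.append(node.name)
            else:
                node.locked = True  # refused data it had room for: lock it
        self.placements[piece_id] = chosen
        return chosen


MB = 1024 * 1024
nodes = [Node("a", 10 * MB), Node("b", 10 * MB, willing=False), Node("c", 10 * MB)]
server = Coordinator(nodes)
print(server.place("piece-1", 2 * MB))  # ['a', 'c']
print(nodes[1].locked)                  # True
```

Distinguishing "full" from "refused" matters here: only a node that declines data it has promised space for gets locked.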
Their system doesn’t have any backup capability.
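They justify this by saying the system is so widely distributed that there is little or no chance of a complete failure (which is true). Distribution doesn’t, however, protect you from your own mistakes, such as overwriting or deleting a file.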
My approach would use checkpoint-style (i.e. diff-based) storage: the current file is kept as a contiguous file in your cache, but a diff is kept for each save. This lets you roll back to a previous version (for example, if you accidentally save changes to a photo, they could easily be rolled back).
So:
Base (original file) + Diff_1 + Diff_2 + Diff_3 + … + Diff_n, allowing you to access any of those previous versions (including a file that was accidentally deleted).
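The Base + diffs chain can be sketched in Python as follows, assuming byte-level diffs built with the standard library’s `difflib.SequenceMatcher`. The diff op format and the `VersionedFile` class are hypothetical; a real system would likely use a dedicated binary-diff format.

```python
from difflib import SequenceMatcher

# A diff is a list of ops: ("copy", start, end) reuses bytes from the previous
# version; ("insert", data) adds new bytes. This format is an assumption.
Diff = list[tuple]


def make_diff(old: bytes, new: bytes) -> Diff:
    ops: Diff = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete": the old bytes are simply not copied
    return ops


def apply_diff(old: bytes, diff: Diff) -> bytes:
    out = bytearray()
    for op in diff:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class VersionedFile:
    """Base (original file) + Diff_1 + ... + Diff_n, plus a contiguous current copy."""

    def __init__(self, original: bytes) -> None:
        self.base = original
        self.diffs: list[Diff] = []
        self.current = original  # kept whole in the local cache

    def save(self, new: bytes) -> None:
        self.diffs.append(make_diff(self.current, new))  # one diff per save
        self.current = new

    def version(self, n: int) -> bytes:
        """Rebuild version n by applying Diff_1..Diff_n to Base (0 = Base)."""
        data = self.base
        for diff in self.diffs[:n]:
            data = apply_diff(data, diff)
        return data


f = VersionedFile(b"photo v0")
f.save(b"photo v1 (cropped)")
f.save(b"photo v2 (oops, filter applied)")
print(f.version(1))   # b'photo v1 (cropped)'
f.save(f.version(1))  # rolling back is just saving an older version again
print(f.current)      # b'photo v1 (cropped)'
```

Because each diff only records the change from the previous save, reconstructing version n means replaying the chain from Base, while the latest version stays instantly available from the cached copy.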
When your storage quota starts to run out, the files with the oldest diffs are “rolled up” by a certain number of diffs.
So:
Take Base.
Apply Diff_1 + Diff_2 + Diff_3 to Base, and save the result as New Base.
Remove the original Base (Diff_1 to Diff_3 can go too, since they are now folded into New Base).
All subsequent diffs (Diff_4 onwards) are still valid against New Base, since each diff records only the change from one version to the next. The only versions you lose are those older than New Base.
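The roll-up steps can be sketched in Python, assuming byte-level diffs built with `difflib.SequenceMatcher`. The diff helpers are included so the block stands alone; `VersionedFile` and `roll_up` are hypothetical names, and deciding when the quota is running low is left out.

```python
from difflib import SequenceMatcher

Diff = list[tuple]  # ("copy", start, end) or ("insert", data); an assumed format


def make_diff(old: bytes, new: bytes) -> Diff:
    ops: Diff = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
    return ops


def apply_diff(old: bytes, diff: Diff) -> bytes:
    out = bytearray()
    for op in diff:
        out += old[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


class VersionedFile:
    def __init__(self, original: bytes) -> None:
        self.base = original
        self.diffs: list[Diff] = []
        self.current = original

    def save(self, new: bytes) -> None:
        self.diffs.append(make_diff(self.current, new))
        self.current = new

    def version(self, n: int) -> bytes:
        """Rebuild the version n diffs after the current Base (0 = Base itself)."""
        data = self.base
        for diff in self.diffs[:n]:
            data = apply_diff(data, diff)
        return data

    def roll_up(self, k: int) -> None:
        """Fold the oldest k diffs into Base; versions older than that are lost."""
        self.base = self.version(k)  # Base + Diff_1 + ... + Diff_k -> New Base
        del self.diffs[:k]           # later diffs still apply to New Base


f = VersionedFile(b"draft 0")
for n in range(1, 5):
    f.save(f"draft {n}".encode())
f.roll_up(3)
print(f.base)        # b'draft 3'
print(len(f.diffs))  # 1
print(f.version(1))  # b'draft 4'
```

After the roll-up, the remaining diff was recorded against draft 3, which is exactly what New Base now holds, so it applies unchanged.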
