I’ve used this technique in the past, and the problem is that the way some file systems perform the file‑offset‑to‑disk‑location mapping is not scalable. It might always be fine with 512 MB files, but I worked with large files and millions of extents, and it ran into issues, including out‑of‑memory errors on Linux with XFS.
The XFS issue has since been fixed (though you often have no control over which Linux version your program runs on), but in general I’d say it’s better to do such mapping in user space. In this case, there is a RocksDB present anyway, so this would come at no performance cost.
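For concreteness, here's a rough sketch of what a user-space offset-to-location map could look like. All the names here are made up for illustration; in practice the extent entries would live in the key-value store (RocksDB in this case) rather than in memory.

```python
import bisect

# Hypothetical user-space extent map: each entry maps a logical file
# offset range to a (segment_id, physical_offset) location.
class ExtentMap:
    def __init__(self):
        self.starts = []   # sorted logical start offsets, for binary search
        self.extents = []  # parallel list of (start, length, segment_id, phys_offset)

    def add(self, logical_start, length, segment_id, phys_offset):
        i = bisect.bisect_left(self.starts, logical_start)
        self.starts.insert(i, logical_start)
        self.extents.insert(i, (logical_start, length, segment_id, phys_offset))

    def lookup(self, offset):
        # Find the extent covering `offset`, if any.
        i = bisect.bisect_right(self.starts, offset) - 1
        if i < 0:
            return None
        start, length, seg, phys = self.extents[i]
        if offset < start + length:
            return seg, phys + (offset - start)
        return None

m = ExtentMap()
m.add(0, 4096, segment_id=7, phys_offset=0)
m.add(4096, 4096, segment_id=3, phys_offset=8192)
print(m.lookup(5000))  # (3, 9096): offset 5000 falls in the second extent
```

With millions of extents this stays an O(log n) lookup and the memory footprint is entirely under your control, which is the point of doing it in user space.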
We can talk about an even more general idea for saving file space: compression. Ever heard of it being used across whole filesystems?
Microsoft MS-DOS and Windows supported this in the 90s with DriveSpace, and modern file systems like btrfs and zfs also support transparent compression.
Most file formats worth compressing are already compressed, and with filesystem compression you lose efficient non-sequential IO. You introduce overhead on both read and write without being a better solution to OP's problem.
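To illustrate the non-sequential IO point: with whole-stream compression you must decompress everything before byte N to read it, which is why filesystems that do this compress in fixed-size chunks and keep an index. This is just an illustrative sketch with made-up chunk sizes, not any real filesystem's scheme.

```python
import zlib

CHUNK = 4096
data = bytes(range(256)) * 64  # 16 KiB of sample data

# Whole-stream: reading byte N means decompressing everything before it.
stream = zlib.compress(data)

# Chunked: compress each chunk independently and record where it landed.
blob, index = b"", []  # index[i] = (offset_in_blob, compressed_len)
for i in range(0, len(data), CHUNK):
    c = zlib.compress(data[i:i + CHUNK])
    index.append((len(blob), len(c)))
    blob += c

def read_at(offset, length):
    # Decompress only the chunk(s) covering the requested range.
    out = b""
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    for ci in range(first, last + 1):
        off, clen = index[ci]
        out += zlib.decompress(blob[off:off + clen])
    # Trim to the requested window within the decompressed chunks.
    skip = offset - first * CHUNK
    return out[skip:skip + length]

print(read_at(5000, 100) == data[5000:5100])  # True
```

Even with chunking you still pay the decompress cost on every read and the index bookkeeping on every write, which is the overhead being described.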
I am guessing the choice here is whether you want the kernel to handle this, and whether that is more performant than just managing a bunch of regular empty files and a home-grown file allocation table.

Or even just a bunch of little files representing segments of larger files.
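The segment-file variant barely even needs an allocation table, since the mapping is pure arithmetic. A minimal sketch, with a made-up naming scheme and segment size:

```python
# A big logical file split into fixed-size segment files: the offset math
# replaces the allocation table entirely.
SEGMENT_SIZE = 64 * 1024 * 1024  # 64 MiB per segment file (illustrative)

def locate(logical_offset):
    """Map a logical offset to (segment filename, offset within segment)."""
    seg = logical_offset // SEGMENT_SIZE
    return f"data.seg{seg:06d}", logical_offset % SEGMENT_SIZE

print(locate(200 * 1024 * 1024))  # ('data.seg000003', 8388608)
```

Holes are then just segment files that were never created, which gets you sparse-file behavior without depending on how the filesystem handles extents.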