I’ve used this technique in the past, and the problem is that the way some file systems perform the file‑offset‑to‑disk‑location mapping is not scalable. It might always be fine with 512 MB files, but I worked with large files and millions of extents, and it ran into issues, including out‑of‑memory errors on Linux with XFS.
The XFS issue has since been fixed (though you often have no control over which Linux version your program runs on), but in general I’d say it’s better to do such mapping in user space. In this case, there is a RocksDB present anyway, so this would come at no performance cost.
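For concreteness, here's a rough sketch of what a user-space offset-to-location map could look like. All the names here are made up for illustration; in practice the extent entries would live in the key-value store (RocksDB in this case) rather than in memory.

```python
import bisect

# Hypothetical user-space extent map: each entry maps a logical file
# offset range to a (segment_id, physical_offset) location.
class ExtentMap:
    def __init__(self):
        self.starts = []   # sorted logical start offsets, for binary search
        self.extents = []  # parallel list of (start, length, segment_id, phys_offset)

    def add(self, logical_start, length, segment_id, phys_offset):
        i = bisect.bisect_left(self.starts, logical_start)
        self.starts.insert(i, logical_start)
        self.extents.insert(i, (logical_start, length, segment_id, phys_offset))

    def lookup(self, offset):
        # Find the extent covering `offset`, if any.
        i = bisect.bisect_right(self.starts, offset) - 1
        if i < 0:
            return None
        start, length, seg, phys = self.extents[i]
        if offset < start + length:
            return seg, phys + (offset - start)
        return None

m = ExtentMap()
m.add(0, 4096, segment_id=7, phys_offset=0)
m.add(4096, 4096, segment_id=3, phys_offset=8192)
print(m.lookup(5000))  # (3, 9096): offset 5000 falls in the second extent
```

With millions of extents this stays an O(log n) lookup and the memory footprint is entirely under your control, which is the point of doing it in user space.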
We can talk about an even more general idea for saving file space: compression. Ever heard of it being used across whole filesystems?
Microsoft MS-DOS and Windows supported this in the 90s with DriveSpace, and modern file systems like btrfs and zfs also support transparent compression.
Most file formats worth compressing are already compressed, and with filesystem compression you lose efficient non-sequential IO. You introduce overhead on both read and write without being a better solution to OP's problem.
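To illustrate the non-sequential IO point: with whole-stream compression you must decompress everything before byte N to read it, which is why filesystems that do this compress in fixed-size chunks and keep an index. This is just an illustrative sketch with made-up chunk sizes, not any real filesystem's scheme.

```python
import zlib

CHUNK = 4096
data = bytes(range(256)) * 64  # 16 KiB of sample data

# Whole-stream: reading byte N means decompressing everything before it.
stream = zlib.compress(data)

# Chunked: compress each chunk independently and record where it landed.
blob, index = b"", []  # index[i] = (offset_in_blob, compressed_len)
for i in range(0, len(data), CHUNK):
    c = zlib.compress(data[i:i + CHUNK])
    index.append((len(blob), len(c)))
    blob += c

def read_at(offset, length):
    # Decompress only the chunk(s) covering the requested range.
    out = b""
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    for ci in range(first, last + 1):
        off, clen = index[ci]
        out += zlib.decompress(blob[off:off + clen])
    # Trim to the requested window within the decompressed chunks.
    skip = offset - first * CHUNK
    return out[skip:skip + length]

print(read_at(5000, 100) == data[5000:5100])  # True
```

Even with chunking you still pay the decompress cost on every read and the index bookkeeping on every write, which is the overhead being described.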
I am guessing the choice here is whether you want the kernel to handle this, and whether that is more performant than just managing a bunch of regular empty files and a home-grown file allocation table.

Or even just a bunch of little files representing segments of larger files.
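The segment-file variant barely even needs an allocation table, since the mapping is pure arithmetic. A minimal sketch, with a made-up naming scheme and segment size:

```python
# A big logical file split into fixed-size segment files: the offset math
# replaces the allocation table entirely.
SEGMENT_SIZE = 64 * 1024 * 1024  # 64 MiB per segment file (illustrative)

def locate(logical_offset):
    """Map a logical offset to (segment filename, offset within segment)."""
    seg = logical_offset // SEGMENT_SIZE
    return f"data.seg{seg:06d}", logical_offset % SEGMENT_SIZE

print(locate(200 * 1024 * 1024))  # ('data.seg000003', 8388608)
```

Holes are then just segment files that were never created, which gets you sparse-file behavior without depending on how the filesystem handles extents.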