[STORAGE USE REDUCTION] Snapshot page compression #1594

Open
cloutiertyler opened this issue Aug 15, 2024 · 5 comments

@cloutiertyler (Contributor)

No description provided.

@cloutiertyler (Contributor, Author)

Need to ask Phoebe whether we have sufficient metadata to determine if these pages are compressed or not.

@gefjon (Contributor) commented Nov 20, 2024

Filenames of snapshot pages are semantically important. It would be easy to recognize {HASH} as uncompressed and {HASH}.zip as compressed.
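For illustration, a hypothetical check along those lines, keyed purely off the proposed filename convention (`page_is_compressed` is a made-up name, not an existing SpacetimeDB API):

```rust
use std::path::Path;

/// Hypothetical helper: decide whether a snapshot page file is compressed
/// from its name alone, per the proposed `{HASH}` / `{HASH}.zip` convention.
fn page_is_compressed(path: &Path) -> bool {
    path.extension().map_or(false, |ext| ext == "zip")
}
```

For example, `page_is_compressed(Path::new("ab12cd"))` is false while `page_is_compressed(Path::new("ab12cd.zip"))` is true, so no extra metadata is needed beyond the filename itself.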

@gefjon (Contributor) commented Nov 20, 2024

MVP / definition of done as I see it:

  • When taking a snapshot, unconditionally compress all pages and blobs before writing them to disk.
  • When restoring a snapshot, decompress the pages and blobs while reading them back into memory (a rough sketch of both paths follows this list).
  • Benchmark to find out how much slower this is than the uncompressed version, and determine whether that slowdown is acceptable.
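A minimal sketch of the compress-on-write / decompress-on-read pair, assuming the flate2 crate's gzip streams as a stand-in codec (whether the real on-disk format would be zip, gzip, or something else is part of what the benchmark should inform). `write_page` and `read_page` are hypothetical names, not existing SpacetimeDB APIs:

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

use flate2::read::GzDecoder;
use flate2::write::GzEncoder;
use flate2::Compression;

/// Compress a page's bytes and write them to `path` in one shot.
fn write_page(path: &Path, page: &[u8]) -> io::Result<()> {
    let file = File::create(path)?;
    let mut encoder = GzEncoder::new(file, Compression::default());
    encoder.write_all(page)?;
    encoder.finish()?;
    Ok(())
}

/// Read a compressed page back into memory, decompressing as we go.
fn read_page(path: &Path) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let mut decoder = GzDecoder::new(file);
    let mut page = Vec::new();
    decoder.read_to_end(&mut page)?;
    Ok(page)
}
```

Since both paths are streaming, the benchmark can vary the compression level (`Compression::fast()` vs. `Compression::best()`) to trade snapshot size against the slowdown being measured.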

Things we could do if the slowdown from the above solution is too high:

  • After taking a snapshot, compress all previous snapshots. Whether each snapshot gets its own archive, or the snapshots get added into a single big archive via something like zip -u, requires experimentation.
  • After taking a snapshot, go into the previous "parent" snapshot and compress any of its pages which are not also present in the new "child" snapshot.
    • Additional complexity: when compressing such a page, you have to examine the chain of "ancestor" snapshots before that one, which may also contain the same page, and fix them up so that they all hold hardlinks to the same compressed archive (see the sketch below). Otherwise you don't save any disk space, as the "grandparent" may still contain an uncompressed version of the page that you compressed within the "parent."
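A rough sketch of that hardlink fix-up, with all names hypothetical: `compressed` is the freshly written `{HASH}.zip` in the parent snapshot, and `ancestors` are the directories of older snapshots that may still hold an uncompressed copy of the same page.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical fix-up: walk the ancestor snapshots and replace any
/// uncompressed copy of the page with a hardlink to the one compressed
/// archive already written into the parent snapshot.
fn fixup_ancestors(hash: &str, compressed: &Path, ancestors: &[&Path]) -> io::Result<()> {
    for dir in ancestors {
        let uncompressed = dir.join(hash);
        if uncompressed.exists() {
            // Drop the uncompressed copy; without this, the grandparent
            // keeps the full-size page and we save no disk space.
            fs::remove_file(&uncompressed)?;
            // Point the ancestor at the shared compressed archive.
            fs::hard_link(compressed, dir.join(format!("{hash}.zip")))?;
        }
    }
    Ok(())
}
```

Because every snapshot in the chain then shares one inode for the page, the compressed bytes are stored exactly once regardless of how many ancestors referenced it.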

@bfops (Collaborator) commented Nov 20, 2024

Leaving this as a P1 for devops's sake, because disks keep filling up.

@bfops (Collaborator) commented Nov 20, 2024

The naive unconditional-compression approach would not be backwards-compatible, but we can do this backwards-compatibly (e.g. by adding .zip to the filename).
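A sketch of what that backwards-compatible read path might look like, under the same assumptions as the earlier sketches (gzip via flate2 as a stand-in codec, hypothetical helper names): prefer the compressed `{HASH}.zip` if present, otherwise fall back to the legacy uncompressed `{HASH}`.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

use flate2::read::GzDecoder;

/// Hypothetical backwards-compatible reader: new snapshots store
/// `{HASH}.zip`, while snapshots written before compression existed
/// still store a bare `{HASH}` file.
fn read_page_compat(dir: &Path, hash: &str) -> io::Result<Vec<u8>> {
    let compressed = dir.join(format!("{hash}.zip"));
    if compressed.exists() {
        let mut page = Vec::new();
        GzDecoder::new(fs::File::open(compressed)?).read_to_end(&mut page)?;
        Ok(page)
    } else {
        // Legacy path: the page predates compression, read it verbatim.
        fs::read(dir.join(hash))
    }
}
```

With a fallback like this, old snapshots stay readable without any migration step; new writers simply stop producing the bare `{HASH}` form.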
