
Handle deletion failure in LRUCache #11878

Open
3 tasks
GregoryTravis opened this issue Dec 16, 2024 · 3 comments
Labels
-libs Libraries: New libraries to be implemented p-low Low priority

Comments

@GregoryTravis
Contributor

An input stream opened on a cache file can outlive the cache file itself, in which case the current cache code will delete the file (or attempt to) before the stream is closed. Under Windows, this deletion fails immediately; on macOS/Linux, it succeeds. In either case, the file continues to exist for some time but is not counted against the cache total.

In general, even without explicit cache clearing, any cache file input stream might still be in use when the cache is asked to store more files, which can trigger a cleanup, so this needs to be handled in a general fashion.

We cannot track the lifetime of open input streams, so we have to handle this another way:

  • Fix tests that leave the stream unconsumed -- this is the cause of the immediate failure
  • Under Windows, if we fail to delete a file, keep track of it as obsolete, count it against the total disk space, and periodically retry the deletion until it succeeds
  • Under macOS/Linux, possibly keep track of an unlinked-but-undeleted inode, and count it against the total disk space
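The Windows-side bookkeeping described above could look roughly like the following. This is a minimal sketch in Python, purely illustrative (the actual cache is not Python, and all names here are hypothetical): a failed deletion records the file as obsolete so its size still counts against the disk budget, and deletion is retried later.

```python
import os

class ObsoleteFileTracker:
    """Tracks cache files that could not be deleted (e.g. because a
    stream still holds them open on Windows) and retries deletion later.
    Hypothetical sketch, not the project's actual API."""

    def __init__(self):
        self._obsolete = {}  # path -> size in bytes still held on disk

    def try_delete(self, path):
        """Attempt deletion; on failure, record the file as obsolete."""
        if not os.path.exists(path):
            self._obsolete.pop(path, None)
            return True
        try:
            os.remove(path)
            self._obsolete.pop(path, None)
            return True
        except OSError:
            # Deletion failed: remember the file and its size so it
            # still counts against the cache's disk budget.
            self._obsolete[path] = os.path.getsize(path)
            return False

    def retry_pending(self):
        """Periodically re-attempt deletion of all obsolete files."""
        for path in list(self._obsolete):
            self.try_delete(path)

    def obsolete_bytes(self):
        """Disk space held by files we failed to delete."""
        return sum(self._obsolete.values())
```

The key point is that `obsolete_bytes()` would feed into the cache's total-size accounting, so undeletable files still exert eviction pressure.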
@GregoryTravis
Contributor Author

As @radeusgd points out, the Data.fetch interface is meant to be the only one used by end-users, and the Data methods always close streams.

This suggests that we should consider still-open streams to be either (1) a library bug, or (2) a race condition caused by cache pressure during a long fetch. We don't have to concern ourselves with the possibility that an end user will try to open a stream and keep it open for a long period.

Solving the race condition requires checking for successful file deletion, and retrying, but we can assume that undeleted files will only last a short time.

@radeusgd
Member

Indeed, using HTTP.fetch without consuming the output is a bug. I wonder what we could do to detect these.

Perhaps we could add a piece of code to the finalizer that logs a warning when a stream is closed by GC reclaiming its associated Managed_Resource (such a warning should only apply to the HTTP.fetch streams; other streams, like the materialized ones, are supposed to be handled by GC that way). When the stream is closed manually (i.e. by being consumed -- for example, by being materialized as in Data.fetch), nothing would be logged. We could then inspect whether our tests end up logging this warning.
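The finalizer-warning idea can be sketched like this. A hedged Python illustration only (the real mechanism would live in the Managed_Resource finalizer, and `TrackedStream` is a made-up name): a finalizer fires a warning if the wrapper is reclaimed by GC, and a clean `close()` detaches the finalizer so no warning is emitted.

```python
import warnings
import weakref

class TrackedStream:
    """Wraps a stream and warns if it is reclaimed by GC without being
    explicitly closed. Illustrative sketch of the proposed finalizer
    logging; names are hypothetical, not the project's API."""

    def __init__(self, inner, origin="HTTP.fetch"):
        self._inner = inner
        # The finalizer runs when this wrapper is garbage-collected.
        # It must not capture `self`, or the object would never be freed.
        self._finalizer = weakref.finalize(
            self, TrackedStream._warn_unclosed, origin)

    @staticmethod
    def _warn_unclosed(origin):
        warnings.warn(
            "stream from %s was closed by GC, not by user code" % origin)

    def read(self, n=-1):
        return self._inner.read(n)

    def close(self):
        # Clean close: detach the finalizer so no warning fires.
        self._finalizer.detach()
        self._inner.close()
```

Running the test suite with such warnings enabled would then surface exactly the call sites that leave streams unconsumed.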

What do you think?

@GregoryTravis
Contributor Author

Yes, I think that would work.

To protect against unexpected unclosed streams, it might be possible to add a wrapper to detect stream closure, and defer cleanup of obsolete cache files until the streams are closed. This would require extensive testing, though, so it's not a quick fix.
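The deferred-cleanup wrapper could work by reference-counting open streams per cache file: eviction of a file with live streams parks it in a pending set, and the last stream's close performs the actual deletion. A rough Python sketch under those assumptions (all names hypothetical):

```python
import os

class DeferredCleanupCache:
    """Sketch: defer deletion of an evicted cache file until every open
    stream on it has been closed. Illustrative only."""

    def __init__(self):
        self._open = {}        # path -> count of open streams
        self._pending = set()  # evicted files waiting for streams to close

    def stream_opened(self, path):
        self._open[path] = self._open.get(path, 0) + 1

    def stream_closed(self, path):
        self._open[path] -= 1
        if self._open[path] == 0:
            del self._open[path]
            if path in self._pending:
                # Last reader is gone: perform the deferred deletion.
                self._pending.discard(path)
                self._delete(path)

    def evict(self, path):
        # If a stream is still reading the file, defer its deletion.
        if self._open.get(path, 0) > 0:
            self._pending.add(path)
        else:
            self._delete(path)

    @staticmethod
    def _delete(path):
        if os.path.exists(path):
            os.remove(path)
```

As noted above, wiring stream closure back into the cache like this would need extensive testing, since every stream path (manual close, GC close, error paths) must report in.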

But for detecting unclosed streams in current library code (and end-user code that goes through Data.fetch), this logging would be enough.
