Add configurable cache entry expiration #31
Comments
Can you give an example of the extra triple?
Can you talk more about the motivation for adding this as a triple? I'm concerned about the potential impact here. Is there an assumption that this triple would be included in the serialized representation?
The way Marmotta handles this is like so: triples from the resource are cached in one location and metadata about those cached triples is stored separately (i.e. when the triples were retrieved). This way, it is possible to configure a TTL globally (or by endpoint) without mixing the metadata with the triples from the resource. Marmotta also does not use RDF to store that metadata, nor is there any inherent need to do so. As an example, Marmotta's file-based cache looks like this:
This way, you don't mix the resource triples and the metadata about those triples; nor do you run into namespace clashes between that metadata and the triples themselves.
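A minimal sketch of that separation, in the spirit of what @acoburn describes (this is a hypothetical layout, not Marmotta's actual on-disk format; `FileCache` and its file naming are invented for illustration): cached triples and their retrieval metadata live in sibling files, so TTL bookkeeping never touches the resource's own statements.

```ruby
require 'json'
require 'time'
require 'tmpdir'

# Hypothetical file-based cache: one file holds the raw triples verbatim,
# a sibling .meta.json file records when and from where they were retrieved.
class FileCache
  def initialize(dir)
    @dir = dir
  end

  def store(key, triples, endpoint:)
    File.write(File.join(@dir, "#{key}.nt"), triples)
    meta = { 'retrieved_at' => Time.now.utc.iso8601, 'endpoint' => endpoint }
    File.write(File.join(@dir, "#{key}.meta.json"), JSON.generate(meta))
  end

  # The cached triples, untouched by any cache bookkeeping.
  def triples(key)
    File.read(File.join(@dir, "#{key}.nt"))
  end

  # Metadata lives apart, so a TTL check never parses RDF.
  def retrieved_at(key)
    meta = JSON.parse(File.read(File.join(@dir, "#{key}.meta.json")))
    Time.parse(meta['retrieved_at'])
  end
end
```

Because expiry is computed from `retrieved_at` plus a configured TTL, nothing cache-specific ever needs to be serialized alongside the resource's triples.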
@acoburn Thanks for that info. I am new to Marmotta and linked data fragments, so pointers are appreciated. My only concern is that this gem also provides caching in Blazegraph. My approach would have to be compatible with Marmotta and Blazegraph (and other potential repositories).
Considering the use case here is effectively a reverse proxy cache for an external RDFSource, I'm 👎 on triples attached to the subject URI for configuration of the caching system. I might be able to be convinced that a second URI (or maybe a named graph, but I'm iffy there too) which is included in the response could have that, though. The other thing is there's no way to update triples in this gem now, sort of on purpose. You'd need that functionality to add the expiration triple, yes? The use cases were always simple: cache external responses and provide information about that cache. I think there's a benefit to keeping it that way. Global TTL seems like a good config option. The Marmotta backend's always had it, but surfacing it in this layer is a good thing for those backends that don't have caching built in.
@anarchivist The motivation is that some authorities change the display string, and potentially other triple values, associated with a controlled vocabulary term. If you capture the triples associated with a URI once and never update, you will be using a stale cache value. Having a configurable TimeToLive value allows you to invalidate subject_URIs in the cache, forcing a refresh from the original source. Making TimeToLive configurable by host allows for a more flexible approach to cache refresh, so that an authority that rarely modifies its data can have a longer TimeToLive setting than an authority that frequently modifies its data.
I will say, I have a feeling that most users of this don't want hard expirations - they want something like periodic updates from upstream. If the remote source goes down, you want your cache to work even if the TTL is over. |
(They might not even want AUTOMATIC periodic updates - I've heard concerns about data drift in remote sources before, but maybe that's a second product which mints a sameAs URI for temporal locks) |
When a subject expires, the proposal says attempt to get from source and if unsuccessful (i.e. server is down) then use the cache. |
@elrayle Yeah, I think I could agree if the workflow was more like "if TTL was past, queue up a refresh in the background and serve up a response QUICKLY anyways, with a header saying it's stale", then have some method to block the response while waiting for a cache update. |
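@tpendragon's "serve it quickly, flag it stale, refresh in the background" workflow might be sketched like this (all names here are hypothetical; `fetcher` stands in for whatever the gem uses to pull from the original source, and the queue drain would be done by a worker thread or the nightly cron job mentioned below):

```ruby
require 'time'

# Hypothetical stale-while-revalidate retrieval: always answer from cache
# immediately; if the entry's TTL has passed, flag the response as stale and
# queue a background refresh rather than blocking the request.
class StaleWhileRevalidate
  Entry = Struct.new(:body, :retrieved_at)

  def initialize(ttl:, fetcher:)
    @ttl = ttl           # seconds
    @fetcher = fetcher   # callable that pulls fresh data from the source
    @cache = {}
    @refresh_queue = Queue.new
  end

  def retrieve(uri)
    entry = @cache[uri]
    return refresh(uri) if entry.nil?       # cold cache: must block once
    if Time.now - entry.retrieved_at > @ttl
      @refresh_queue << uri                 # refresh later, off the request path
      { body: entry.body, stale: true }     # caller could surface this as a header
    else
      { body: entry.body, stale: false }
    end
  end

  def refresh(uri)
    @cache[uri] = Entry.new(@fetcher.call(uri), Time.now)
    { body: @cache[uri].body, stale: false }
  end

  # Drain one queued refresh; a worker thread or cron could call this in a loop.
  def work_off_one
    refresh(@refresh_queue.pop(true))
  rescue ThreadError
    nil # queue empty
  end
end
```

The key property is that a past-TTL entry never slows down or fails a response; staleness is only a signal attached to data that is still served.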
One piece I left out of the proposal that we were discussing locally is having something like a cron job that crawls the cache at night and attempts a refresh on expired subject_URIs. BTW, I like the idea of using a named graph to hold expiration dates. That avoids potential conflict with the cached data. |
Basically my use cases around TTL are these:
Anything which solves those three use cases I'm 👍 for.
@elrayle You may want to take a look at the Marmotta LDCache interface for inspiration. In particular, the […]
@tpendragon 1 and 3 make sense to me. Can you expand on 2? I think I know what you mean, but want to be sure. |
I would be fine with @tpendragon's suggestion for a modification to the retrieval algorithm...
I agree with the direction of this discussion. The example triple confuses me because it appears to conflate a real-world object with its URI representation. If the subject URI is some name authority, you would essentially be asserting that the person has an expiration date. It seems like some form of reification would solve that problem, though I'm not sure exactly what that would need to look like.
@HackMasterA I see your point. I think it would be easy to avoid the triple in Marmotta based on the feedback from @acoburn. Blazegraph and other repository implementations may be more challenging. I would be less concerned with the conflation with a real-world-object if the predicate were better named, e.g. cache_expiration_dt. Based on feedback, for triplestore implementations, I propose...
For Marmotta, I would use the internal mechanism already in Marmotta.
I am exploring adding cache entry expiration. I would like to get feedback from those using linked-data-fragments for caching.
Approach:
TimeToLive configuration - There will be a global TimeToLive value that serves as a default. There can also be a TimeToLive interval defined per host with the default used if the current URI's host does not have a separate TimeToLive configured.
ExpirationDT for a URI - Each cached URI will have an extra triple added to identify the date-time on which the cached entry for the URI expires and becomes invalid.
ExpirationDT = date_retrieved + TimeToLive(URI_host)
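The per-host lookup with a global fallback might look like this (a sketch under assumptions: the config shape, the `TTL_CONFIG` constant, and the example hosts and TTL values are all hypothetical, not the gem's actual configuration API):

```ruby
require 'uri'
require 'time'

# Hypothetical TTL configuration: a global default plus per-host
# overrides, all expressed in seconds.
TTL_CONFIG = {
  default: 30 * 24 * 60 * 60,                  # global default: 30 days
  hosts: {
    'id.loc.gov'    => 90 * 24 * 60 * 60,      # rarely changes: 90 days
    'fast.oclc.org' => 7 * 24 * 60 * 60        # changes often: 7 days
  }
}.freeze

# TimeToLive(URI_host): per-host value if configured, else the default.
def time_to_live(subject_uri, config = TTL_CONFIG)
  host = URI(subject_uri).host
  config[:hosts].fetch(host, config[:default])
end

# ExpirationDT = date_retrieved + TimeToLive(URI_host)
def expiration_dt(subject_uri, date_retrieved, config = TTL_CONFIG)
  date_retrieved + time_to_live(subject_uri, config)
end
```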
Modifications to Retrieval Algorithm
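The modified retrieval flow described earlier in the thread — on expiration, attempt a refresh from the source, and if the server is down, fall back to the cached copy — could be sketched as follows (method names and the in-memory cache shape are hypothetical, and the hard-coded one-day TTL is only a placeholder for the configured value):

```ruby
require 'time'

# Hypothetical retrieval: expired entries trigger a re-fetch from the
# original source; if that fails (e.g. the server is down), serve the
# stale cached copy rather than nothing.
def retrieve(uri, cache:, fetcher:, now: Time.now)
  entry = cache[uri]
  if entry && now < entry[:expiration_dt]
    return entry[:triples]                     # still fresh: serve from cache
  end
  begin
    triples = fetcher.call(uri)                # expired: attempt refresh
    cache[uri] = { triples: triples, expiration_dt: now + 86_400 }
    triples
  rescue StandardError
    entry ? entry[:triples] : raise            # stale beats nothing
  end
end
```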
Predicate for ExpirationDT - I have not found a predicate that matches the concept exactly. The closest I have at the moment is http://vivoweb.org/ontology/core#expirationDate. I am open to suggestions for an alternate predicate.
Other additions that could be part of this work.
Optional ForceRecache - Retrieve method can have a parameter added to allow caller to request the URI's cache be updated from source. What would you want returned if the host is out of service?
LastModifiedDT - Add a new triple that holds the LastModifiedDT for the cache.
Thoughts on predicate choices. I am somewhat hesitant to use existing predicates that aren't cache specific. If the cached URI happens to use the same predicates, they would get clobbered by the cache-added predicates. I'd like to see predicates cache_expiration_dt and cache_last_modified_dt, and possibly cache_create_dt. Other thoughts?
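One clobber-free shape, combining the cache_* naming above with the named-graph idea from earlier in the thread (the namespace URI and `metadata_graph` helper are hypothetical): the cached triples stay verbatim in one graph, and all cache_* statements about a subject go in a separate metadata graph, so they can never collide with the resource's own predicates.

```ruby
require 'time'

# Hypothetical namespace for cache-specific predicates.
CACHE_NS = 'http://example.org/ns/cache#'

# Build the statements for a subject's metadata graph: only cache_*
# predicates, kept apart from the cached resource triples themselves.
def metadata_graph(subject_uri, retrieved_at:, expiration_dt:)
  [
    [subject_uri, "#{CACHE_NS}cache_last_modified_dt", retrieved_at.iso8601],
    [subject_uri, "#{CACHE_NS}cache_expiration_dt",    expiration_dt.iso8601]
  ]
end
```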
Please comment on this approach as soon as you can. I am looking at beginning work as soon as I get feedback.