The default AWS and GCP configurations set up the Engine Pods to read models from shared network-attached storage. This can increase disk read latency and, in turn, the latency of any request that triggers a model load. Streaming requests are particularly sensitive to this delay.
There should be an option to copy models from NFS onto the Pod's attached host storage to reduce model read latency. The copy could run once at startup and, possibly, poll NFS afterwards for updated models.
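As a rough illustration of the proposal, the copy-on-startup and polling behaviour could look something like the sketch below. This is not the project's implementation: the function names `sync_models` and `poll_models`, the directory arguments, and the polling interval are all hypothetical. It assumes a file is unchanged when its size and modification time (to the second) match, and it does not remove local files that were deleted from NFS.

```python
import os
import shutil
import time
from pathlib import Path


def sync_models(nfs_dir, local_dir):
    """Copy new or changed files from nfs_dir to local_dir.

    Returns the relative paths (POSIX style) of the files that were copied.
    Both directory arguments are hypothetical; adjust to the real mounts.
    """
    src_root, dst_root = Path(nfs_dir), Path(local_dir)
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        s = src.stat()
        if dst.exists():
            d = dst.stat()
            # Assumption: same size and same mtime (whole seconds) means unchanged.
            if d.st_size == s.st_size and int(d.st_mtime) == int(s.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Copy to a temporary name first, then rename, so a reader never
        # sees a partially written model file.
        tmp = dst.with_name(dst.name + ".tmp")
        shutil.copy2(src, tmp)  # copy2 also copies the modification time
        os.replace(tmp, dst)
        copied.append(rel.as_posix())
    return copied


def poll_models(nfs_dir, local_dir, interval_s=60.0, max_rounds=None):
    """Run sync_models repeatedly; max_rounds=None polls until interrupted."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        sync_models(nfs_dir, local_dir)
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            time.sleep(interval_s)
```

With something like this, a Pod could call `sync_models` once before the Engine starts serving and, if polling is enabled, run `poll_models` in a background thread or sidecar. Whether the Engine picks up a replaced model file without a restart is a separate question this sketch does not address.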