Commit
Implement pre-packed blobs serialization on disk and their memory mapping on load (#23069)

### Description
Pre-packing is a feature that allows kernels to re-arrange weight data to gain performance at inference time. Currently, pre-packed blobs are shared only when cross-session weight sharing is enabled, and only for those weights the user marks as shared. Otherwise, the data resides on the heap and the kernels own it, which may lead to duplication.

This change enables pre-packed data to be stored on disk alongside the external initializers. The pre-packed blobs are memory mapped and loaded into either the cross-session shared container or a new container that shares pre-packed blobs within the session. With the new approach, pre-packed blobs are always owned by a shared container using the existing pre-pack mechanism for sharing. When cross-session sharing is enabled, the external container owns the data; otherwise, a separate container owned by the root `SessionState` owns and shares it.

To facilitate this, we introduce a new container that works in two modes. When an optimized model is being saved and pre-packed weight saving is enabled, the container records pre-packed blobs and serializes them to disk using the existing `ToGraphProtoWithExternalInitializers` function. To externalize the pre-packed weights, we introduce a new session option, `kOrtSessionOptionsSavePrePackedConstantInitializers`. Note that pre-packing must be enabled (the default) for this to work. `ToGraphProtoWithExternalInitializers` is modified to recurse into subgraphs to make sure local initializer names are properly accounted for. In the second mode, the container simply holds the pre-packed weights memory-mapped from disk and shares them with the kernels.

### Motivation and Context
Reduce memory usage by pre-packed initializers and externalize them.
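To make the new option concrete, here is a minimal C++ sketch of how a caller might opt in through the public session-options API. It assumes the new key is exposed in `onnxruntime_session_options_config_keys.h` alongside the pre-existing external-initializer keys; the exact combination of options needed to trigger externalization may differ from this sketch.

```cpp
// Minimal sketch, not a verified end-to-end sample. Assumes the config keys
// below are available in onnxruntime_session_options_config_keys.h.
#include <onnxruntime_cxx_api.h>
#include <onnxruntime_session_options_config_keys.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "prepacked-save");
  Ort::SessionOptions so;

  // New option from this change: also serialize pre-packed constant
  // initializers when writing the optimized model's external data.
  so.AddConfigEntry(kOrtSessionOptionsSavePrePackedConstantInitializers, "1");

  // Pre-existing options that control external-initializer serialization:
  // the external data file name and the minimum size for externalization.
  so.AddConfigEntry(kOrtSessionOptionsOptimizedModelExternalInitializersFileName,
                    "model_optimized.bin");
  so.AddConfigEntry(kOrtSessionOptionsOptimizedModelExternalInitializersMinSizeInBytes,
                    "1024");

  // Write the optimized model to disk; initializers above the threshold
  // (and, with the new option, their pre-packed forms) go to the .bin file.
  so.SetOptimizedModelFilePath(ORT_TSTR("model_optimized.onnx"));

  // Creating the session runs optimization and pre-packing, then saves.
  Ort::Session session(env, ORT_TSTR("model.onnx"), so);
  return 0;
}
```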
1 parent 29bccad · commit 00b262d
28 changed files with 1,308 additions and 266 deletions.
@@ -0,0 +1,44 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

#pragma once

namespace onnxruntime {

class PrepackedWeightsForGraph;

// These options affect how the model initializers are written to the external file.
// This includes an option to align external initializer offsets.
// For models running on CPU, ORT will try to use mmap to load external
// initializers. To use mmap, external initializers need to be offset aligned.
// ORT saves external initializers into a single data file; each initializer is
// accessed with an offset (start position of the initializer) and a length
// (byte length of the initializer) within the data file. To use mmap, each
// offset needs to be aligned, which means it must be divisible by the
// allocation granularity (64KB for Windows and 4KB for other OSes). With
// align_offset set to true, ORT will align the offsets of large initializers
// when saving the ONNX model with an external data file.
struct ModelSavingOptions {
  explicit ModelSavingOptions(size_t size_threshold)
      : initializer_size_threshold(size_threshold) {}

  // Minimal initializer size in bytes to be externalized on disk
  size_t initializer_size_threshold;
  // Offsets will always be page aligned and allocation granularity aligned for
  // mmap support. This is done by padding the previous tensor's data with
  // zeros while keeping its length the same.
  bool align_offset = false;
  // Alignment threshold for the size of the data.
  // Having a low threshold would waste file space for small initializers.
  // Only when a tensor's data size is greater than align_threshold is it
  // force aligned. Defaults to 1MB.
  int64_t align_threshold = 1048576;
  // The allocation granularity for mmap() support.
  // Defaults to 64KB for Windows and 4KB for other OSes.
#ifdef _WIN32
  int64_t allocation_granularity = 65536;
#else
  int64_t allocation_granularity = 4096;
#endif
};

}  // namespace onnxruntime
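For illustration, a short sketch of how these options might be populated by code that writes external initializers. The include path is assumed (the new header's location is not shown above), and the values are example choices rather than recommended defaults.

```cpp
// Assumed header location for the struct declared above.
// #include "core/framework/model_saving_options.h"

namespace {

onnxruntime::ModelSavingOptions MakeSavingOptions() {
  // Externalize every initializer of 4 KB or larger (example threshold).
  onnxruntime::ModelSavingOptions options(/*size_threshold=*/4096);

  // Align offsets so the external data file can be memory mapped.
  options.align_offset = true;

  // Only force alignment for tensors larger than 1 MB (the default).
  options.align_threshold = 1048576;

  // allocation_granularity keeps its platform default (64KB / 4KB).
  return options;
}

}  // namespace
```

These options would then be handed to `ToGraphProtoWithExternalInitializers` when the optimized model and its external data file are written; the exact call site and signature live inside ORT's model-saving path.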