Shared layer weights are leaked when disposing a model #8471

Open
virgil-king opened this issue Dec 9, 2024 · 4 comments
Labels
comp:layers type:bug Something isn't working

Comments

@virgil-king

Please make sure that this is a bug. As per our
GitHub Policy,
we only address code/doc bugs, performance issues, feature requests and
build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Fedora 40
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow.js installed from (npm or script link): pnpm
  • TensorFlow.js version (use command below): 4.22.0
  • Browser version: NA
  • Tensorflow.js Converter Version: NA

Describe the current behavior

  1. Load the model from the attached files
  2. Dispose the model

Result: 36 weights are not disposed. Those weights are all part of a subgraph of which there are 4 copies in the model:
conv2d_Conv2D[7-16]/kernel
conv2d_Conv2D[7-16]/bias
board_analysis_output/kernel
board_analysis_output/bias

Describe the expected behavior

All weights should be disposed.

Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/CodePen/any notebook.
model.zip

import * as tf from "@tensorflow/tfjs-node-gpu";
/** See {@link tfRuntime.broadcastTo} */
class BroadcastLayer extends tf.layers.Layer {
    constructor(args) {
        super(args);
        this.shape = args.shape;
    }
    computeOutputShape(inputShape) {
        return this.shape.map((value, index) => {
            if (value == null) {
                return inputShape[index];
            }
            else {
                return value;
            }
        });
    }
    call(inputs) {
        return tf.tidy(() => {
            const derivedInput = this.getSingleTensor(inputs);
            const inputShape = derivedInput.shape;
            const nonNullShape = this.shape.map((value, index) => {
                if (value == null) {
                    return inputShape[index];
                }
                else {
                    return value;
                }
            });
            return derivedInput.broadcastTo([...nonNullShape]);
        });
    }
    getSingleTensor(input) {
        if (input instanceof tf.Tensor) {
            return input;
        }
        else if (Array.isArray(input) && input.length == 1) {
            return input[0];
        }
        else {
            throw new Error(`Expected one tensor but received ${input.length}`);
        }
    }
    getConfig() {
        const config = {
            shape: [...this.shape],
        };
        const baseConfig = super.getConfig();
        Object.assign(config, baseConfig);
        return config;
    }
    static fromConfig(cls, config) {
        return new cls(config);
    }
}
/** @nocollapse */
BroadcastLayer.className = "BroadcastLayer";
tf.serialization.registerClass(BroadcastLayer);
/** See {@link tf.Tensor.expandDims} */
class ExpandDimsLayer extends tf.layers.Layer {
    constructor(args) {
        super(args);
        this.dimensionIndices = args.shape;
    }
    computeOutputShape(inputShape) {
        let result = [...inputShape];
        for (const index of this.dimensionIndices) {
            result.splice(index, 0, 1);
        }
        return result;
    }
    call(inputs) {
        return tf.tidy(() => {
            let result = this.getSingleTensor(inputs);
            for (const index of this.dimensionIndices) {
                result = result.expandDims(index);
            }
            return result;
        });
    }
    getSingleTensor(input) {
        if (input instanceof tf.Tensor) {
            return input;
        }
        else if (Array.isArray(input) && input.length == 1) {
            return input[0];
        }
        else {
            throw new Error(`Expected one tensor but received ${input.length}`);
        }
    }
    getConfig() {
        const config = {
            shape: [...this.dimensionIndices],
        };
        const baseConfig = super.getConfig();
        Object.assign(config, baseConfig);
        return config;
    }
    static fromConfig(cls, config) {
        return new cls(config);
    }
}
ExpandDimsLayer.className = "ExpandDimsLayer";
tf.serialization.registerClass(ExpandDimsLayer);
const model = await tf.loadLayersModel(`file://${process.env.HOME}/model/model.json`);
model.dispose();
console.log(JSON.stringify(model.weights, undefined, 2));

Other info / logs Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.

@virgil-king virgil-king added the type:bug Something isn't working label Dec 9, 2024
@shmishra99 shmishra99 self-assigned this Dec 10, 2024
@shmishra99
Contributor

Hi @virgil-king,

I am not able to reproduce the issue with the code snippet you shared. Could you please clarify the reproduction steps? Also, from the output you shared, it appears that when a weight is shared by multiple layers, tf.dispose() may not dispose of it immediately. If my understanding is correct, you could try disposing of the remaining weights explicitly with weight.dispose(). After your last line of code, you can dispose of them like this:

const weights = model.weights;
weights.forEach((weight) => weight.dispose());

Let me know if this helps. Thank you!

@virgil-king
Author

Hi @shmishra99,

Thanks for looking at the bug.

To repro, you need Node.js installed. Create a directory with a file named package.json with contents like:

{
  "name": "bug",
  "version": "1.0.0",
  "description": "Weight leak repro",
  "main": "index.mjs",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@tensorflow/tfjs-node-gpu": "^4.22.0"
  }
}

Paste the sample code from my earlier comment into index.mjs in that directory.

Run npm install in that directory. Then run node index.mjs.

I changed my sample code to log tf.memory() at the end and that prints:

{
  unreliable: true,
  numTensors: 36,
  numDataBuffers: 36,
  numBytes: 503296
}

Your suggestion to dispose all of the weights of the model instead of disposing the model itself does fix the leak:

{ unreliable: true, numTensors: 0, numDataBuffers: 0, numBytes: 0 }

But model.dispose() should have the same effect, right? Those layers aren't referenced from anywhere other than the model.

@shmishra99
Contributor

model.dispose() should release all weights except those shared between layers. Shared weights, being referenced by multiple layers, must be disposed of manually with weight.dispose() after the layers are deleted.

@virgil-king
Copy link
Author

If that's really the intent, it should be added to the documentation (unless I missed it there), or else no one will know to do that.

But speaking of the intent, IMO model.dispose() should dispose of these weights. I assume the current logic of model.dispose() is to only dispose of weights whose reference count is one. If so, it should do that in a loop, repeating until an iteration fails to dispose of any weights, since one iteration may cause additional weights to become disposable in the next. Otherwise, to be correct, every user of the API would have to implement exactly that loop themselves, which I don't believe is currently possible, since the reference count isn't exposed. Other approaches, like unconditionally disposing of every weight of the model, would be incorrect when weights are shared across models.
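The fixpoint loop described above could be sketched like this. This is a self-contained illustration, not tfjs internals: `refCount`, `disposed`, and the `makeWeight` mock are hypothetical stand-ins, since tfjs does not expose a public reference count on layer weights.

```javascript
// Hypothetical sketch of the proposed fixpoint disposal loop. Each pass
// disposes every weight whose reference count has dropped to one; a pass
// that releases nothing ends the loop, since no further weight can
// become disposable.
function disposeUntilFixpoint(weights) {
  let released;
  do {
    released = 0;
    for (const w of weights) {
      if (!w.disposed && w.refCount === 1) {
        w.dispose();
        released += 1;
      }
    }
  } while (released > 0);
}

// Mock weight for illustration: disposing it decrements the reference
// count of any weights that shared its storage.
function makeWeight(name, refCount, sharedWith = []) {
  return {
    name,
    refCount,
    disposed: false,
    dispose() {
      this.disposed = true;
      for (const other of sharedWith) other.refCount -= 1;
    },
  };
}

// `shared` starts with two references, so a single pass cannot free it
// until `owner`, which holds the extra reference, has been released.
const shared = makeWeight("shared/kernel", 2);
const owner = makeWeight("owner/kernel", 1, [shared]);
disposeUntilFixpoint([shared, owner]);
```

Whether tfjs tracks an internal reference count exactly this way is an assumption; the point is only that a single disposal pass is insufficient when disposing one weight is what makes another weight disposable.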
