Is predict(..., pred_contrib=True) thread safe? #5482

Open
zyxue opened this issue Sep 13, 2022 · 1 comment

zyxue commented Sep 13, 2022

Description

I'm using the model behind a gRPC prediction service. The service predicts one example at a time.

I can reproduce the bug locally like this:

data = [df.loc[:1][model.feature_name()]] * 1_000

def _predict(df_one_row):
    return model.predict(df_one_row, pred_contrib=True)

with ThreadPoolExecutor(max_workers=32) as exc:
    exc.map(_predict, data)

The error is one of the following:

free(): invalid next size (normal)

or

double free or corruption (!prev)

or

malloc(): corrupted top size

depending on the run.

Reproducible example

I can produce a different error, also related to threading, using the following code:

from concurrent.futures import ThreadPoolExecutor

import sklearn.datasets
import lightgbm

df = (
    sklearn.datasets.load_iris(as_frame=True)["frame"]
    .sample(99, random_state=123)
    .rename(
        columns={
            "sepal length (cm)": "sepal_length",
            "sepal width (cm)": "sepal_width",
            "petal length (cm)": "petal_length",
            "petal width (cm)": "petal_width",
        }
    )
    .assign(sepal_length_cat=lambda df: (df.sepal_length > 1).astype(str).astype('category'))
    .reset_index(drop=True)
)

X, y = df.drop(columns="target"), df["target"]

regressor = lightgbm.LGBMRegressor(n_estimators=100, max_depth=7, objective="mse").fit(
    X, y
)

regressor.fit(X, y)

model = regressor.booster_

print(f'{df.dtypes=:}')

data = [df.loc[:1][model.feature_name()]] * 10_000


def _predict(df_one_row):
    return model.predict(df_one_row, pred_contrib=True)


with ThreadPoolExecutor(max_workers=32) as exc:
    exc.map(_predict, data)

The error is one of the following:

corrupted size vs. prev_size
corrupted size vs. prev_size

or

python3: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
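
To check whether concurrency is the trigger, the same calls can be run sequentially first and then through the thread pool. Below is a minimal sketch of that comparison. The helper names (run_sequential, run_threaded, demo_predict) are hypothetical, and demo_predict is a pure-Python stand-in so the snippet runs without LightGBM; in the setup above it would be replaced by the _predict function.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(predict_fn, inputs):
    """Call predict_fn on each input from a single thread."""
    return [predict_fn(x) for x in inputs]


def run_threaded(predict_fn, inputs, max_workers=32):
    """Call predict_fn on each input from a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes the lazy map iterator, so any Python-level
        # exception raised in a worker is re-raised here.
        return list(pool.map(predict_fn, inputs))


def demo_predict(row):
    # Hypothetical stand-in for model.predict(row, pred_contrib=True).
    return sum(row)


inputs = [[1.0, 2.0, 3.0]] * 1000
sequential = run_sequential(demo_predict, inputs)
threaded = run_threaded(demo_predict, inputs)
print(sequential == threaded)
```

If the sequential pass completes and only the threaded pass aborts, that points at concurrent access rather than at the input data. Note that a native heap corruption kills the whole process, so no Python exception is expected in that case.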

Environment info

LightGBM version or commit hash:

lightgbm==3.3.2

Command(s) you used to install LightGBM

I'm using lightgbm inside a monorepo with Bazel, but I think under the hood it's equivalent to python -m pip install lightgbm.

Additional Comments

shuttie commented Jun 28, 2024

Seems to be an issue related to the pred_contrib=True handling in the C library itself: in the lightgbm4j library we have an issue with concurrent prediction crashing the whole JVM process due to the call not being thread-safe. See issue metarank/lightgbm4j#88 for details and a reproducer.
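
If the call is indeed not thread-safe, one possible workaround on the Python side is to serialize access to the model with a lock. The following is only a sketch under that assumption: LockedPredictor and _FakeModel are hypothetical names, and _FakeModel is a stand-in that records how many threads are inside predict() at once, so the snippet runs without LightGBM. In a real service the wrapped object would be the Booster.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LockedPredictor:
    """Wrap any object with a predict() method so calls run one at a time."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def predict(self, *args, **kwargs):
        # Only one thread at a time may enter the wrapped predict().
        with self._lock:
            return self._model.predict(*args, **kwargs)


class _FakeModel:
    """Hypothetical stand-in for a Booster; tracks peak concurrency."""

    def __init__(self):
        self.active = 0
        self.max_active = 0
        self._guard = threading.Lock()

    def predict(self, row, pred_contrib=False):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # simulate work so calls would overlap without the lock
        with self._guard:
            self.active -= 1
        return [x * 2 for x in row]


fake = _FakeModel()
safe = LockedPredictor(fake)

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(
        pool.map(lambda r: safe.predict(r, pred_contrib=True), [[1, 2]] * 50)
    )

print(fake.max_active)
```

The lock removes parallelism for these calls, so throughput drops to that of a single thread. Whether that is acceptable depends on the service; running one model per worker process is another option that avoids sharing state between callers.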
