Is predict(..., pred_contrib=True) thread safe? #5482

zyxue · 2022-09-13T18:38:48Z

Description

I'm using the model behind a gRPC prediction service. The service predicts one example at a time.

I can reproduce the bug locally like this:

data = [df.loc[:1][model.feature_name()]] * 1_000

def _predict(df_one_row):
    return model.predict(df_one_row, pred_contrib=True)

with ThreadPoolExecutor(max_workers=32) as exc:
    exc.map(_predict, data)

The error looks like one of the following:

free(): invalid next size (normal)

or

double free or corruption (!prev)

or

malloc(): corrupted top size

depending on the run.

If I set max_workers=1, the error seems to go away.
If I remove pred_contrib=True, the error also seems to go away, so it looks like predict alone is thread-safe, which is consistent with "Is the function "LGBM_BoosterPredictForMat" thread safe?" #666.
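
Since max_workers=1 makes the error go away, one possible stopgap is to serialize only the predict calls behind a lock while the rest of the service stays multi-threaded. The following is a minimal sketch of that idea, not a confirmed fix: SerializedPredictor and _StubModel are hypothetical names introduced for illustration, and the stub stands in for the real Booster so the snippet runs without LightGBM installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerializedPredictor:
    """Hypothetical wrapper: only one thread at a time may call model.predict."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def predict(self, data, **kwargs):
        # Hold the lock for the whole call so predictions never overlap.
        with self._lock:
            return self._model.predict(data, **kwargs)


class _StubModel:
    """Stand-in for a lightgbm.Booster (assumption: same predict signature)."""

    def __init__(self):
        self.active = 0      # calls currently inside predict
        self.max_active = 0  # highest concurrency observed

    def predict(self, data, pred_contrib=False):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        result = [sum(data)] if pred_contrib else [0.0]
        self.active -= 1
        return result


stub = _StubModel()
safe_model = SerializedPredictor(stub)
rows = [[1.0, 2.0, 3.0]] * 1_000

with ThreadPoolExecutor(max_workers=32) as exc:
    # list(...) forces every result to be retrieved, so any Python exception
    # raised inside a worker would surface here.
    results = list(
        exc.map(lambda row: safe_model.predict(row, pred_contrib=True), rows)
    )
```

With the real model, the wrapper would be built once as SerializedPredictor(model) and shared by all request handlers. The trade-off is that predictions no longer run in parallel, so throughput is roughly that of max_workers=1 for the predict step.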

Reproducible example

I can produce a different error, also related to threading, using the following code:

from concurrent.futures import ThreadPoolExecutor

import sklearn.datasets
import lightgbm

df = (
    sklearn.datasets.load_iris(as_frame=True)["frame"]
    .sample(99, random_state=123)
    .rename(
        columns={
            "sepal length (cm)": "sepal_length",
            "sepal width (cm)": "sepal_width",
            "petal length (cm)": "petal_length",
            "petal width (cm)": "petal_width",
        }
    )
    .assign(sepal_length_cat=lambda df: (df.sepal_length > 1).astype(str).astype('category'))
    .reset_index(drop=True)
)

X, y = df.drop(columns="target"), df["target"]

regressor = lightgbm.LGBMRegressor(n_estimators=100, max_depth=7, objective="mse").fit(
    X, y
)

model = regressor.booster_

print(f'{df.dtypes=:}')

data = [df.loc[:1][model.feature_name()]] * 10_000


def _predict(df_one_row):
    return model.predict(df_one_row, pred_contrib=True)


with ThreadPoolExecutor(max_workers=32) as exc:
    exc.map(_predict, data)

The error is like

corrupted size vs. prev_size
corrupted size vs. prev_size

or

python3: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

Environment info

LightGBM version or commit hash:

lightgbm==3.3.2

Command(s) you used to install LightGBM

I'm using lightgbm inside a monorepo with Bazel, but I think under the hood it's equivalent to python -m pip install lightgbm.

Additional Comments


shuttie · 2024-06-28T13:20:02Z

Seems to be an issue related to the pred_contrib=True handling in the C library itself: in the lightgbm4j library we have an issue with concurrent prediction crashing the whole JVM process due to the call not being thread-safe. See issue metarank/lightgbm4j#88 for details and a reproducer.

jameslamb added the question label Feb 1, 2023

shuttie mentioned this issue Jun 28, 2024

face the memory problem when use PredictFormat metarank/lightgbm4j#88

Open
