[FEA] Obtaining individual tree results of a random forest #6115

Open
YuelingMa0 opened this issue Oct 18, 2024 · 5 comments
Labels
feature request (New feature or request)

Comments

@YuelingMa0

Is your feature request related to a problem? Please describe.
I would like to use cuML to obtain the individual tree results of a random forest, but the current cuML package does not support this. With its random forest regression function, I can only obtain the average of the tree results.

Describe the solution you'd like
An attribute in the existing random forest regression function to provide results from each tree.

Describe alternatives you've considered
I have been using the "estimators_" attribute of scikit-learn's RandomForestRegressor to obtain individual tree outputs, but that package only runs on CPUs.
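For reference, the scikit-learn approach described above can be sketched roughly as follows. The synthetic data from make_regression and the hyperparameters are illustrative assumptions, not the setup used in this issue.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Illustrative synthetic data (an assumption; not the data from this issue)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# estimators_ holds the fitted decision trees; stack their predictions
# into an array of shape (n_samples, n_trees)
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_], axis=1)

# The forest prediction for a single-output regressor
forest_pred = forest.predict(X)
```

For a single-output regressor, averaging per_tree over the tree axis should reproduce forest_pred.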

Additional context
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

YuelingMa0 added the "? - Needs Triage" (Need team to review and classify) and "feature request" (New feature or request) labels on Oct 18, 2024
hcho3 removed the "? - Needs Triage" (Need team to review and classify) label on Oct 22, 2024
@hcho3
Contributor

hcho3 commented Oct 22, 2024

You can use the predict_per_tree function from the Forest Inference Library (FIL). Note that this feature is currently available only in the experimental version of FIL.

from cuml.experimental import ForestInference

# ...

# skl_model is a fitted scikit-learn random forest; X is the input feature matrix
fm = ForestInference.load_from_sklearn(skl_model)
pred_per_tree = fm.predict_per_tree(X)  # Returns an array of shape (num_row, num_tree, leaf_size)
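As a rough sketch of how such an output could be post-processed, the snippet below uses a synthetic NumPy array as a stand-in for the result of predict_per_tree. It assumes a regression forest with leaf_size equal to 1 and takes the plain mean over trees as the ensemble prediction; neither assumption is confirmed for FIL in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
num_row, num_tree, leaf_size = 4, 3, 1

# Synthetic stand-in for fm.predict_per_tree(X); this is NOT real FIL output
pred_per_tree = rng.normal(size=(num_row, num_tree, leaf_size))

# Drop the trailing leaf axis (assumed to be 1 for scalar regression)
per_tree = pred_per_tree[:, :, 0]  # shape (num_row, num_tree)

# Mean over trees, assumed here to match the usual forest prediction
mean_pred = per_tree.mean(axis=1)

# Standard deviation across trees, a simple measure of disagreement
spread = per_tree.std(axis=1)
```

The per-row spread is the kind of quantity that needs individual tree outputs and cannot be recovered from the averaged prediction alone.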

@YuelingMa0
Author

Thank you!

@YuelingMa0
Author

I got the error "Negative size passed to PyBytes_FromStringAndSize" when I loaded the sklearn model. I am also curious whether "predict_per_tree" also works for a model trained with cuML.

YuelingMa0 reopened this on Oct 29, 2024
@hcho3
Contributor

hcho3 commented Oct 29, 2024

> "Negative size passed to PyBytes_FromStringAndSize" when I loaded sklearn model.

Can you share the model with us so that we can troubleshoot?

> I am also curious if "predict_per_tree" attribute also works for a model trained by cuml?

Yes, it should work with a cuML model.

@YuelingMa0
Author

Here are my random forest models, one trained with sklearn and the other with cuML. I converted the cuML-trained model to ForestInference and tried to call "predict_per_tree" on it, but got the error "AttributeError: predict_per_tree". I am using version 24.10.00.


2 participants