RFC predictions are inconsistent when using max_depth
#52
Looking further into this issue, I believe it might have something to do with the final leaf probabilities. They are slightly different when the tree is not grown to its maximum depth, so the final probability can deviate if the samples are very close to each other.
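To make the point about leaf probabilities concrete, here is a minimal sketch in plain Python. It is not sklearn-porter code; the function name and the sample counts are made up for illustration. A fully grown tree usually ends in pure leaves, so normalizing the class counts gives a one-hot vector, while a depth-limited tree can end in mixed leaves whose normalized counts are fractional.

```python
def leaf_probabilities(class_counts):
    """Normalize the per-class training sample counts stored in a leaf."""
    total = sum(class_counts)
    return [count / total for count in class_counts]


# Hypothetical leaves; the counts are illustrative only.
pure_leaf = [0, 7]   # typical for a fully grown tree
mixed_leaf = [3, 2]  # possible when max_depth stops splitting early

print(leaf_probabilities(pure_leaf))   # [0.0, 1.0]
print(leaf_probabilities(mixed_leaf))  # [0.6, 0.4]
```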
Thanks for your work and the hints. I will check the outputs with more tests. Did you check any other languages, or is it only an issue in C?
I did not check other languages yet, but I assume that they have the same problem. I can check tomorrow.
Checked it in Java: same results. I assume it will be the same in other languages.
Okay, thank you for the double check. Then I will dig deeper into the original implementation. In particular in the difference between the different
I think a good way to approach this is to implement a
Some more details I found in this Stack Overflow comment thread:
Similarly here: https://scikit-learn.org/stable/modules/tree.html#tree
So I think if a
This seems to be the case, indeed:
So we'd need to change the internal structure so that each tree returns a probability vector instead of the class index.
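A minimal sketch of why this matters, in plain Python with made-up probability vectors (the function names `hard_vote` and `soft_vote` are illustrative and not part of sklearn or sklearn-porter). It assumes the ported code counts one class vote per tree, whereas scikit-learn's forest averages the trees' probability estimates before taking the arg max. With pure leaves both schemes agree; with mixed leaves, as produced by a limited `max_depth`, they can disagree.

```python
def hard_vote(tree_probas):
    """Each tree votes for its most likely class; the majority wins."""
    n_classes = len(tree_probas[0])
    votes = [0] * n_classes
    for proba in tree_probas:
        votes[proba.index(max(proba))] += 1
    return votes.index(max(votes))


def soft_vote(tree_probas):
    """Average the probability vectors, then pick the most likely class."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    mean = [sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)]
    return mean.index(max(mean))


# Hypothetical per-tree outputs for one sample (three trees, two classes).
mixed = [[0.6, 0.4], [0.6, 0.4], [0.1, 0.9]]  # depth-limited trees
pure = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # fully grown trees

print(hard_vote(mixed), soft_vote(mixed))  # 0 1 -> the two schemes disagree
print(hard_vote(pure), soft_vote(pure))    # 0 0 -> the two schemes agree
```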
Hello @skjerns, JFYI, I started to implement the
For that I began with the
Hi @nok and @skjerns, I have actually looked into this as I wanted to integrate in the
@nok, let me know how you wanna proceed and I can commit on a dev branch my changes to the C templates and the
I have created a `RandomForestClassifier` in Python using `sklearn`. Now I convert the code to C using `sklearn-porter`. In around 10-20% of the cases the prediction of the transpiled code is wrong. I figured out that the problem occurs when specifying `max_depth`. Here's some code to reproduce the issue:
I also saw that Python is performing calculations with double while the C code seems to use float. Might that be an issue? (Changing float -> double did not change anything, unfortunately.)
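Since switching to double made no difference, precision is probably not the cause here. For completeness, this is a minimal sketch of how a float/double mismatch could in principle flip a split comparison. The threshold and feature values are made up, and `to_float32` is an illustrative helper, not a library function.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


threshold = 0.1             # split threshold kept as a double
feature = to_float32(0.1)   # the same value after a round trip through float

# Mixed precision: the float-rounded feature is slightly above the double threshold.
print(feature <= threshold)              # False
# Same precision on both sides: the comparison goes the other way.
print(feature <= to_float32(threshold))  # True
```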