Replies: 2 comments 2 replies
-
I think we wrote this section for over-/undersampling because of the constant requests for SMOTE. The problem with this approach is that you are showing the model the exact same instances it already saw (sampling with replacement), so there is no new information there. I think the best person to chime in on this is @janezd.
-
We discussed this in #3269. I stand by my opinion. :)
-
The help section of the Data Sampler widget points out that it can be used for under- or oversampling. I used the Attrition dataset, where the class imbalance is 1233/237. I separated the minority class with the Select Rows widget and connected it to the Data Sampler widget, where I chose Fixed sample size = 1000 and sample with replacement. I then connected the Concatenate widget to the initial dataset, the Select Rows widget, and the Data Sampler widget to create a new dataset in which the minority and majority classes were about the same size. Classification performance (AUC) with logistic regression, neural networks, and AdaBoost improved, and classification accuracy stayed about the same.
I fully understand the Python options for over- or undersampling, but the goal of my workshop for older clinicians is to do as much as possible in Orange. Do you believe this is a legitimate approach to oversampling the minority class?
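For readers comparing the widget workflow to the Python route: the Select Rows → Data Sampler → Concatenate pipeline described above amounts to random oversampling with replacement. A minimal pandas sketch (using toy stand-in data with the same 1233/237 imbalance, not the actual Attrition dataset) might look like this:

```python
# Hypothetical pandas equivalent of the Orange workflow described above:
# oversample the minority class with replacement, then concatenate it
# back onto the original data. The column names are made up for the sketch.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for the Attrition data: 1233 majority vs. 237 minority rows.
df = pd.DataFrame({
    "x": rng.normal(size=1470),
    "attrition": ["No"] * 1233 + ["Yes"] * 237,
})

# Select Rows widget: isolate the minority class.
minority = df[df["attrition"] == "Yes"]

# Data Sampler widget: fixed sample size = 1000, sample with replacement.
resampled = minority.sample(n=1000, replace=True, random_state=0)

# Concatenate widget: original data plus the resampled minority rows.
balanced = pd.concat([df, resampled], ignore_index=True)

print(balanced["attrition"].value_counts())
# Classes are now roughly balanced: 1233 "No" vs. 237 + 1000 = 1237 "Yes".
```

Note that, as the reply above says, every resampled row is an exact duplicate of a row the model already sees, which is why this adds no new information.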