
a little mistake in the code #14

Open
gkyustc opened this issue Nov 26, 2019 · 4 comments

@gkyustc

gkyustc commented Nov 26, 2019

X_sub = X[y_transform == transformed_label, :]

I think this should be
X_sub = X_copy[y_transform == transformed_label, :]

@tgsmith61591
Owner

Thanks for taking the time to file an issue. I think you're right. Feel free to file a PR, if you like. We haven't been working on this project in a while, but I have another project that built on this balancing codebase, should you be interested in using it: https://github.com/tgsmith61591/skoot

@gkyustc
Author

gkyustc commented Nov 26, 2019

Thanks for your reply. I recently ran into a problem with an imbalanced dataset. It is so imbalanced that the variance of the Gaussian distribution is very low, and the distribution looks almost like a Dirac delta function. The plot below compares the number of images with the number of labels.
[image: number of images per ID vs. number of labels]
The x-axis shows the number of images per ID, and the y-axis shows the corresponding number of labels. I would like to ask a couple of questions:
Do you think SMOTE is a good way to balance this kind of dataset?
Can you recommend some efficient strategies for handling this problem?
Thanks very much!

@tgsmith61591
Owner

The original SMOTE paper got its biggest performance boost by combining down-sampling with their method. I think it would absolutely be worth trying such a method. Things like:

  • Downsample
  • Perform SMOTE
  • Consider stratified batching (if you're using a minibatch family of algorithms)
  • Explore class-weighted loss functions
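
As a rough illustration of the first two items, here is a minimal NumPy sketch that randomly downsamples the majority class and then creates SMOTE-style synthetic minority samples by interpolating between nearest neighbours. The function names `random_downsample` and `smote_like`, the toy data, and the choice of `k=3` are all assumptions made for this example; this is not the API of any particular package.

```python
import numpy as np

def random_downsample(X, y, majority_label, n_keep, rng):
    """Keep only n_keep randomly chosen rows of the majority class."""
    maj_idx = np.flatnonzero(y == majority_label)
    other_idx = np.flatnonzero(y != majority_label)
    kept = rng.choice(maj_idx, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([kept, other_idx]))
    return X[idx], y[idx]

def smote_like(X_min, n_new, k, rng):
    """Create n_new synthetic rows by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    # Pairwise squared distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Toy data (assumed): 200 majority rows, 10 minority rows, 2 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (10, 2))])
y = np.array([0] * 200 + [1] * 10)

# Step 1: downsample the majority class to 50 rows.
X_ds, y_ds = random_downsample(X, y, majority_label=0, n_keep=50, rng=rng)

# Step 2: synthesise 40 minority rows so both classes have 50.
X_min = X_ds[y_ds == 1]
X_syn = smote_like(X_min, n_new=40, k=3, rng=rng)

X_bal = np.vstack([X_ds, X_syn])
y_bal = np.concatenate([y_ds, np.ones(len(X_syn), dtype=y.dtype)])
```

The target counts (50 per class) are arbitrary here; in practice the downsampling ratio and the amount of oversampling would be tuned together.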

The skoot package I shared can help you downsample and perform SMOTE, but the other strategies will depend on your framework and family of algorithms.
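
For the framework-dependent strategies, the underlying ideas can still be sketched in plain NumPy. The example below computes inverse-frequency class weights (the same `n_samples / (n_classes * class_count)` heuristic that scikit-learn uses for `class_weight="balanced"`), applies them in a weighted cross-entropy, and draws minibatches with an equal number of samples per class. All function names are hypothetical and the data is a toy example; a real training loop would use the equivalent facilities of its own framework.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def weighted_cross_entropy(probs, y, class_weights):
    """Weighted mean negative log-likelihood of the true class."""
    w = np.array([class_weights[label] for label in y.tolist()])
    nll = -np.log(probs[np.arange(len(y)), y])
    return float((w * nll).sum() / w.sum())

def stratified_batches(y, per_class, n_batches, rng):
    """Yield index batches holding per_class samples of every class.
    Classes with fewer than per_class rows are sampled with replacement."""
    by_class = [np.flatnonzero(y == c) for c in np.unique(y)]
    for _ in range(n_batches):
        yield np.concatenate([
            rng.choice(idx, size=per_class, replace=len(idx) < per_class)
            for idx in by_class
        ])

# Toy labels (assumed): 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
weights = balanced_class_weights(y)

# A classifier that always predicts 50/50, used only to exercise the loss.
probs = np.full((len(y), 2), 0.5)
loss = weighted_cross_entropy(probs, y, weights)

rng = np.random.default_rng(0)
batches = list(stratified_batches(y, per_class=16, n_batches=3, rng=rng))
```

With these weights, each minority sample contributes nine times as much to the loss as a majority sample, and every batch contains 16 indices from each class regardless of how rare the class is.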

@gkyustc
Author

gkyustc commented Nov 26, 2019

Thanks for your help.
In fact, I have already tried the other three methods, everything except SMOTE, but performance still did not improve. My dataset consists of 2D images of human bodies in different poses, and SMOTE augments the dataset by linear interpolation, so I suspected it might generate misleading images and did not consider it at first. But now it may be the only option I have left.
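
To make the concern about linear interpolation concrete, here is a tiny sketch using two toy 5x5 "images" (a vertical bar and a horizontal bar, both invented for this example). Interpolating in pixel space, as SMOTE would on raw pixels, gives a blend of the two shapes rather than a bar at an intermediate angle.

```python
import numpy as np

# Two toy 5x5 "images": a vertical bar and a horizontal bar.
img_a = np.zeros((5, 5))
img_a[:, 2] = 1.0
img_b = np.zeros((5, 5))
img_b[2, :] = 1.0

# SMOTE-style interpolation between the two samples in pixel space.
gap = 0.5
synthetic = img_a + gap * (img_b - img_a)

# The result is a cross: both bars are present at half intensity,
# and only the shared centre pixel keeps full intensity.
print(synthetic)
```

For pose images, the analogous result would be a double exposure of two poses rather than a plausible in-between pose, which is why interpolating in a learned feature space is sometimes preferred over raw pixels.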
