-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FastRGF] FastRGF doesn't work for small sample and need to fix integration test for FastRGF #92
Comments
I've asked Tong Zhang to look into this issue (and also another one about small values of weights) and he answered the following:
Also he asked you to create a PR if you fixed this
|
Unfortunately, fixing this issue is difficult for me... |
But maybe you can share your findings though isssue in their repo then? |
Of course I can! |
@fukatani Any updates? |
Some details from debuggingSteps to reproduce:
Operating System: Windows 10 64-bit Output:
UPD: Operating System: Ubuntu 18.04 64-bit Output: UPD2: Operating System: Ubuntu 16.04 32-bit Output: UPD3: Operating System: macOS 10.12 64-bit Output: UPD4: Operating System: Windows 10 64-bit Output:
UPD5: Operating System: Windows Server 2016 64-bit Output:
|
@TongZhang-ML Please take a look at logs. |
I've decided to reproduce the bug at continuous integration service, but still with no luck. Tests (refer to #159) have been run on Ubuntu 14.04 and macOS 10.12 with g++ versions from 5 to 8. No errors occurred. UPD: It seems to me, that according to the logs the bug is hidden around the multithreading here (Race Condition?), and that why it's so hard to reproduce. |
Tried without OpenMP: Set this
|
Any variable is not nullptr at here? For example, could you add code for dumping as follows?
With my expectation, it will crash with printf function. |
Or
If j>= num_yw, the code referencing |
@fukatani Seems you're right!
results in
|
Does assert passes in other OS? |
Yes, on macOS all asserts passed. Caution! Huge log. |
Thanks for tracking the bug. I believe the problem happened earlier. But if the problem can be duplicated on amazon ec2 windows instance, then I can help with debugging.
…----
Tong Zhang
On Jun 21, 2018, at 3:44 PM, Nikita Titov ***@***.***> wrote:
Yes, on macOS all asserts passed.
Caution! Huge log.
macOS log.txt <https://github.com/RGF-team/rgf_python/files/2125446/macOS.log.txt>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#92 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AUJ-B2gNe8ArztC3o0iIRq803fpBwDMTks5t_BOlgaJpZM4Qzr5w>.
|
Today I've spend in many attempts to reproduce the error at Azure. I've tried all types of instances, which are available with free trial subscription (B, E, D, DS, F and so on), unfortunately, without any success. The only important thing that could be learned from this is that free trial limits the number of virtual cores to 4, however, failures occurred on machines with 8 vcores (Windows 10 64-bit from this my comment #92 (comment)) and 12 vcores (Windows Server 2016 64-bit from the same comment). @TongZhang-ML So, it seems, that if you want to reproduce the error, you should try to do it with more than 4 vcores Amazon instance. |
It can be an index error rather than a race condition. @TongZhang-ML
|
If it is an index error, I think we have to limits threads depending on the size of data. |
#Now, sklearn integration tests for
FastRGFClassifier
andFastRGFClassifier
.FastRGF doesn't work well for small samples, that is reason for test failed.
I doubt inside Fast RGF executable inside.
I inspect Fast RGF by debugger, discretization boundaries are invalid.
At least we should raise understandable error from RGF python if discretization failed.
The text was updated successfully, but these errors were encountered: