Fix incorrect test set values in leave_k_out splits with sparse user rows #640
Closes #639
This PR fixes a bug in the evaluation of `leave_k_out_split` in which the produced test matrix would contain values that were many multiples of their original values. It also adds tests on static (non-random) matrices that fail against the uncorrected implementation.

The bug resulted from a calculation that requires an input array with sequential values. Because non-sequential values were provided, the array was processed incorrectly.
Specifically, the `arr` argument of `_take_tails` was being provided as `candidate_users`, from which user indices falling below the threshold had been removed. The ordered set of unique integers in that array was therefore not consecutive, which made it an invalid input.
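
To illustrate the failure mode, here is a minimal sketch. It is not the library's `_take_tails`; `tail_positions` is a hypothetical helper, and the sketch assumes the tail selection derives group boundaries from `np.bincount`, which is one way a consecutive-integer requirement can arise. When an ID is missing, its zero count makes the previous group's tail get selected a second time.

```python
import numpy as np


def tail_positions(arr, n):
    """Hypothetical helper: positions of the last `n` occurrences of each ID.

    Assumes the unique values of `arr` are exactly 0..N with no gaps and
    that every ID occurs at least `n` times.
    """
    order = arr.argsort(kind="mergesort")  # stable sort keeps original order within an ID
    ends = np.bincount(arr[order]).cumsum()  # exclusive end of each ID's group
    offsets = np.arange(n, 0, -1)  # n, n-1, ..., 1
    positions = (ends[:, None] - offsets[None, :]).ravel()
    return order[positions]


# User 1 fell below the threshold and was filtered out, leaving a gap in the IDs.
candidate_users = np.array([0, 0, 2, 2, 2])

# The missing ID has a count of zero, so user 0's tail is picked twice.
broken = tail_positions(candidate_users, n=1)

# Relabel the IDs to consecutive integers before taking tails.
_, relabelled = np.unique(candidate_users, return_inverse=True)
fixed = tail_positions(relabelled, n=1)

print(broken.tolist())  # [1, 1, 4]
print(fixed.tolist())   # [1, 4]
```

If the selected positions are then used to assemble a sparse matrix whose constructor sums duplicate entries (as SciPy's COO-to-CSR conversion does), a repeated position would produce a value that is a multiple of the original, which is consistent with the symptom described above.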