De-duplicate between image data collections #39292

It is difficult to define a single threshold. The score/distance values do not follow a linear scale, and if you use a different embedding model, the right threshold is different too. You can run some tests, observe the results, and settle on a "good threshold", but that threshold may not work well in other cases.
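As a rough illustration of "do some tests and observe the result", here is a minimal sketch that sweeps a few candidate thresholds over a small hand-labeled set of image pairs and reports precision and recall for each. It assumes cosine similarity as the score and plain Python lists as embeddings; the function names (`cosine_similarity`, `sweep_thresholds`) and the toy vectors are made up for this example and are not part of any library. Whatever value looks good here would need to be re-checked for a different embedding model or metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sweep_thresholds(labeled_pairs, thresholds):
    """Return (threshold, precision, recall) for each candidate threshold.

    labeled_pairs is a list of (vec_a, vec_b, is_duplicate) tuples that you
    labeled by hand (hypothetical data for this sketch). A pair is predicted
    to be a duplicate when its similarity is >= the threshold.
    """
    scored = [(cosine_similarity(a, b), dup) for a, b, dup in labeled_pairs]
    results = []
    for t in thresholds:
        tp = sum(1 for s, dup in scored if s >= t and dup)
        fp = sum(1 for s, dup in scored if s >= t and not dup)
        fn = sum(1 for s, dup in scored if s < t and dup)
        # With no predictions (or no true duplicates), report 1.0 by convention.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 1.0
        results.append((t, precision, recall))
    return results


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones would come from your embedding model.
    pairs = [
        ([1.0, 0.0], [1.0, 0.0], True),
        ([1.0, 0.0], [0.9, 0.1], True),
        ([1.0, 0.0], [0.6, 0.8], False),
        ([1.0, 0.0], [0.0, 1.0], False),
    ]
    for t, p, r in sweep_thresholds(pairs, [0.5, 0.8, 0.999]):
        print(f"threshold={t:.3f} precision={p:.2f} recall={r:.2f}")
```

A lower threshold catches more duplicates but also flags more distinct images, while a higher one does the opposite, so the "good" value depends on which mistake is cheaper for your collections.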

Answer selected by Go-MinSeong