First lookup taking longer #151

Closed Answered by maxbachmann
joaopauloucf asked this question in Q&A
You appear to use two different string metrics in the two cases:

Enter If: 0:00:00.005983
Logic: 0:00:00.137633

is using fuzz.token_set_ratio

Enter Else: 0:00:00.033875
Logic: 0:00:01.628670

is using fuzz.partial_ratio.

This is most likely the cause of the performance difference. You might also want to use process.cdist instead of process.extract, since process.extract has to create a Python list of tuples, which is relatively slow.

process.cdist([processed_query], list(data[y]), scorer=fuzz.partial_ratio)

This returns a NumPy matrix of all the similarity scores, which is faster to create. Alternatively, since you mention that this is called in a loop, you might be able to match multiple queries in parallel:

process.cdist(
