Description
I was trying to create a custom algorithm, and before moving on to something more complicated I wanted to make sure I understand how the current code works.
I can't understand why the cosine similarity between two particular users is 1.0 when they gave different ratings to their common item.
Steps/Code to Reproduce
```python
from surprise import AlgoBase, BaselineOnly, NormalPredictor
from surprise import PredictionImpossible


class MyAlgorithm(AlgoBase):
    def __init__(self, sim_options={}, bsl_options={}):
        AlgoBase.__init__(self, sim_options=sim_options, bsl_options=bsl_options)

    def fit(self, trainset):
        AlgoBase.fit(self, trainset)

        # Compute baselines and similarities
        self.bu, self.bi = self.compute_baselines()
        self.sim = self.compute_similarities()

        return self

    def estimate(self, u, i):
        if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)):
            raise PredictionImpossible("User and/or item is unknown.")

        # Compute similarities between u and v, where v describes all other
        # users that have also rated item i.
        neighbors = [(v, self.sim[u, v]) for (v, r_ignored) in self.trainset.ir[i]]
        # Sort these neighbors by similarity
        neighbors = sorted(neighbors, key=lambda x: x[1], reverse=True)

        print(f"estimate({u}, {i}):")
        print("The 3 nearest neighbors of user", str(u), "are:")
        for v, sim_uv in neighbors[:3]:
            print(f"user {v} with sim {sim_uv:1.2f}")

        # ... Aaaaand return the baseline estimate anyway ;)
        bsl = self.trainset.global_mean + self.bu[u] + self.bi[i]
        return bsl
```
Preparing the data (taking a 20K subsample to make exploration faster):
Then I ran my new custom algorithm (the `MyAlgorithm` class above):
Expected Results
In the printout (the one provided in the docs) I was looking for neighbors with non-zero similarity and, as an example, took this one:
Note that I also added logging of the current `estimate` parameters to know which item we are predicting for (428 in this example).
So I see that the algorithm considered users 171 and 1123 to be similar. I decided to check this manually.
Since `"user_based": True`, the similarity is calculated between user 171 and the other users who have rated item 428. So I checked `trainset_full.ir[428]`
Output:
Then I decided to check the ratings of these two users, 171 and 1123, to see whether they have similar ratings for their common items.
I found only one common item, and the ratings are different: (891, 2.0) vs (891, 3.0).
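The manual check described above can be sketched in plain Python. This is only an illustration: the `ur` dictionary is a hand-written stand-in for the shape of `trainset.ur` (inner user id mapped to a list of `(inner item id, rating)` pairs), the helper `common_ratings` is hypothetical, and every rating is invented except the two on item 891, which mirror the 2.0 vs 3.0 reported here.

```python
# Stand-in for trainset.ur: inner user id -> list of (inner item id, rating).
# All entries are made up for illustration, apart from the two ratings on
# item 891, which mirror the values reported in this issue.
ur = {
    171: [(891, 2.0), (15, 5.0)],
    1123: [(428, 4.0), (891, 3.0)],
}


def common_ratings(ur, u, v):
    """Return {item: (rating_by_u, rating_by_v)} for items both users rated."""
    ratings_u = dict(ur[u])
    ratings_v = dict(ur[v])
    shared_items = ratings_u.keys() & ratings_v.keys()
    return {item: (ratings_u[item], ratings_v[item]) for item in shared_items}


common = common_ratings(ur, 171, 1123)
print(common)
```

With a real trainset, the same helper could be called as `common_ratings(trainset_full.ur, 171, 1123)`, assuming both ids are inner ids.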
I didn't specify `min_support`, but it shouldn't matter: when the number of common items is below it, the similarity is 0. A non-zero similarity therefore means we are at or above the `min_support` value (the default is probably 1).
I would expect the similarity not to be equal to 1, as the ratings are not the same (2.0 vs 3.0).
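As a sanity check on that expectation, here is a minimal pure-Python sketch of cosine similarity restricted to co-rated items, with a `min_support` cutoff. It assumes the similarity is computed over common items only, which is how the Surprise documentation describes its cosine measure. The function `cosine_on_common` is hypothetical and is not the library's implementation, and the second pair of ratings in `sim_two` is invented.

```python
from math import sqrt


def cosine_on_common(pairs, min_support=1):
    """Cosine similarity over co-rated items only.

    pairs: list of (rating_by_u, rating_by_v), one entry per common item.
    Returns 0.0 when fewer than min_support common items exist.
    """
    if len(pairs) < min_support:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_u = sqrt(sum(a * a for a, _ in pairs))
    norm_v = sqrt(sum(b * b for _, b in pairs))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# One common item rated 2.0 and 3.0: (2 * 3) / (sqrt(4) * sqrt(9)) = 6 / 6
sim_one = cosine_on_common([(2.0, 3.0)])
# A second, invented common item makes the two vectors non-parallel
sim_two = cosine_on_common([(2.0, 3.0), (5.0, 1.0)])
# Requiring at least two common items zeroes the single-item case
sim_blocked = cosine_on_common([(2.0, 3.0)], min_support=2)

print(sim_one, sim_two, sim_blocked)
```

Under this assumption, two users who share a single item are compared as one-dimensional vectors, and any two positive ratings point in the same direction, so the cosine comes out as 1 whatever the rating values are. Raising `min_support` is one way to exclude such pairs.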
Actual Results
The 3 nearest neighbours of user 171 are:
user 1123 with sim 1.00
Versions
macOS-10.16-x86_64-i386-64bit
Python 3.10.14 (main, May 6 2024, 14:47:20) [Clang 14.0.6 ]
surprise 1.1.4