How do you use a column that was added in a select call, in a where/order by call? #408
-
Example: Identifying duplicate images by hash. import pixeltable as pxt
import imagehash
# Create user defined function
@pxt.udf
def calc_hash(image: pxt.Image) -> str:
return str(imagehash.average_hash(image))
# Create table for storing images
t = pxt.create_table('raw_images',{
'image': pxt.Image,
'filename': pxt.String
})
# Add Calculated Column to compute hash
t.add_computed_column(hash=calc_hash(table_raw_image.image))
# Insert several records ...
# Group and count duplicate images by hash
df = t.group_by(t.hash).select(t.hash, count=pxt.functions.count(1))
# How do I do this?
# df.where(count > 1)
#It would be cool if we could do df.where(df.count > 1) but it would also be nice to do the select and where all in a single method chain There seems to be a lack of documentation/examples on using group_by in general, but I was able to figure it out until I tried to do this. Thank you! |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 4 replies
-
You are encountering a gap in the query functionality. Essentially, what you want is the equivalent of the SQL Having clause, or more generally, being able to query a DataFrame like a table: At the moment, Pixeltable doesn't support that. However, we are aware that this is useful and are planning on adding that functionality in the not-too-distant future. Can you tell us a little more about your use case? |
Beta Was this translation helpful? Give feedback.
You are encountering a gap in the query functionality. Essentially, what you want is the equivalent of the SQL Having clause, or more generally, being able to query a DataFrame like a table:
df = t.group_by(t.hash).select(t.hash, count=pxt.functions.count(1))
result = df.where(df.count > 1)
At the moment, Pixeltable doesn't support that. However, we are aware that this is useful and are planning on adding that functionality in the not-too-distant future.
Can you tell us a little more about your use case?