ENH: Add nsmallest/nlargest method support for extension array #42737

Closed
mocquin opened this issue Jul 26, 2021 · 2 comments
Labels
Enhancement ExtensionArray Extending pandas with custom dtypes or arrays. Needs Discussion Requires discussion from core team before further action

Comments

mocquin commented Jul 26, 2021

With a regular Series, one can do:

import numpy as np
import pandas as pd
s = pd.Series(np.arange(10))
s.nsmallest(1)  # returns a Series containing 0, as expected

When using an extension array (user-defined in my case), calling the .nsmallest method raises a TypeError (full message below):

import numpy as np
import physipandas
from physipy import m
import pandas as pd
sq = pd.Series(np.arange(10)*m, dtype='physipy[m]')
sq.nsmallest(1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/5k/bf4syt7x1zjbhc6b28srzzym0000gn/T/ipykernel_72417/2048250381.py in <module>
      4 import pandas as pd
      5 sq = pd.Series(np.arange(10)*m, dtype='physipy[m]')
----> 6 sq.nsmallest(1)

/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in nsmallest(self, n, keep)
   3861         dtype: int64
   3862         """
-> 3863         return algorithms.SelectNSeries(self, n=n, keep=keep).nsmallest()
   3864 
   3865     @doc(

/opt/anaconda3/lib/python3.8/site-packages/pandas/core/algorithms.py in nsmallest(self)
   1220 
   1221     def nsmallest(self):
-> 1222         return self.compute("nsmallest")
   1223 
   1224     @staticmethod

/opt/anaconda3/lib/python3.8/site-packages/pandas/core/algorithms.py in compute(self, method)
   1253         dtype = self.obj.dtype
   1254         if not self.is_valid_dtype_n_method(dtype):
-> 1255             raise TypeError(f"Cannot use method '{method}' with dtype {dtype}")
   1256 
   1257         if n <= 0:

TypeError: Cannot use method 'nsmallest' with dtype physipy[m]

This seems to happen because the extension dtype is not "registered" with is_valid_dtype_n_method, so the check rejects it.
Would it be feasible to support nsmallest/nlargest for extension arrays?

pandas version: 1.3.0.
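
For anyone without physipandas installed, the same error can be reproduced with a built-in dtype that fails the check. This is only a sketch using a plain Series of strings; the variable names are illustrative and not part of the original report.

```python
import pandas as pd

# A Series of strings has a non-numeric dtype, so nsmallest rejects it
# through the same dtype validation shown in the traceback above.
strings = pd.Series(["b", "a", "c"])

try:
    strings.nsmallest(1)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

The printed message names the method and the offending dtype, as in the traceback for physipy[m].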

@mocquin mocquin added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 26, 2021
@rhshadrach rhshadrach added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jul 31, 2021
@mroeschke mroeschke added Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 21, 2021
@jbrockmendel
Member

Does your dtype's _is_numeric return True? That is the relevant check in is_valid_dtype_n_method.

@mroeschke
Member

It appears this is supported by specifying the correct attribute on the dtype, so closing.
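
For reference, a minimal sketch of what setting that attribute could look like. MeterDtype is a hypothetical stand-in for a unit-aware dtype such as physipy[m], not the physipandas implementation, and a usable dtype would also need a matching ExtensionArray. Whether the flag alone is enough for nsmallest may depend on the pandas version, so the built-in Int64 dtype is used here to show the method working on an extension array.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


class MeterDtype(ExtensionDtype):
    """Hypothetical dtype, for illustration only."""

    name = "meter"  # hypothetical dtype name
    type = float    # scalar type of the elements

    @property
    def _is_numeric(self) -> bool:
        # The ExtensionDtype base class returns False here;
        # numeric dtypes opt in by returning True.
        return True


dtype_flag = MeterDtype()._is_numeric

# The built-in nullable integer dtype is an extension dtype that
# passes the check, so nsmallest works on it.
s = pd.Series([3, 1, 2], dtype="Int64")
smallest = s.nsmallest(1)
print(dtype_flag, smallest.iloc[0])
```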
