ENH: add missing/infinite values counts in .describe #54076

Closed
1 of 3 tasks
lcrmorin opened this issue Jul 11, 2023 · 4 comments
Labels: Enhancement, Needs Info

Comments

@lcrmorin

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Because scikit-learn often fails on missing or infinite values, I'd like a default way to count them. Currently we need to call describe and something like .isna().sum() separately. It would be nice if the describe method could report missing and infinite value counts. This could even be extended to count user-defined 'sentinel values'.
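For reference, the separate steps needed today look roughly like the sketch below. The example frame and the variable names are made up for illustration; `describe`, `isna`, and `np.isinf` are existing pandas / NumPy calls.

```python
import numpy as np
import pandas as pd

# Illustrative data only: one NaN and one infinite value per column.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.inf, 4.0],
        "b": [0.5, 2.0, -np.inf, np.nan],
    }
)

summary = df.describe()          # count, mean, std, min, quartiles, max
n_missing = df.isna().sum()      # NaN count per column
n_infinite = np.isinf(df).sum()  # +inf / -inf count per column (float columns)
```

Note that the `count` row of `describe` only excludes missing values, so infinite values are still included in it and have to be counted separately.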

Feature Description

Add new rows to the output of the .describe method with the count of missing values and the count of infinite values.

Some parameters can be added to the describe function:

  • a list of 'sentinel' values so that the describe method also provides counts for those.
  • an option to provide frequency (proportion of total) instead of counts
  • an option to enable / disable those counts entirely
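A rough sketch of how these options could behave, written as a standalone helper rather than a change to pandas itself. The function name `describe_with_counts` and the parameters `sentinels`, `normalize`, and `extra_counts` are hypothetical and do not exist in pandas; the sketch also assumes the frame has at least one numeric column with a plain NumPy dtype.

```python
import numpy as np
import pandas as pd


def describe_with_counts(df, sentinels=None, normalize=False, extra_counts=True):
    """Hypothetical helper (not a pandas API): describe() plus extra count rows."""
    numeric = df.select_dtypes(include="number")
    summary = numeric.describe()
    if not extra_counts:
        # Toggle off: return the usual describe() output unchanged.
        return summary

    rows = {
        "missing": numeric.isna().sum(),
        # Cast to float so np.isinf also accepts integer columns.
        "infinite": np.isinf(numeric.astype("float64")).sum(),
    }
    for value in sentinels or []:
        # One extra row per user-defined sentinel value.
        rows[f"sentinel={value}"] = (numeric == value).sum()

    extra = pd.DataFrame(rows).T  # one row per statistic, one column per input column
    if normalize:
        # Report proportions of the total row count instead of raw counts.
        extra = extra / len(numeric)
    return pd.concat([summary, extra])


# Illustrative usage with made-up data, using -999 as a sentinel.
df = pd.DataFrame({"a": [1.0, np.nan, np.inf, -999.0], "b": [1, 2, -999, 4]})
out = describe_with_counts(df, sentinels=[-999])
```

Keeping the extra rows behind a flag means the default output of describe would not need to change.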

Alternative Solutions

The current solution is to do the counts outside of the describe function.

Additional Context

No response

@lcrmorin added the Enhancement and Needs Triage labels on Jul 11, 2023
@adirajput2000

Hey @lcrmorin
I am excited to work with you on this.
Can you please help me out with a discussion?

@adirajput2000

take

@jesse-sealand

This definitely serves a need. Consider, if you haven't already, the ability to toggle it on and off. There are cases where the amount of data coming from describe can be overwhelming. Also, sometimes it's needed and sometimes it's not.

@mroeschke added the Needs Info label and removed the Needs Triage label on Jul 17, 2024
@mroeschke
Member

Thanks for the suggestion, but it doesn't seem like this feature request has gotten much traction from the core team, so closing as inactive.
