ENH: Performance optimization for pd.merge introducing fillna= argument #42683
Labels
Enhancement
Missing-data
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Performance
Memory or execution speed performance
Reshaping
Concat, Merge/Join, Stack/Unstack, Explode
Is your feature request related to a problem?
Very often we merge very large data sets (100 GB) with 'left'/'right'/'outer', which introduces NAs.
Basically, if we have an integer field, we have two options:
Describe the solution you'd like
The obvious option is to use a fill value, such as -1 for NaN. Currently this can be done after the 1st or 2nd step, with fillna and a conversion back to an integer type. It would be quite simple to introduce a new parameter in merge, fillna={"column1": -1, "column2": -1}, that does the same task without additional computations.
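For reference, a minimal sketch of the workaround as it can be written today. The frames and the column names (key, a, column1) are made up for illustration; only merge, fillna and astype are existing pandas calls.

```python
import pandas as pd

# Toy frames; "column1" is an int64 column on the right side.
left = pd.DataFrame({"key": [1, 2, 3], "a": [10, 20, 30]})
right = pd.DataFrame({"key": [1, 3], "column1": [100, 300]})

# key=2 has no match, so the left merge inserts NaN and
# "column1" is upcast from int64 to float64.
merged = left.merge(right, on="key", how="left")

# Workaround: fill the gaps, then cast back to an integer dtype.
# These are two extra passes over the merged data.
fill_values = {"column1": -1}
result = merged.fillna(fill_values).astype({col: "int64" for col in fill_values})
```

The extra fillna and astype passes over the merged frame are the overhead that the proposed parameter is meant to avoid.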
API breaking implications
By default fillna should be None, which keeps the current behaviour.
P.S. The same thing would be reasonable for pd.concat.
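To make the intended semantics concrete, here is a rough sketch of a wrapper. merge_with_fill is a hypothetical helper, not part of pandas, and it only emulates the proposed argument on top of the existing API, so it does not deliver the performance gain itself. It assumes the filled columns are not renamed by merge suffixes.

```python
import pandas as pd


def merge_with_fill(left, right, fillna=None, **merge_kwargs):
    """Hypothetical helper emulating the proposed fillna= argument."""
    merged = pd.merge(left, right, **merge_kwargs)
    if fillna is None:
        # Default: identical to a plain pd.merge call.
        return merged

    # Remember the pre-merge dtype of every column that gets a fill value.
    dtypes = {}
    for col in fillna:
        if col not in merged.columns:
            continue
        for frame in (left, right):
            if col in frame.columns:
                dtypes[col] = frame[col].dtype
                break

    # Fill the NAs introduced by the merge and restore the original dtypes.
    return merged.fillna(fillna).astype(dtypes)


left = pd.DataFrame({"key": [1, 2, 3], "a": [10, 20, 30]})
right = pd.DataFrame({"key": [1, 3], "column1": [100, 300]})

filled = merge_with_fill(left, right, fillna={"column1": -1}, on="key", how="left")
unchanged = merge_with_fill(left, right, on="key", how="left")
```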