fillna continuous data #14

Hahahah3 · 2022-03-18T02:11:36Z

Hello, I'm a beginner interested in tabular learning. Your superb paper, SAINT, impressed me a lot, but I have a couple of questions about the code.

For

saint/old_version/data.py

Line 233 in e288e84

train.fillna(train.loc[train_indices, col].mean(), inplace=True)

or

saint/data_openml.py

Line 89 in e288e84

X.fillna(X.loc[train_indices, col].mean(), inplace=True)

a) Why is it train.loc[train_indices, col] rather than train.loc[:, col]?
Validation and test data may also contain NaN values.
b) Why is it train.fillna rather than train[col].fillna?
As written, it may fill NaN values in other columns as well.

I think the correct expression should be train[col].fillna(train.loc[:, col].mean(), inplace=True).

I'm not sure whether I am correct. I would appreciate it if you could reply. Thank you very much!

Mountiko · 2022-09-29T09:18:06Z

Hi,
I noticed this as well.
The original code fills every NaN value in the whole DataFrame with the training mean of the first continuous column:
X.fillna(X.loc[train_indices, col].mean(), inplace=True)
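To make the effect concrete, here is a minimal sketch on a toy frame. The column names, values, `train_indices`, and `cont_columns` below are made up for illustration and are not taken from the repository; only the `fillna` line follows the pattern quoted above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not from the SAINT repository.
X = pd.DataFrame({
    "a": [1.0, 3.0, np.nan, 100.0],
    "b": [10.0, np.nan, 30.0, np.nan],
})
train_indices = [0, 1]      # assumed: rows 0 and 1 form the training split
cont_columns = ["a", "b"]   # assumed list of continuous columns

for col in cont_columns:
    # Same pattern as the quoted line: fillna is called on the whole frame,
    # so the scalar replaces NaN in every column, not only in `col`.
    X.fillna(X.loc[train_indices, col].mean(), inplace=True)

# The first pass (col == "a") uses the training mean of "a", which is 2.0,
# and it already fills the NaNs in "b" with 2.0 as well.
print(X)
```

In this toy case the NaNs in "b" end up as 2.0 (the training mean of "a") because nothing is left to fill by the time the loop reaches "b".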

I would recommend using this code instead, which fills the NaNs in each column with the training mean of that same column:
X[col].fillna(X.loc[train_indices, col].mean(), inplace=True)
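A sketch of the per-column version on the same kind of toy data (again, the frame, `train_indices`, and `cont_columns` are hypothetical). One caveat I am fairly sure about: in recent pandas versions, calling `fillna(..., inplace=True)` on a column selection such as `X[col]` can trigger a chained-assignment warning and may not write back to `X`, so this sketch assigns the result back instead. Taking the mean over `train_indices` only presumably keeps validation and test statistics out of the imputation, while NaNs in those rows are still filled because `fillna` runs over the whole column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not from the SAINT repository.
X = pd.DataFrame({
    "a": [1.0, 3.0, np.nan, 100.0],
    "b": [10.0, np.nan, 30.0, np.nan],
})
train_indices = [0, 1]      # assumed: rows 0 and 1 form the training split
cont_columns = ["a", "b"]   # assumed list of continuous columns

for col in cont_columns:
    # Mean computed from training rows only (NaNs are skipped by default).
    train_mean = X.loc[train_indices, col].mean()
    # Fill only this column, in all rows, and assign the result back.
    X[col] = X[col].fillna(train_mean)

print(X)
```

Here "a" is filled with 2.0 and "b" with 10.0, each from its own training rows.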

Please feel free to correct me if I am wrong.
