Train/Test leak in RandomNodeSplit? #9331

Answered by rusty1s
AdarshMJ asked this question in Q&A

Don't you need to check each of the splits in isolation?

import numpy as np

from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import RandomNodeSplit

dataset = Planetoid('/tmp/Cora', name='Cora')
data = dataset[0]
transform2 = RandomNodeSplit(split="test_rest", num_splits=10)
data = transform2(data)

for i in range(10):
    train_nodes = data.train_mask[:, i].nonzero(as_tuple=True)[0].cpu().numpy()
    test_nodes = data.test_mask[:, i].nonzero(as_tuple=True)[0].cpu().numpy()

    leakage_nodes = np.intersect1d(train_nodes, test_nodes)
    if len(leakage_nodes) > 0:
        print(
            f"Warning: Found {len(leakage_nodes)} nodes in both the training and t…
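
To see why checking each split in isolation matters, here is a minimal sketch on toy NumPy masks rather than on Cora. It assumes the masks have shape `[num_nodes, num_splits]`, as the `[:, i]` indexing above implies. The helper name `find_split_leakage` and the toy data are made up for illustration and are not part of PyG. The pooled check at the end is only a guess at how an apparent overlap could show up.

```python
import numpy as np


def find_split_leakage(train_mask, test_mask):
    """Hypothetical helper: map split index -> node ids that are in both masks.

    Both masks are expected to be boolean with shape [num_nodes, num_splits].
    """
    train_mask = np.asarray(train_mask, dtype=bool)
    test_mask = np.asarray(test_mask, dtype=bool)
    if train_mask.ndim != 2 or train_mask.shape != test_mask.shape:
        raise ValueError("expected two masks of shape [num_nodes, num_splits]")

    leaks = {}
    for i in range(train_mask.shape[1]):
        # Compare column i of the train mask only with column i of the test mask.
        overlap = np.flatnonzero(train_mask[:, i] & test_mask[:, i])
        if overlap.size > 0:
            leaks[i] = overlap
    return leaks


# Toy data: 6 nodes, 2 splits. Each column is one independent split.
train = np.array(
    [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]], dtype=bool
)
test = ~train  # mimics "test_rest": every non-train node is a test node

# Per-split check: no node is in train and test of the same split.
per_split = find_split_leakage(train, test)

# Pooled check: a node that is train in one split and test in another
# looks like an overlap, although no single split is affected.
pooled = np.intersect1d(
    np.flatnonzero(train.any(axis=1)), np.flatnonzero(test.any(axis=1))
)

print("per-split leaks:", per_split)
print("pooled overlap:", pooled)
```

On the toy masks the per-split result is empty, while the pooled comparison reports nodes 0 to 3, because each of them is a training node in one split and a test node in the other.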

Replies: 1 comment 1 reply

1 reply
@AdarshMJ

Answer selected by AdarshMJ