
feat(datasets) Add DivideResplitter #2963

Merged: 17 commits merged May 21, 2024

Conversation

adam-narozniak (Member) commented Feb 16, 2024

Issue

Certain dataset splits need to be divided into smaller splits, but no existing functionality that does this can be used within the FederatedDataset abstraction.

Proposal

Create a DivideResplitter abstraction that solves this problem. See its docstrings for a more detailed explanation.
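To make the proposal concrete, here is a minimal, hypothetical sketch of dividing one split into smaller named splits. It is plain Python over a sequence rather than the FederatedDataset or Hugging Face `datasets` API, and the function name `divide_split` and its fraction-based config are assumptions for illustration, not the DivideResplitter interface. It also assumes contiguous slicing, which the `start_index`/`end_index` variables in the reviewed code hint at but do not confirm.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def divide_split(
    split_data: Sequence[T], new_splits: Dict[str, float]
) -> Dict[str, List[T]]:
    """Divide one split into consecutive, non-overlapping named sub-splits.

    Hypothetical helper (not the flwr_datasets API): `new_splits` maps each
    new split name to the fraction of `split_data` it should receive.
    """
    total = sum(new_splits.values())
    if total > 1.0 + 1e-9:
        raise ValueError(f"Fractions sum to {total}, which exceeds 1.0")

    num_samples = len(split_data)
    divided: Dict[str, List[T]] = {}
    start_index = 0
    for name, fraction in new_splits.items():
        # Each new split takes the next contiguous block of samples.
        # int() truncates, so a few trailing samples may stay unassigned.
        end_index = start_index + int(fraction * num_samples)
        divided[name] = list(split_data[start_index:end_index])
        start_index = end_index
    return divided
```

For example, `divide_split(list(range(10)), {"train": 0.8, "valid": 0.2})` gives `train` samples 0 to 7 and `valid` samples 8 and 9. A real implementation might also accept absolute counts or shuffle before slicing; this sketch does neither.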

adam-narozniak marked this pull request as ready for review February 16, 2024 09:22
adam-narozniak self-assigned this Feb 27, 2024
adam-narozniak changed the title "Fds add divide resplitter" to "Add DivideResplitter" Feb 28, 2024
datasets/flwr_datasets/resplitter/divide_resplitter.py: 5 outdated, resolved review threads
start_index = 0
end_index = 0
split_data = dataset[split_from]
assert isinstance(new_splits_dict, dict)
Contributor

This assert will never be triggered, right?

Member Author

Yes. At this point the type is already correct given the preceding Python logic, but the type checkers complained without the assert.

Contributor

OK, let's keep it if they complain. With Python 3.10 it seemed to be fine.
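The exchange above concerns an `assert isinstance(...)` that can never fire at runtime but is kept so static type checkers accept the code. A minimal, hypothetical sketch of that pattern follows; the function `resolve_new_splits`, its `Optional` parameter, and the default-splits fallback are illustrative assumptions, not code from divide_resplitter.py.

```python
from typing import Dict, Optional


def resolve_new_splits(
    new_splits_dict: Optional[Dict[str, float]], default: Dict[str, float]
) -> Dict[str, float]:
    """Hypothetical helper: fall back to `default` when no splits are given."""
    missing = new_splits_dict is None
    if missing:
        new_splits_dict = default
    # At runtime this is always a dict, but a checker that does not track
    # the `missing` flag may still infer Optional[Dict[str, float]] here.
    # The assert narrows the type so `return new_splits_dict` type-checks.
    assert isinstance(new_splits_dict, dict)
    return new_splits_dict
```

One caveat: `assert` statements are stripped when Python runs with `-O`, so this line only serves the type checker and should not be relied on for runtime validation. `typing.cast` expresses the same intent without any runtime check.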

datasets/flwr_datasets/federated_dataset.py: 1 outdated, resolved review thread
datasets/flwr_datasets/resplitter/divide_resplitter.py: 1 outdated, resolved review thread
adam-narozniak changed the title "Add DivideResplitter" to "feat(datasets) Add DivideResplitter" May 17, 2024
datasets/flwr_datasets/resplitter/divide_resplitter.py: 6 outdated, resolved review threads
jafermarq enabled auto-merge (squash) May 21, 2024 10:24
jafermarq merged commit 83da926 into main May 21, 2024
35 checks passed
jafermarq deleted the fds-add-divide-resplitter branch May 21, 2024 10:30
2 participants