Releases: lRomul/argus
Maintenance release, new guides and updated docs
Fix
- Fix `AverageMeter` for n > 1 cases.
Breaking Changes
- Delete the batch object after the iteration completes.
- Don't store the data loader in the engine state.
New Features
- Return metrics from the `fit` method the same way as from `validate` (see the sketch after this list).
- Use the constructor from `BuildModel` so that the user can pass `build_order`.
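A minimal sketch of the first item, assuming `fit` now returns the same metrics dictionary that `validate` does (the loaders, metric names, and values are illustrative):

```python
val_metrics = model.validate(val_loader)  # e.g. {'val_loss': 0.31, 'val_accuracy': 0.92}
fit_metrics = model.fit(train_loader, val_loader=val_loader, num_epochs=10)
# fit now reports metrics in the same dictionary format as validate
```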
Docs
New guides on:
- Custom metrics.
- Partial weights loading and manipulation.
- Model export.
- Custom callbacks.
- LR schedulers.
Other improvements:
- Add new competition solutions to the examples.
- Improve docstrings in many places.
Chore
- Use `pyproject.toml`.
- Update GitHub Actions versions.
- Update dependencies.
- Use ruff linter.
Full Changelog: v1.0.0...v1.1.0
Argus 1.0.0
New Features
- Add `mode` argument to `argus.Model.train` (like in torch).
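A short usage sketch, assuming the argument mirrors `torch.nn.Module.train(mode=True)`:

```python
model.train()            # put the nn_module into training mode
model.train(mode=False)  # equivalent to model.eval()
```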
Docs
- Add guides that provide an in-depth overview of how the framework works (link).
- Fix minor typos in docstrings.
Examples
- New example with sequential LR scheduler (link).
- Transition from `torch.distributed.launch` to `torchrun` in the cifar_advanced example.
Chore
- Add `__all__` for all modules.
- Update CUDA to 11.3.1.
- Update PyTorch to 1.10.0.
Logo, pydata sphinx theme, custom state loading, share train and val states
New Features
- Share train and val states between phases with the `phase_states` attribute of state.

```python
@argus.callbacks.on_epoch_complete
def some_validation_callback(state: argus.engine.State):
    train_step_output = state.phase_states['train'].step_output
    ...
```
- Option to use a custom state load function for `argus.load_model`.

```python
import pathlib
import torch
from argus import load_model

def state_load_from_dir(dir_path):
    file_path = pathlib.Path(dir_path) / 'some_model_name.pth'
    return torch.load(file_path)

model = load_model(path_to_dir_with_model, state_load_func=state_load_from_dir)
```
Docs
- Argus logo!
- Migrate to pydata-sphinx-theme.
Fix
- Fix sdist package installation by adding `MANIFEST.in` with `requirements.txt`.
Examples
- Use `torch.cuda.amp` instead of Apex in the advanced CIFAR example.
- Add a solution for the RANZCR CLiP - Catheter and Line Position Challenge as an example.
Chore
- Add `setup.cfg` with pytest and flake8 settings.
- Check code style with flake8 in CI.
- Run tests on macOS and Windows.
- Update Dockerfile and tests to PyTorch 1.8.0.
- Update Dockerfile to CUDA 11.1.
Save optimizer state, improve docs and typing
New Features
- Add saving of optimizer state for `argus.Model` and checkpoint callbacks.

```python
model.save('models/model.pth', optimizer_state=True)
checkpoint = Checkpoint(dir_path='models/', optimizer_state=True)
```
- Add `get_device` method to `argus.Model` (see the sketch after this list).
- Add typing and fix most `mypy` errors.
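A tiny usage sketch of the new method; the exact return type is an assumption here:

```python
model.set_device('cuda:0')
device = model.get_device()  # assumed to return the device the model currently lives on
```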
Fix
- Remove `torch.optim._multi_tensor` optimizers from defaults (`torch >= 1.7.0`).
Docs
- Section `argus.engine`.
- Section `argus.metrics`.
- Section `argus.utils` with deep conversions.
- Add docs for decorator callbacks.
- Add docs for `argus.Model` methods: `__init__`, `set_device`, `get_device`, `get_nn_module`.
- Update examples section.
- Proofread and improve docs. Many small docstring fixes.
Internal changes
- Use abstract container classes from `collections.abc`.
- Now `Engine` and `State` only work with `argus.Model` methods as a `step_method`. The phase name is taken from the method name.
- Simplify default logging.
Breaking Changes
- Change optimizer state handling in `argus.load_model`. Now `change_state_dict_func` takes two arguments, `nn_state_dict` and `optimizer_state_dict` (example); see the sketch after this list.
- Remove `handler_kwargs_dict` from the attach method of `argus.callbacks.Callback`.
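A minimal sketch of the new `change_state_dict_func` signature, assuming it should return the modified pair of state dicts (the key names and the dropped layer are illustrative; see the linked example for the exact contract):

```python
from argus import load_model

def change_state_dict_func(nn_state_dict, optimizer_state_dict):
    # e.g. drop the classification head weights before loading
    nn_state_dict.pop('fc.weight', None)
    nn_state_dict.pop('fc.bias', None)
    return nn_state_dict, optimizer_state_dict

model = load_model(model_path, change_state_dict_func=change_state_dict_func)
```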
Tests, replace params while model loading, custom events
New Features
- Tests, 100% coverage (codecov).
- Mechanism of `params` replacement while model loading (example).

```python
# change optimizer params
model = load_model(model_path, optimizer=('AdamW', {'lr': 0.001}))
# load model without optimizer and loss
model = load_model(model_path, optimizer=None, loss=None)
```
- Custom events for callbacks (example).
```python
import argus
from argus.engine import EventEnum


class CustomEvents(EventEnum):
    BACKWARD_START = 'backward_start'
    BACKWARD_COMPLETE = 'backward_complete'


@argus.callbacks.on_event(CustomEvents.BACKWARD_START)
def before_backward(state):
    ...


class CustomEventModel(argus.Model):
    ...

    def train_step(self, batch, state):
        ...
        state.engine.raise_event(CustomEvents.BACKWARD_START)
        loss.backward()
        state.engine.raise_event(CustomEvents.BACKWARD_COMPLETE)
        ...
```
- Typing.
- Raise exceptions instead of asserts.
- Set up a unique logger for each instance of `argus.Model`.
- Check that `params` is pickleable at model construction.
- `create_dir` parameter for `argus.callbacks.logging.LoggingToCSV` (see the sketch after this list).
- Use an instance of `argus.utils.Identity` as the default for `prediction_transform` instead of `lambda x: x`.
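A small usage sketch of the new `create_dir` parameter, assuming it creates the missing parent directory for the log file:

```python
from argus.callbacks import LoggingToCSV

# assumption: create_dir=True creates 'logs/experiment_1/' if it does not exist yet
csv_logger = LoggingToCSV('logs/experiment_1/log.csv', create_dir=True)
```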
Fix
- Correctly save checkpoints with the `save_after_exception` argument for `argus.callbacks.checkpoints`.
Breaking Changes
- Change the default `append` argument value to `False` for `argus.callbacks.logging.LoggingToFile` (see the sketch after this list).
- Rename attribute `_scheduler` of `argus.callbacks.lr_schedulers.LRScheduler` to `scheduler`.
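A short sketch of keeping the previous logging behavior after this change, assuming `LoggingToFile` is importable from `argus.callbacks` like the other logging callbacks:

```python
from argus.callbacks import LoggingToFile

# append now defaults to False (start a fresh log); pass append=True to keep appending
file_logger = LoggingToFile('train.log', append=True)
```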
Custom build methods, more examples
Features
- New mechanics for building attributes. It allows customizing the creation of model parts. Example here.
- CIFAR example with Distributed Data Parallel, mixed precision, and gradient accumulation: cifar_advanced.py.
- Add `save_model` method to `argus.callbacks.checkpoints`. It allows customizing checkpoint saving.
- Add logging of time and LR to `argus.callbacks.logging.LoggingToCSV`.
- `argus.utils.deep_chunk`, similar to the scatter function in PyTorch DataParallel (see the sketch after this list).
- Dockerfile and Makefile for development.
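A small sketch of `argus.utils.deep_chunk`; the call signature (object plus number of chunks) is an assumption based on the scatter analogy above:

```python
import torch
from argus.utils import deep_chunk

batch = {
    'image': torch.rand(8, 3, 32, 32),
    'target': torch.randint(0, 10, (8,)),
}
# assumption: deep_chunk(obj, n) splits nested tensors into n chunks along dim 0
chunks = deep_chunk(batch, 2)
```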
Breaking Changes
- Use the `argus.utils.deep_to` function instead of the method `argus.Model.prepare_batch`. `argus.Model.prepare_batch` is removed, so if you use a custom `val_step` or `train_step`, replace

```python
input, target = self.prepare_batch(batch, self.device)
```

with

```python
input, target = deep_to(batch, self.device, non_blocking=True)
```
- Rename `max_epochs` to `num_epochs` in the `argus.Model.fit` method.

```python
model.fit(train_loader, val_loader=val_loader, num_epochs=1000)
```
- Remove `copy_last` parameter from `argus.callbacks.checkpoints`.
- Remove `period` parameter from `argus.callbacks.checkpoints.MonitorCheckpoint`.
Documentation, LR scheduler step on iteration, new LR schedulers
New Features
- Documentation https://pytorch-argus.readthedocs.io
- Add step on iteration option for LR schedulers.
```python
from argus.callbacks import CosineAnnealingLR

CosineAnnealingLR(10000, step_on_iteration=True)
```
- New LR schedulers (see the sketch after this list):
  - `argus.callbacks.lr_schedulers.MultiplicativeLR`: multiply the learning rate by the factor given in the specified function.
  - `argus.callbacks.lr_schedulers.OneCycleLR`: One Cycle learning rate policy.
- Make the LR scheduler step on epoch complete instead of epoch start.
- Compute metric score with `torch.no_grad`.
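A brief sketch of the new schedulers, assuming they are importable from `argus.callbacks` like `CosineAnnealingLR` above and forward their arguments to the underlying torch schedulers:

```python
from argus.callbacks import MultiplicativeLR, OneCycleLR

# multiply the learning rate by 0.95 after every epoch
multiplicative_lr = MultiplicativeLR(lr_lambda=lambda epoch: 0.95)

# One Cycle policy, stepping on every iteration
one_cycle_lr = OneCycleLR(max_lr=0.01, total_steps=10000, step_on_iteration=True)
```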
Fix
- Fix LR logging with several parameter groups in the optimizer.
- Fix key error in redefine metric warning.
Breaking Changes
- PyTorch requirements: `torch>=1.1.0`.
New LR schedulers, csv logger, state in step functions
New Features
- `CyclicLR` and `CosineAnnealingWarmRestarts` LR schedulers:
  - `argus.callbacks.lr_schedulers.CyclicLR`: support for Cyclical Learning Rate and Momentum.
  - `argus.callbacks.lr_schedulers.CosineAnnealingWarmRestarts`: Stochastic Gradient Descent with Warm Restarts.
- `argus.callbacks.logging.LoggingToCSV`: add CSV logger callback.

```python
from argus.callbacks import LoggingToCSV

LoggingToCSV('path/to/log.csv', separator=',', append=False)
```
- Add `train` and `eval` mode methods to `argus.Model`. `model.train()` sets the `nn_module` to training mode; `model.eval()` sets the `nn_module` to evaluation mode.
- Set `step_output` of `State` to `None` after each iteration to save GPU memory.
Breaking Changes
- Pass state to the train and val step functions.

Before:

```python
def train_step(self, batch):
    ...
```

Now:

```python
def train_step(self, batch, state: State):
    print(state.epoch)
    ...
```
- Scheduler step on epoch start; train epochs run from 0 to max_epochs - 1. The scheduler callback uses the epoch param of a scheduler step function, so it now works like in 20124.
- Remove deprecated `to_device` and `detach_tensors` utils functions.
Data parallel
Data parallel for multi-GPU training.
Select a GPU with device indexing:

```python
model = load_model(model_path, device="cuda:1")
model.set_device("cuda:0")
```

For multi-GPU you can use a list of devices:

```python
params = {
    # ... other model params ...
    'device': ['cuda:0', 'cuda:1'],
}
model = CnnFinetune(params)

model = load_model(model_path, device=["cuda:1", "cuda:0"])
model.set_device(["cuda:0", "cuda:1"])
```

Batch tensors will be scattered on dim 0. The first device in the list is the location of the output.
By default, device "cuda" means single-GPU training on `torch.cuda.current_device()`.