Finish Parametric UMAP model + add datasets #689

Open · 17 tasks
NickleDave opened this issue Aug 14, 2023 · 0 comments
Labels: Models (Issue related to models)

Comments


NickleDave commented Aug 14, 2023

I added an initial Parametric UMAP model family plus one example model in #688, fixing #631. I went ahead and merged it so we could work on other things--a lot of changes were needed to be able to add that model.

There's still additional work to be done, though:

  • fix / better test the prep step -- I notice I get a train split that is 0.99 seconds even when I set the target duration to 0.2 seconds; likewise, I got a val split that was 0.97 seconds when I set the target duration to 0.1 seconds (see sketch 1 below)
    • is this because we are somehow using entire files?
  • figure out whether we need to shuffle for training -- it's not clear to me that this is needed
  • make sure we have access to labels for training and eval when needed
    • do we need a labelmap.json for this? We're not predicting labels, so there's no reason to map labels <-> consecutive integers (see sketch 2 below)
  • finish predict function
    • test that vak.predict_.predict calls this function appropriately
  • add learncurve function
  • add metrics from this paper to val step / evaluation: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2656.13754
  • test whether embedding the entire dataset on one graph has an impact on validation / test set performance (see sketch 3 below)
    • i.e., is it ok to just embed the val / test splits separately? How does this affect estimates of loss and other metrics?
    • if so, we should warn when people don't make val / test sets
  • add documentation with example tutorials
  • add some version of the prepared datasets from Sainburg et al. 2020 and models trained on those datasets
  • add / test the ability to continue training already-trained models (see sketch 4 below)
  • add back and use labelmap for eval -- e.g. for metrics from https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2656.13754
  • modify the training dataset so that training doesn't always take forever; could we write a custom sampler that uses the probabilities to weight which samples it grabs for each batch? (see sketch 5 below)
  • evaluate the effect of hyperparameters / architecture on the model (see sketch 6 below). To speed up tests I made the default number of filters in each layer of the ConvEncoderUMAP much smaller (in 454f159), which dropped the checkpoint size from ~1.7 GB to ~25 MB
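
Sketch 1 -- a guess at the split-duration behavior, not vak's actual prep code: if a split is assembled from whole files, a greedy splitter can only stop at file boundaries, so small target durations get overshot. The per-file durations here are made up.

```python
# Hypothetical: whole-file granularity means split totals quantize to
# file boundaries, overshooting small targets.
file_durs = [0.35, 0.33, 0.31]  # made-up per-file durations, in seconds
target_dur = 0.2

split_durs, total = [], 0.0
for dur in file_durs:
    split_durs.append(dur)
    total += dur
    if total >= target_dur:
        break

print(f"target: {target_dur} s, actual split: {total:.2f} s")
# target: 0.2 s, actual split: 0.35 s -- one whole file already overshoots
```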
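Sketch 2 -- what a labelmap buys us, for reference when deciding whether we need one (the actual format of vak's labelmap.json may differ from this toy version): it's just a mapping between string labels and consecutive integers, which only matters when a model predicts labels.

```python
# Hypothetical toy labelmap; made-up labels.
import json

labels = ["b", "a", "c", "a"]  # annotation labels from some dataset
labelmap = {lbl: ix for ix, lbl in enumerate(sorted(set(labels)))}
# {"a": 0, "b": 1, "c": 2}; invert it to decode integer predictions
inverse_labelmap = {ix: lbl for lbl, ix in labelmap.items()}

with open("labelmap.json", "w") as fp:
    json.dump(labelmap, fp)
```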
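Sketch 3 -- the two options being compared, illustrated with the non-parametric umap-learn API on random stand-in data (the parametric model raises the same question):

```python
import numpy as np
import umap

rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 32))
X_val = rng.normal(size=(100, 32))

# Option A: embed the val split separately, with a model fit on train only.
reducer = umap.UMAP().fit(X_train)
emb_val_separate = reducer.transform(X_val)

# Option B: fit one graph on the entire dataset, then slice out val.
emb_all = umap.UMAP().fit_transform(np.vstack([X_train, X_val]))
emb_val_joint = emb_all[len(X_train):]

# The question from the checklist: how much do loss / metric estimates
# computed from emb_val_separate vs. emb_val_joint differ?
```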
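Sketch 4 -- resuming training from a checkpoint with pytorch-lightning, which vak models are built on. The model, data, and checkpoint path are stand-ins, not vak's actual classes or paths:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class TinyModel(pl.LightningModule):
    """Stand-in LightningModule; not the Parametric UMAP model."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

loader = DataLoader(
    TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=16
)

# First run: trains 5 epochs, saving checkpoints under `results/`.
pl.Trainer(max_epochs=5, default_root_dir="results").fit(TinyModel(), loader)

# Later run: pass ckpt_path to pick up where the previous run stopped
# (placeholder path -- use the checkpoint the first run actually wrote).
pl.Trainer(max_epochs=10, default_root_dir="results").fit(
    TinyModel(),
    loader,
    ckpt_path="results/lightning_logs/version_0/checkpoints/epoch=4-step=20.ckpt",
)
```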
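Sketch 5 -- one way the custom sampler could work, using torch's built-in WeightedRandomSampler so each batch draws samples in proportion to the UMAP graph probabilities instead of iterating over every repeated edge each epoch. The data and probabilities here are random stand-ins:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

n_edges = 10_000
edge_data = torch.randn(n_edges, 32)  # stand-in for per-edge training samples
edge_probs = torch.rand(n_edges)      # stand-in for UMAP edge probabilities

sampler = WeightedRandomSampler(
    weights=edge_probs,   # relative sampling weight per edge
    num_samples=2_000,    # draws per "epoch" -- a knob that shortens training
    replacement=True,
)
loader = DataLoader(TensorDataset(edge_data), batch_size=64, sampler=sampler)

for (batch,) in loader:
    pass  # training step would go here
```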
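Sketch 6 -- why fewer filters shrinks checkpoints so much: parameter count scales with products of adjacent layer widths, and the first Linear layer after flattening dominates. This toy encoder is a stand-in, not the actual ConvEncoderUMAP architecture:

```python
import torch.nn as nn

def toy_encoder(n_filters, input_hw=32, emb_dim=2):
    """Toy conv encoder; assumes a 32x32 input and 2-D embedding."""
    return nn.Sequential(
        nn.Conv2d(1, n_filters, kernel_size=3, padding=1),
        nn.Conv2d(n_filters, n_filters * 2, kernel_size=3, padding=1),
        nn.Flatten(),
        nn.Linear(n_filters * 2 * input_hw * input_hw, emb_dim),
    )

for n_filters in (8, 64):
    n_params = sum(p.numel() for p in toy_encoder(n_filters).parameters())
    print(f"n_filters={n_filters}: {n_params:,} params "
          f"(~{n_params * 4 / 1e6:.1f} MB of float32 weights)")
# The ~10x parameter gap between 8 and 64 filters shows the mechanism
# behind the ~1.7 GB -> ~25 MB checkpoint drop noted above.
```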
NickleDave added the Models label on Apr 13, 2024