Skip to content

[Request] Support for multiple resolution datasets #89

@hrothgar234567

Description

@hrothgar234567

Anima in particular is finetuned first at a low resolution and later for fewer steps at a higher resolution. This process has significant implications with respect to speeding up training times. Inspired by this, I ran some qualitative tests using a custom dataset.toml file which contained multiple dataset entries over the same training data but at different resolutions, and I modulated repeats for the subsets in those datasets in order to change the ratio of images at different resolutions. I found that there is very little (if any) quality loss as a result of training on lower resolutions at a 1:1 or 2:1 ratio when mixing combinations of 768x768, 1024x1024, and 1280x1280 images (weighted towards the lower end in the 2:1 case). Additionally, I often have issues preparing datasets where there exist constituent images significantly below the training resolution as I want to avoid having them be upscaled.

It would be nice to have the ability to create custom dataset entries with constituent subsets where we can specify training resolutions per dataset. This would allow both mixed-resolution training and the ability to create datasets containing just the smaller images such that they won't be upscaled.

As an alternative to implementing UI for creating datasets each with some arbitrary number of subsets, it may be sufficient to allow the user to specify a range of training resolutions (e.g. comma-delimited lists of numbers for both height and width) and simply duplicate the dataset entry for each resolution. This solution would not allow for allowing smaller images to be constrained only to the smaller datasets, but would allow coarse control of multi-res training.

As a note, the implementation of "Don't Upscale Images" and "Multi-Res Training" options in the bucket args pane don't really do what I'm suggesting here. The first creates too many buckets hampering batching and generally reducing quality in my experience (though others' mileage may very). The second doesn't provide enough granularity over which resolutions are actually trained.

In summary, the ability to easily prepare datasets which are trained at different resolutions allows for increased training speed by lowering the average resolution of images to be trained and prevents lower resolution images from needing to be upscaled to be included in the training data.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions