A proof-of-concept pipeline for performing hyperparameter optimization of machine learning models with Nextflow.
Install Nextflow (version 23.10 or higher):
curl -s https://get.nextflow.io | bash -
Launch the pipeline:
# use conda natively (requires Conda)
./nextflow run nextflow-io/hyperopt -profile test,conda

# use Wave containers (requires Docker)
./nextflow run nextflow-io/hyperopt -profile test,wave
When the pipeline completes, you can view the training and prediction results in the results folder.
Note: When you run the pipeline for the first time, it will take a moment to download the pipeline from this GitHub repository and any related software dependencies (e.g. conda packages or Docker images).
The hyperopt pipeline consists of the following steps:
- Prepare train/test splits from OpenML or user-provided datasets
- Visualize the train/test sets
- Train a variety of models on each training set
- Evaluate each model against each test set
- Report the best model for each dataset based on evaluation score
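The split, train, evaluate and report steps above can be illustrated with a small, self-contained Python sketch. It is not the pipeline's code: it uses only the standard library, a hypothetical synthetic one-feature dataset, and two hypothetical toy models (a majority-class baseline, loosely analogous to the dummy model type, and a single-threshold classifier) in place of the models the pipeline actually trains. Accuracy is assumed as the evaluation score, and the visualization step is omitted.

```python
import random
from collections import Counter

def make_dataset(n=200, seed=0):
    # Hypothetical synthetic data: one numeric feature, binary label.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 1 samples are centred at +1, class 0 at -1.
        x = rng.gauss(1.0 if label else -1.0, 1.0)
        rows.append((x, label))
    return rows

def train_test_split(rows, test_fraction=0.25, seed=0):
    # Shuffle a copy, then cut it into train and test portions.
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def train_majority(train):
    # Baseline: always predict the most common training label.
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority

def train_threshold(train):
    # Try each training value as a cut-off; keep the best on the training set.
    best_t, best_acc = 0.0, -1.0
    for t, _ in train:
        acc = sum((x > t) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x > best_t)

def accuracy(model, test):
    return sum(model(x) == y for x, y in test) / len(test)

# Hypothetical registry of model types: name -> training function.
MODEL_TYPES = {"majority": train_majority, "threshold": train_threshold}

def run(seed=0):
    train, test = train_test_split(make_dataset(seed=seed), seed=seed)
    # Train every model type on the training set and score it on the test set.
    scores = {name: accuracy(fit(train), test) for name, fit in MODEL_TYPES.items()}
    # Report the model with the highest score.
    best = max(scores, key=scores.get)
    return scores, best

if __name__ == "__main__":
    scores, best = run()
    for name, score in sorted(scores.items()):
        print(f"{name}: {score:.3f}")
    print("best:", best)
```

In the real pipeline each of these steps is a separate Nextflow process, so the training and evaluation tasks for different model types can run in parallel rather than in a single loop.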
You can control many aspects of this workflow with the pipeline parameters, including:
- Download any number of datasets from OpenML.org (default is wdbc)
- Evaluate against a number of model types (default is dummy,gb,lr,mlp,rf)
- Provide your own train/test splits
- Provide your own pre-trained models
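Because every model type is trained on every training set, the number of training tasks grows as the number of datasets times the number of model types. The following hypothetical sketch (not the pipeline's code) expands comma-separated strings mirroring the documented defaults, wdbc and dummy,gb,lr,mlp,rf, into the (dataset, model type) pairs that a workflow engine such as Nextflow could schedule in parallel. The helper name expand_jobs is an assumption for illustration; see nextflow.config for the pipeline's real parameter names.

```python
from itertools import product

def expand_jobs(datasets: str, models: str) -> list[tuple[str, str]]:
    """Return one (dataset, model) pair per training task (hypothetical helper)."""
    # Split comma-separated values, dropping blanks and surrounding whitespace.
    ds = [d.strip() for d in datasets.split(",") if d.strip()]
    ms = [m.strip() for m in models.split(",") if m.strip()]
    # Cartesian product: every model type is paired with every dataset.
    return list(product(ds, ms))

if __name__ == "__main__":
    jobs = expand_jobs("wdbc", "dummy,gb,lr,mlp,rf")
    print(len(jobs))  # 5
    for job in jobs:
        print(job)
```

Adding a second dataset to the same model list would double the number of training tasks, which is why running on an HPC cluster or cloud executor becomes attractive as the parameter lists grow.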
See the nextflow.config file for the list of pipeline parameters.
Since Nextflow separates the pipeline logic from the underlying execution environment, the hyperopt pipeline can be executed on a local machine, an HPC cluster, or a cloud provider without changes to the pipeline code.
See the Nextflow documentation to learn more about Executors and Configuration.
The hyperopt pipeline uses Python (>=3.14) and several Python packages for machine learning and data science. These dependencies are defined in the conda.yml file.