LabiCompare is a Python library focused on the evaluation and statistical comparison of Machine Learning models. It simplifies the process of running hypothesis tests (such as Friedman and Wilcoxon-Holm) and generates visualizations, including Critical Difference (CD) Diagrams.
Via PyPI (future - WIP):
pip install labicompare

Or install it locally from the source code:
git clone https://github.com/jose-gilberto/labicompare.git
cd labicompare
pip install -e .

Here is how easy it is to compare the performance of multiple models across different datasets or folds:
import pandas as pd
from labicompare.core.data import EvaluationData
from labicompare.stats import evaluate_models
from labicompare.plots.ranking import plot_cd_diagram
# 1. Prepare your data (Rows = Datasets/Folds, Columns = Models)
data_dict = {
    'Model_A': [0.85, 0.88, 0.82, 0.89],
    'Model_B': [0.86, 0.89, 0.84, 0.90],
    'Model_C': [0.70, 0.75, 0.72, 0.71],
    'Proposed_Model': [0.91, 0.93, 0.89, 0.95]
}
df = pd.DataFrame(data_dict)
# 2. Wrap the data (Accuracy: higher_is_better=True)
eval_data = EvaluationData(df, higher_is_better=True)
# 3. Run the statistical tests (e.g., Friedman + Wilcoxon-Holm)
summary = evaluate_models(eval_data, alpha=0.05)
print(summary)
# Output: ComparisonSummary(Friedman P-Value=0.0012, H0=REJECTED, Models=4)
# 4. Generate the Critical Difference Diagram (CD Diagram)
fig = plot_cd_diagram(
    data=eval_data,
    summary=summary,
    title="Model Comparison (Accuracy)"
)
fig.savefig("cd_diagram.png", dpi=300, bbox_inches='tight')

EvaluationData is the base class that ingests your pandas.DataFrame.
Key Parameter: higher_is_better (Boolean). Use True for metrics like Accuracy and F1-Score, and False for error metrics like RMSE or MAE. The library automatically handles ranking inversions under the hood.
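The ranking inversion can be pictured with plain pandas. This is only a minimal sketch of the idea, not the library's internal code: the helper name average_ranks is hypothetical, and it assumes models are ranked within each row with rank 1 as the best.

```python
import pandas as pd


def average_ranks(scores: pd.DataFrame, higher_is_better: bool = True) -> pd.Series:
    """Rank models within each row (1 = best) and average the ranks over rows.

    Hypothetical helper for illustration; not part of labicompare.
    """
    # With ascending=False the largest score in a row receives rank 1,
    # which is what we want when higher values are better.
    ranks = scores.rank(axis=1, ascending=not higher_is_better, method="average")
    return ranks.mean(axis=0).sort_values()


# Same numbers as the quick-start example above.
accuracy = pd.DataFrame({
    "Model_A": [0.85, 0.88, 0.82, 0.89],
    "Model_B": [0.86, 0.89, 0.84, 0.90],
    "Model_C": [0.70, 0.75, 0.72, 0.71],
    "Proposed_Model": [0.91, 0.93, 0.89, 0.95],
})

acc_ranks = average_ranks(accuracy, higher_is_better=True)
# Reading the same values as an error metric reverses the ordering.
err_ranks = average_ranks(accuracy, higher_is_better=False)
print(acc_ranks)
print(err_ranks)
```

Flipping higher_is_better only changes the sort direction used for ranking; the scores themselves are left untouched.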
ComparisonSummary is the object returned by the statistical testing functions. It stores:
- Global results (Friedman's P-value).
- Pairwise results (pairwise_results), including which model won and whether the difference is statistically significant.
- Built-in Export: You can use summary.to_dataframe() to export the results into a tabular format, making it easy to convert to LaTeX or Markdown for your papers.
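To make the global and pairwise results concrete, here is a rough sketch of a Friedman test followed by Holm-corrected pairwise Wilcoxon tests, written directly against SciPy. It is an assumption about the general procedure, not labicompare's implementation: the function name friedman_wilcoxon_holm and the output column names are hypothetical, and the p-values the library reports may differ if it uses another variant of these tests.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import friedmanchisquare, wilcoxon


def friedman_wilcoxon_holm(scores: pd.DataFrame, alpha: float = 0.05):
    """Return (global Friedman p-value, table of Holm-adjusted pairwise results).

    Hypothetical helper for illustration; not part of labicompare.
    """
    # Global test: do the models differ at all across the rows?
    _, p_global = friedmanchisquare(*(scores[c].to_numpy() for c in scores.columns))

    # Pairwise Wilcoxon signed-rank tests on every pair of models.
    pairs = list(combinations(scores.columns, 2))
    raw = [wilcoxon(scores[a].to_numpy(), scores[b].to_numpy()).pvalue for a, b in pairs]

    # Holm step-down correction: the i-th smallest p-value (0-based) is
    # multiplied by (m - i), capped at 1, and kept monotonically non-decreasing.
    m = len(raw)
    order = sorted(range(m), key=raw.__getitem__)
    adjusted = [0.0] * m
    running_max = 0.0
    for i, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - i) * raw[idx]))
        adjusted[idx] = running_max

    table = pd.DataFrame({
        "model_a": [a for a, _ in pairs],
        "model_b": [b for _, b in pairs],
        "p_raw": raw,
        "p_holm": adjusted,
    })
    table["significant"] = table["p_holm"] < alpha
    return p_global, table


scores = pd.DataFrame({
    "Model_A": [0.85, 0.88, 0.82, 0.89],
    "Model_B": [0.86, 0.89, 0.84, 0.90],
    "Model_C": [0.70, 0.75, 0.72, 0.71],
    "Proposed_Model": [0.91, 0.93, 0.89, 0.95],
})
p_global, pairwise = friedman_wilcoxon_holm(scores)
print(p_global)
print(pairwise)
```

With only four rows, as in the quick-start data, the pairwise tests have very little power, so expect to need more datasets or folds before individual pairs separate.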
plot_cd_diagram draws the Critical Difference diagram, the visual tool for comparing models. Our implementation adds layout features tailored for academic publishing:
- Bilateral Layout: Models are split evenly on both sides to prevent text overlap.
- Maximal Cliques: Thick bars group models that have no statistically significant difference (ties), automatically preventing redundant sub-lines.
- Inline Rankings: The exact average rank is displayed cleanly beneath each model's name.
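One way to picture the maximal-clique grouping: sort the models by average rank, grow each group for as long as every pair inside it is non-significant, then drop any group fully contained in a larger one. The sketch below follows that idea under stated assumptions; the function maximal_groups and its inputs are hypothetical and not taken from the library.

```python
def maximal_groups(ordered, not_significant):
    """Return the maximal runs of models with no significant pairwise difference.

    ordered: model names sorted by average rank (best first).
    not_significant: set of frozenset({a, b}) pairs that are statistically tied.
    Hypothetical helper for illustration; not part of labicompare.
    """
    n = len(ordered)
    intervals = []
    for start in range(n):
        end = start
        # Extend the run while the next model is tied with everyone already in it.
        while end + 1 < n and all(
            frozenset((ordered[k], ordered[end + 1])) in not_significant
            for k in range(start, end + 1)
        ):
            end += 1
        if end > start:
            intervals.append((start, end))

    # Keep only runs that are not contained in another run (no redundant sub-lines).
    maximal = [
        (s, e)
        for (s, e) in intervals
        if not any((s2, e2) != (s, e) and s2 <= s and e <= e2 for s2, e2 in intervals)
    ]
    return [ordered[s:e + 1] for s, e in maximal]


# Made-up tie structure, purely for illustration.
ordered = ["Proposed_Model", "Model_B", "Model_A", "Model_C"]
ties = {
    frozenset(("Proposed_Model", "Model_B")),
    frozenset(("Proposed_Model", "Model_A")),
    frozenset(("Model_B", "Model_A")),
    frozenset(("Model_A", "Model_C")),
}
groups = maximal_groups(ordered, ties)
print(groups)
```

In this made-up case the run covering Model_B and Model_A is absorbed by the longer run that starts at Proposed_Model, so only two bars would be drawn.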
If you are proposing a new model and want it to stand out in the diagram, use the highlight parameters:
fig = plot_cd_diagram(
    data=eval_data,
    summary=summary,
    highlight_models=['Proposed_Model'],
    highlight_color='#d97706'  # Optional custom color (Default: Amber/Orange)
)

We welcome contributions from the community! Whether you want to fix a bug, add a new statistical test, or improve the documentation, your help is highly appreciated.
- Fork the repository and clone it locally:
  git clone https://github.com/jose-gilberto/labicompare.git
  cd labicompare
- Create a virtual environment and install the development dependencies:
  python -m venv venv
  source venv/bin/activate  # On Windows use: venv\Scripts\activate
  pip install -e ".[dev]"  # Ensure you have a [dev] extra in your pyproject.toml or setup.py
All new features and bug fixes should be accompanied by unit tests. We use pytest as our testing framework.
To run the test suite:
pytest tests/

Create a new branch for your feature or bugfix (git checkout -b feature/my-awesome-feature).
Make your changes and commit them with descriptive messages.
Ensure all tests pass and the code is properly formatted.
Push your branch to your fork (git push origin feature/my-awesome-feature).
Open a Pull Request against the main branch of this repository. Include a clear description of the changes and any related issue numbers.
If you find a bug or have a feature request, please open an issue on GitHub. Provide as much detail as possible, including steps to reproduce bugs or a clear rationale for new features.
If you use labicompare in your research or project, please consider citing it.
@misc{labicompare2026,
  author = {José Gilberto Barbosa de Medeiros Júnior},
  title = {labicompare: statistical comparison and visualization for Machine Learning models},
  year = {2026},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/jose-gilberto/labicompare}},
}