Kale-Linear is a Python library for non-deep, knowledge-aware machine learning from multiple sources, domains, or views. It provides NumPy-based methods for leveraging related data distributions and structural assumptions, including transfer learning, domain adaptation, manifold regularization, and group-aware learning, through a scikit-learn style API.
The package is part of the PyKale ecosystem and focuses on classical linear and kernel methods that are useful when data are structured by domain labels, covariates, side information, or unlabeled target samples.
- Transformer models for learning feature embeddings:
  - Multilinear Principal Component Analysis (MPCA): Lu et al., 2008 [IEEE]
  - Transfer Component Analysis (TCA): Pan et al., 2009 [paper]
  - Joint Distribution Adaptation (JDA): Long et al., 2013 [paper]
  - Balanced Distribution Adaptation (BDA): Wang et al., 2017 [paper]
  - Maximum Independence Domain Adaptation (MIDA): Yan et al., 2017 [paper]
- Estimator models for classification and adaptation:
  - Manifold Regularization Learning Framework (LapSVM, LapRLS): Belkin et al., 2006 [paper]
  - Adaptation Regularization Learning Framework (ARSVM, ARRLS): Long et al., 2014 [paper]
  - Covariate Independence Regularized Learning Framework (CoIRSVM, CoIRLS): Zhou et al., 2020 [paper], Zhou, 2022 [thesis]
  - Group-specific Discriminant Analysis (GSDA): Zhou et al., 2025 [paper], Zhou, 2022 [thesis]
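
Several of the adaptation methods listed above, TCA among them, are built around reducing the maximum mean discrepancy (MMD) between domains. As a rough illustration of that quantity, the sketch below computes the squared MMD under a linear kernel, which reduces to the squared distance between the domain means. It uses plain NumPy only; `linear_mmd` and the toy arrays are hypothetical and are not part of the Kale-Linear API or its implementation.

```python
import numpy as np


def linear_mmd(x_source, x_target):
    """Squared MMD under a linear kernel: squared distance between domain means."""
    mean_gap = x_source.mean(axis=0) - x_target.mean(axis=0)
    return float(mean_gap @ mean_gap)


# Hypothetical toy data: the target domain is a shifted copy of the source.
x_source = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
x_target = x_source + np.array([1.0, -1.0])

gap_before = linear_mmd(x_source, x_target)

# Centering each domain removes the mean shift, so the linear MMD vanishes.
gap_after = linear_mmd(
    x_source - x_source.mean(axis=0),
    x_target - x_target.mean(axis=0),
)
```

A learned embedding plays the role of the centering step here: the smaller the discrepancy after the transformation, the more alike the two domains look to a downstream classifier.
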
- NumPy-compatible inputs and outputs.
- scikit-learn style `fit`, `transform`, `predict`, `fit_transform`, and `fit_predict` workflows where applicable.
- Optional covariate encoding for categorical domain or group labels.
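
To make the covariate encoding option concrete, the following sketch shows what a one-hot encoding of categorical domain labels looks like in plain NumPy. The helper `one_hot_covariates` is a hypothetical name introduced for illustration; how Kale-Linear encodes covariates internally may differ.

```python
import numpy as np


def one_hot_covariates(labels):
    """Hypothetical helper: map categorical labels to a one-hot matrix."""
    labels = np.asarray(labels)
    # np.unique returns the sorted categories and, for each sample,
    # the index of its category in that sorted array.
    categories, codes = np.unique(labels, return_inverse=True)
    encoded = np.eye(len(categories))[codes]
    return categories, encoded


domains = np.array(["source", "source", "target", "target"])
categories, encoded = one_hot_covariates(domains)
# Each row of `encoded` has a single 1 in the column of its domain.
```
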
Install the released package from PyPI:
pip install kalelinear

Install from a local checkout for development:

pip install -e ".[dev]"

Kale-Linear requires Python 3.10 or later. Core dependencies include:

import numpy as np
from kalelinear.transformer import TCA
X = np.array(
[
[-2.0, -1.8],
[-1.8, -2.1],
[1.9, 1.7],
[2.1, 2.0],
[-1.4, -1.2],
[-1.2, -1.1],
[1.2, 1.1],
[1.4, 1.3],
]
)
domain_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
transformer = TCA(n_components=2)
z = transformer.fit_transform(X, covariates=domain_labels, target_covariate=1)
z_source = z[domain_labels == 0]
z_target = z[domain_labels == 1]

TCA, JDA, and BDA take domain labels through `covariates`. They do not accept separate source and target arrays; stack samples into one array and use `target_covariate` to identify the target domain.

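
If the data start out as separate source and target arrays, the stacking step described above can be sketched as follows. This uses NumPy only; `stack_domains` and its `source_label`/`target_label` parameters are hypothetical names for illustration, not part of Kale-Linear.

```python
import numpy as np


def stack_domains(x_source, x_target, source_label=0, target_label=1):
    """Hypothetical helper: stack two domains and build the covariate vector."""
    x = np.vstack((x_source, x_target))
    covariates = np.concatenate(
        (
            np.full(len(x_source), source_label),
            np.full(len(x_target), target_label),
        )
    )
    return x, covariates


x_source = np.array([[-2.0, -1.8], [1.9, 1.7]])
x_target = np.array([[-1.4, -1.2], [1.2, 1.1]])

x, covariates = stack_domains(x_source, x_target)
# `x` and `covariates` now have one row and one label per sample, so they
# can be passed together, with target_covariate set to the target label.
```
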
import numpy as np
from kalelinear.transformer import MIDA
x = np.random.default_rng(0).normal(size=(8, 4))
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
domains = np.array(["source", "source", "source", "source", "target", "target", "target", "target"])
transformer = MIDA(n_components=2, covariate_encoder="onehot")
z = transformer.fit_transform(x, y=y, covariates=domains)

For ARSVM and ARRLS, pass all source and target samples in `x`, labels for the source samples in `y`, and a covariate vector identifying the target domain.

import numpy as np
from kalelinear.estimator import ARSVM
x = np.array(
[
[-2.2, -1.9],
[-1.9, -2.1],
[1.8, 2.1],
[2.0, 1.9],
[-1.4, -1.2],
[-1.1, -1.3],
[1.3, 1.1],
[1.5, 1.2],
]
)
source_labels = np.array([0, 0, 1, 1])
domains = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x_target = x[domains == 1]
clf = ARSVM()
clf.fit(x, source_labels, covariates=domains, target_covariate=1)
y_pred = clf.predict(x_target)

LapSVM and LapRLS can use labeled source samples together with unlabeled target samples. The labels array may contain only the labeled source examples.

import numpy as np
from kalelinear.estimator import LapSVM
x_source = np.array([[-2.0, -1.8], [-1.8, -2.1], [1.9, 1.7], [2.1, 2.0]])
ys = np.array([0, 0, 1, 1])
x_target = np.array([[-1.4, -1.2], [-1.2, -1.1], [1.2, 1.1], [1.4, 1.3]])
x_train = np.vstack((x_source, x_target))
clf = LapSVM(kernel="linear")
clf.fit(x_train, ys)
y_pred = clf.predict(x_target)

from kalelinear.transformer import BDA, JDA, MIDA, MPCA, TCA
from kalelinear.estimator import ARRLS, ARSVM, CoIRLS, CoIRSVM, GSDA, LapRLS, LapSVM

From the root of the repository, run the following commands in your terminal:

- Install pre-commit hooks (only required once):

  pre-commit install

- Run pre-commit checks for code style and formatting on all files:

  pre-commit run --all-files

- Run test cases to verify functionality:

  pytest

- Build the documentation:

  pip install -r docs/requirements.txt
  sphinx-build -b html docs/source docs/build/html

- POT: Python Optimal Transport
- Everything about Transfer Learning
- ADA: Another Domain Adaptation library
- Domain Adaptation and Transfer Learning Repositories
- Library of transfer learners and domain-adaptive classifiers
- domain-adaptation-toolbox
- Domain-Adaptations
Kale-Linear is released under the MIT License. See LICENSE for details.