
pykale/kale-linear


Kale-Linear

Kale-Linear is a Python library for non-deep, knowledge-aware machine learning from multiple sources, domains, or views. It provides NumPy-based methods for leveraging related data distributions and structural assumptions, including transfer learning, domain adaptation, manifold regularization, and group-aware learning, through a scikit-learn style API.

The package is part of the PyKale ecosystem and focuses on classical linear and kernel methods that are useful when data are structured by domain labels, covariates, side information, or unlabeled target samples.

Features

  • Transformer models for learning feature embeddings:
    • Multilinear Principal Component Analysis (MPCA): Lu et al., 2008 [IEEE]
    • Transfer Component Analysis (TCA): Pan et al., 2009 [paper]
    • Joint Distribution Adaptation (JDA): Long et al., 2013 [paper]
    • Balanced Distribution Adaptation (BDA): Wang et al., 2017 [paper]
    • Maximum Independence Domain Adaptation (MIDA): Yan et al., 2017 [paper]
  • Estimator models for classification and adaptation:
    • Manifold Regularization Learning Framework (LapSVM, LapRLS): Belkin et al., 2006 [paper]
    • Adaptation Regularization Learning Framework (ARSVM, ARRLS): Long et al., 2014 [paper]
    • Covariate Independence Regularized Learning Framework (CoIRSVM, CoIRLS): Zhou et al., 2020 [paper], Zhou, 2022 [thesis]
    • Group-specific Discriminant Analysis (GSDA): Zhou et al., 2025 [paper], Zhou, 2022 [thesis]
  • NumPy-compatible inputs and outputs.
  • scikit-learn style fit, transform, predict, fit_transform, and fit_predict workflows where applicable.
  • Optional covariate encoding for categorical domain or group labels.
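The covariate encoding option (used later as covariate_encoder="onehot") turns categorical domain or group labels into numeric columns. Below is a minimal NumPy sketch of what one-hot encoding such labels produces. The helper onehot_encode is hypothetical and is not kalelinear's internal encoder. It only shows the general idea.

```python
import numpy as np


def onehot_encode(covariates):
    """Hypothetical helper: map categorical labels to a one-hot matrix.

    Illustration only; not part of kalelinear.
    """
    # np.unique sorts the categories and returns each sample's category index
    categories, codes = np.unique(np.asarray(covariates).ravel(), return_inverse=True)
    codes = codes.ravel()
    encoded = np.zeros((codes.size, categories.size))
    encoded[np.arange(codes.size), codes] = 1.0  # one 1 per row
    return categories, encoded


domains = np.array(["source", "source", "target", "target"])
categories, encoded = onehot_encode(domains)
print(categories)  # ['source' 'target']
print(encoded)
```

Columns follow the sorted order of the category values, so the same labels always map to the same columns.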

Installation

Install the released package from PyPI:

pip install kalelinear

Install from a local checkout for development:

pip install -e ".[dev]"

Kale-Linear requires Python 3.10 or later. Core dependencies include:

Quick Start

Learn a Domain-Invariant Embedding

import numpy as np
from kalelinear.transformer import TCA

X = np.array(
    [
        [-2.0, -1.8],
        [-1.8, -2.1],
        [1.9, 1.7],
        [2.1, 2.0],
        [-1.4, -1.2],
        [-1.2, -1.1],
        [1.2, 1.1],
        [1.4, 1.3],
    ]
)
domain_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # rows 0-3: domain 0; rows 4-7: domain 1 (target)

transformer = TCA(n_components=2)
z = transformer.fit_transform(X, covariates=domain_labels, target_covariate=1)

z_source = z[domain_labels == 0]
z_target = z[domain_labels == 1]

TCA, JDA, and BDA take domain labels through the covariates argument. They do not accept separate source and target arrays: stack all samples into one array, pass the per-sample domain labels as covariates, and set target_covariate to the label of the target domain.
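To show what the stacked array and target_covariate feed into, here is a minimal NumPy sketch of the TCA objective from Pan et al.: a linear kernel over all stacked samples, the marginal MMD matrix built from the domain labels, and a centering matrix, solved as a generalized eigenproblem. This is a textbook-style sketch under those assumptions, not kalelinear's implementation; the function tca_sketch and the regularization weight mu are hypothetical names.

```python
import numpy as np


def tca_sketch(X, domain_labels, target_covariate, n_components=2, mu=1.0):
    """Hypothetical linear-kernel TCA sketch; not kalelinear's implementation."""
    X = np.asarray(X, dtype=float)
    is_target = np.asarray(domain_labels) == target_covariate
    n = X.shape[0]
    ns, nt = int((~is_target).sum()), int(is_target.sum())

    K = X @ X.T  # linear kernel over all stacked samples
    # Marginal MMD matrix L = e e^T with e_i = 1/ns (source) or -1/nt (target)
    e = np.where(is_target, -1.0 / nt, 1.0 / ns)
    L = np.outer(e, e)
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix

    A = K @ H @ K  # embedded variance to preserve
    B = K @ L @ K + mu * np.eye(n)  # MMD plus ridge term, positive definite

    # Solve A w = lambda B w by reducing to a symmetric problem with B = C C^T
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    S = C_inv @ A @ C_inv.T
    S = (S + S.T) / 2.0  # remove round-off asymmetry before eigh
    eigvals, eigvecs = np.linalg.eigh(S)
    top = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    W = C_inv.T @ eigvecs[:, top]
    return K @ W  # embedding of all n samples, in input order


X = np.array([[-2.0, -1.8], [-1.8, -2.1], [1.9, 1.7], [2.1, 2.0],
              [-1.4, -1.2], [-1.2, -1.1], [1.2, 1.1], [1.4, 1.3]])
domain_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
z = tca_sketch(X, domain_labels, target_covariate=1)
z_source, z_target = z[domain_labels == 0], z[domain_labels == 1]
```

Because the embedding rows keep the stacked input order, slicing by domain labels works the same way as in the TCA example. JDA and BDA extend this objective with class-conditional MMD terms that rely on pseudo-labels for the target samples.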

Use MIDA with Categorical Covariates

import numpy as np
from kalelinear.transformer import MIDA

x = np.random.default_rng(0).normal(size=(8, 4))
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
domains = np.array(["source", "source", "source", "source", "target", "target", "target", "target"])

transformer = MIDA(n_components=2, covariate_encoder="onehot")
z = transformer.fit_transform(x, y=y, covariates=domains)
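MIDA (Yan et al., 2017) learns an embedding that is as independent as possible of the domain covariates, measuring dependence with the Hilbert-Schmidt Independence Criterion (HSIC). The sketch below computes the biased empirical HSIC with linear kernels between a feature matrix and one-hot-encoded categorical covariates. It is one way to check how much domain information a representation still carries; for example, one could compare the score of x with that of the learned z. The function linear_hsic is a hypothetical helper and is not part of kalelinear.

```python
import numpy as np


def linear_hsic(features, covariates):
    """Hypothetical helper: biased empirical HSIC, tr(K H L H) / (n - 1)^2.

    K is a linear kernel on the features, L a linear kernel on one-hot
    covariates, and H the centering matrix. Illustration only.
    """
    features = np.asarray(features, dtype=float)
    _, codes = np.unique(np.asarray(covariates).ravel(), return_inverse=True)
    onehot = np.eye(codes.max() + 1)[codes.ravel()]  # one column per category
    n = features.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    K = features @ features.T
    L = onehot @ onehot.T
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
domains = np.array(["source"] * 4 + ["target"] * 4)
shifted = x + 5.0 * (domains == "target")[:, None]  # add a strong domain shift

print(linear_hsic(x, domains), linear_hsic(shifted, domains))
```

The shifted features depend strongly on the domain label, so their HSIC score is much larger; features that carry no domain information score zero.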

Train a Domain Adaptation Classifier

For ARSVM and ARRLS, pass all source and target samples in x, labels for the source samples only in y, the per-sample domain labels in covariates, and the label of the target domain in target_covariate.

import numpy as np
from kalelinear.estimator import ARSVM

x = np.array(
    [
        [-2.2, -1.9],
        [-1.9, -2.1],
        [1.8, 2.1],
        [2.0, 1.9],
        [-1.4, -1.2],
        [-1.1, -1.3],
        [1.3, 1.1],
        [1.5, 1.2],
    ]
)

source_labels = np.array([0, 0, 1, 1])  # labels for the four source rows, in order
domains = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # per-sample domain labels; 1 = target
x_target = x[domains == 1]

clf = ARSVM()
clf.fit(x, source_labels, covariates=domains, target_covariate=1)
y_pred = clf.predict(x_target)
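To illustrate how unlabeled target rows can still shape the classifier, here is a simplified NumPy sketch in the spirit of the adaptation regularization framework (Long et al., 2014). It keeps only a squared loss on the labeled source rows, a kernel-norm penalty, and the marginal MMD term, and drops the conditional-distribution and manifold terms of the full framework. The closed form follows from setting the gradient to zero: ((E + lam * M) K + sigma * I) alpha = E Y. The function name and the parameters sigma and lam are hypothetical; this is not kalelinear's ARRLS.

```python
import numpy as np


def arrls_marginal_sketch(x, source_labels, domains, target_covariate, sigma=1.0, lam=1.0):
    """Hypothetical simplified ARRLS-style classifier; predicts the target rows."""
    x = np.asarray(x, dtype=float)
    is_target = np.asarray(domains) == target_covariate
    n = x.shape[0]
    ns, nt = int((~is_target).sum()), int(is_target.sum())
    classes = np.unique(source_labels)

    # One-hot labels on source rows (assumed in the same order as source_labels);
    # target rows stay all-zero because they are unlabeled
    Y = np.zeros((n, classes.size))
    Y[np.flatnonzero(~is_target), np.searchsorted(classes, source_labels)] = 1.0
    E = np.diag((~is_target).astype(float))  # selects labeled rows in the loss

    e = np.where(is_target, -1.0 / nt, 1.0 / ns)
    M = np.outer(e, e)  # marginal MMD matrix
    K = x @ x.T  # linear kernel

    alpha = np.linalg.solve((E + lam * M) @ K + sigma * np.eye(n), E @ Y)
    scores = K @ alpha  # decision values for all rows
    return classes[np.argmax(scores[is_target], axis=1)]


x = np.array([[-2.2, -1.9], [-1.9, -2.1], [1.8, 2.1], [2.0, 1.9],
              [-1.4, -1.2], [-1.1, -1.3], [1.3, 1.1], [1.5, 1.2]])
source_labels = np.array([0, 0, 1, 1])
domains = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = arrls_marginal_sketch(x, source_labels, domains, target_covariate=1)
```

Using a squared loss gives a single linear solve instead of a quadratic program, which is the usual reason RLS variants sit alongside SVM variants.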

Train a Manifold-Regularized Classifier

LapSVM and LapRLS can learn from labeled source samples together with unlabeled target samples. The labels array may contain only the labeled source examples; in the example below, the labeled samples are stacked before the unlabeled ones.

import numpy as np
from kalelinear.estimator import LapSVM

x_source = np.array([[-2.0, -1.8], [-1.8, -2.1], [1.9, 1.7], [2.1, 2.0]])
ys = np.array([0, 0, 1, 1])
x_target = np.array([[-1.4, -1.2], [-1.2, -1.1], [1.2, 1.1], [1.4, 1.3]])

x_train = np.vstack((x_source, x_target))  # labeled rows first, then unlabeled rows

clf = LapSVM(kernel="linear")
clf.fit(x_train, ys)
y_pred = clf.predict(x_target)
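The manifold regularization idea can be seen in closed form with LapRLS (Belkin et al., 2006), the squared-loss sibling of LapSVM. This NumPy sketch builds a graph Laplacian over labeled and unlabeled samples and solves (J K + gamma_a I + gamma_i L K) alpha = J y, where J selects the labeled rows. For simplicity it uses a linear kernel, a fully connected Gaussian affinity graph, binary labels in {0, 1}, and absorbs the paper's constant scaling factors into gamma_a and gamma_i. All names here are hypothetical; this is not kalelinear's LapRLS.

```python
import numpy as np


def laprls_sketch(x_labeled, y_labeled, x_unlabeled, gamma_a=1.0, gamma_i=1.0, bandwidth=1.0):
    """Hypothetical binary LapRLS sketch; returns a predict function."""
    x = np.vstack((x_labeled, x_unlabeled)).astype(float)  # labeled rows first
    n, n_labeled = x.shape[0], np.asarray(x_labeled).shape[0]

    # Map labels {0, 1} to {-1, +1}; unlabeled rows get 0 and are masked by J
    y = np.zeros(n)
    y[:n_labeled] = np.where(np.asarray(y_labeled) == 1, 1.0, -1.0)
    J = np.diag((np.arange(n) < n_labeled).astype(float))

    # Graph Laplacian L = D - W from Gaussian affinities between all samples
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * bandwidth**2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W

    K = x @ x.T  # linear kernel
    alpha = np.linalg.solve(J @ K + gamma_a * np.eye(n) + gamma_i * L @ K, J @ y)

    def predict(x_new):
        scores = np.asarray(x_new, dtype=float) @ x.T @ alpha  # k(x_new, x) @ alpha
        return (scores > 0).astype(int)

    return predict


x_source = np.array([[-2.0, -1.8], [-1.8, -2.1], [1.9, 1.7], [2.1, 2.0]])
ys = np.array([0, 0, 1, 1])
x_target = np.array([[-1.4, -1.2], [-1.2, -1.1], [1.2, 1.1], [1.4, 1.3]])
predict = laprls_sketch(x_source, ys, x_target)
y_pred = predict(x_target)
```

The Laplacian term penalizes predictions that change sharply between nearby samples, which is how the unlabeled target rows influence the decision function.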

Public API

from kalelinear.transformer import BDA, JDA, MIDA, MPCA, TCA
from kalelinear.estimator import ARRLS, ARSVM, CoIRLS, CoIRSVM, GSDA, LapRLS, LapSVM

Development

From the root of the repository, run the following commands in your terminal:

  1. Install pre-commit hooks (only required once):

    pre-commit install
  2. Run pre-commit checks for code style and formatting on all files:

    pre-commit run --all-files
  3. Run test cases to verify functionality:

    pytest
  4. Build the documentation:

    pip install -r docs/requirements.txt
    sphinx-build -b html docs/source docs/build/html

Related Projects

License

Kale-Linear is released under the MIT License. See LICENSE for details.

About

Non-deep knowledge-aware machine learning including domain adaptation in Python
