[Feature] Support user-selectable LoRA adapter files at load/runtime #37

Description

@dlwlzzero

Background

In Quick.AI, LoRA adapters are currently tied to the static values declared in
each model directory's nntr_config.json (lora_rank, lora_alpha,
lora_target):

// res/<family>/<variant>/nntr_config.json
{
  "lora_rank": 0,
  "lora_alpha": 0,
  "lora_target": []
}

This means a model directory is pinned to a single LoRA configuration, and
there is no path for a user to choose and load a different LoRA adapter on top
of the same base model.

Problem

  1. No runtime selection: even when a user has multiple LoRA adapters
    (e.g., per-domain or per-language fine-tunes), they must hand-edit
    nntr_config.json or clone the entire model directory for each one.
  2. No API / CLI support: neither the public
    loadModel(BackendType, ModelType, ModelQuantizationType) entry point
    (api/causal_lm_api.h) nor the quick_dot_ai_run runner (main.cpp)
    accepts a LoRA file path.
  3. Limited flexibility: the desirable on-device pattern of sharing the
    base-model weights while swapping only the small LoRA adapter (saving
    memory and storage) is not supported.

Proposal

Add a path for users to load a LoRA adapter file of their own choosing.

(1) CLI level

./build/quick_dot_ai_run ./res/qwen3/qwen3-4b/ --lora ./res/qwen3/loras/my_adapter.bin
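
A minimal sketch of how main.cpp could pick up the new flags is shown here. RunnerArgs and parseArgs are hypothetical names, the default scale of 1.0 is an assumption, and the real runner may handle its arguments differently.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical container for the parsed command line; not an existing type.
struct RunnerArgs {
  std::string model_dir;                // positional: model directory
  std::optional<std::string> lora_path; // --lora <file>, absent by default
  float lora_scale = 1.0f;              // --lora-scale <float>, assumed default
};

// Parses: <program> <model_dir> [--lora <file>] [--lora-scale <float>]
// Returns std::nullopt when the input is malformed.
std::optional<RunnerArgs> parseArgs(const std::vector<std::string> &argv) {
  RunnerArgs out;
  bool have_model_dir = false;
  for (std::size_t i = 1; i < argv.size(); ++i) { // argv[0] is the program name
    const std::string &arg = argv[i];
    if (arg == "--lora" || arg == "--lora-scale") {
      if (i + 1 >= argv.size())
        return std::nullopt; // flag given without a value
      const std::string &value = argv[++i];
      if (arg == "--lora") {
        out.lora_path = value;
      } else {
        char *end = nullptr;
        out.lora_scale = std::strtof(value.c_str(), &end);
        if (end == value.c_str() || *end != '\0')
          return std::nullopt; // value is not a number
      }
    } else if (!have_model_dir) {
      out.model_dir = arg;
      have_model_dir = true;
    } else {
      return std::nullopt; // unexpected extra positional argument
    }
  }
  if (!have_model_dir)
    return std::nullopt;
  return out;
}
```

When --lora is absent, lora_path stays empty, so the runner can fall through to its current behavior.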

(2) API level (ABI-aware)

  • The existing loadModel signature in api/causal_lm_api.h is frozen, so
    it must not change. Instead, append a new entry point that takes a LoRA
    path/scale.
    • e.g., loadModelWithLora(...) or a separate
      setLoraAdapter(const char *path, float scale).
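
The second option could look roughly like the sketch below. Only the name and parameter list of setLoraAdapter come from this proposal; the SketchError enum, the return type, the pending-adapter globals, and the validation rules are assumptions, since the real declarations in api/causal_lm_api.h are not reproduced in this issue.

```cpp
#include <cmath>
#include <string>

extern "C" {
// Stand-in error codes; the real API's error type is an assumption here.
typedef enum {
  SKETCH_OK = 0,
  SKETCH_INVALID_PARAMETER = 1
} SketchError;

// Hypothetical new entry point, declared after the existing ones so that
// the frozen loadModel signature stays untouched.
SketchError setLoraAdapter(const char *path, float scale);
}

namespace {
// Adapter selection remembered for the next model load (illustrative only).
std::string g_pending_lora_path;
float g_pending_lora_scale = 1.0f;
} // namespace

extern "C" SketchError setLoraAdapter(const char *path, float scale) {
  // Reject a null or empty path and a non-finite scale before storing.
  if (path == nullptr || *path == '\0' || !std::isfinite(scale))
    return SKETCH_INVALID_PARAMETER;
  g_pending_lora_path = path;
  g_pending_lora_scale = scale;
  return SKETCH_OK;
}
```

A separate setter like this leaves existing callers of loadModel unaffected, at the cost of introducing call-order state that a loadModelWithLora(...) variant would avoid.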

(3) Configuration precedence

  • An explicitly provided LoRA argument overrides the static LoRA config in
    nntr_config.json.
  • When no LoRA argument is given, behavior is unchanged from today (fully
    backward compatible).
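
The precedence rule can be sketched as a small resolver. All type and function names here are hypothetical, and treating lora_rank > 0 as "static LoRA enabled" is an assumption about how nntr_config.json is interpreted today.

```cpp
#include <optional>
#include <string>
#include <vector>

// Mirrors the three static keys in nntr_config.json.
struct StaticLoraConfig {
  int rank = 0;
  int alpha = 0;
  std::vector<std::string> target;
};

// What the user passed via CLI or API (hypothetical shape).
struct LoraOverride {
  std::string path;
  float scale = 1.0f;
};

enum class LoraSource { None, StaticConfig, Explicit };

struct ResolvedLora {
  LoraSource source;
  std::string path; // empty unless source == Explicit
  float scale;
};

// An explicit argument always wins; otherwise the static config is used
// exactly as before, which keeps the no-argument case backward compatible.
ResolvedLora resolveLora(const StaticLoraConfig &cfg,
                         const std::optional<LoraOverride> &explicit_arg) {
  if (explicit_arg)
    return {LoraSource::Explicit, explicit_arg->path, explicit_arg->scale};
  if (cfg.rank > 0) // assumption: rank 0 means "no LoRA" in the static config
    return {LoraSource::StaticConfig, "", 1.0f};
  return {LoraSource::None, "", 1.0f};
}
```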

Scope (draft)

  • Add LoRA path/rank/alpha/target fields to ModelRuntimeConfig
    (api/model_config_internal.h) and parse them.
  • Implement LoRA-adapter application in the
    Transformer::setupParameters() / CausalLM loading path.
  • Add --lora (and, if needed, --lora-scale) CLI arguments to
    main.cpp.
  • Append a LoRA entry point to the public API (call out the ABI impact in
    the PR description).
  • Confirm and wire up LoRA-application support in the NNTrainer subproject.
  • Update docs: api/README.md, models/README.md, README.md.

Open Questions

  1. Application strategy: should the adapter be merged into the base
    weights at load time, or kept as a separate adapter path applied at
    runtime?
  2. Multiple / hot-swap: should loading multiple LoRA adapters at once, or
    dynamic hot-swapping, be part of the first iteration?
  3. FSU interaction: how does adapter loading interact with the Flash
    Storage Utilization (FSU) streaming path?
  4. Adapter format: should only the NNTrainer .bin format be supported, or
    should a conversion tool for HF safetensors adapters be included?
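
For question 1, the merge-at-load option amounts to the standard LoRA update W' = W + (alpha / rank) * B * A. The sketch below illustrates it on plain row-major float buffers; mergeLora is a hypothetical helper, and the buffer layout is an assumption that may not match how NNTrainer stores weights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merges a LoRA update into a dense weight in place:
//   W += (alpha / rank) * B * A
// W is out x in, B is out x rank, A is rank x in, all row-major.
void mergeLora(std::vector<float> &W, const std::vector<float> &B,
               const std::vector<float> &A, std::size_t out, std::size_t in,
               std::size_t rank, float alpha) {
  assert(rank > 0);
  assert(W.size() == out * in);
  assert(B.size() == out * rank);
  assert(A.size() == rank * in);
  const float s = alpha / static_cast<float>(rank);
  for (std::size_t o = 0; o < out; ++o) {
    for (std::size_t r = 0; r < rank; ++r) {
      const float b = s * B[o * rank + r];
      for (std::size_t i = 0; i < in; ++i)
        W[o * in + i] += b * A[r * in + i];
    }
  }
}
```

Merging adds no per-token cost at inference, but swapping adapters afterwards (question 2) requires subtracting the same update or reloading the base weights, whereas a separate adapter path keeps the base weights shared and untouched.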
