[Feature] Support user-selectable LoRA adapter files at load/runtime #37

## Background

In Quick.AI, LoRA adapters are currently tied to the static values declared in
each model directory's `nntr_config.json` (`lora_rank`, `lora_alpha`,
`lora_target`):

```jsonc
// res/<family>/<variant>/nntr_config.json
{
  "lora_rank": 0,
  "lora_alpha": 0,
  "lora_target": []
}
```

This means **a model directory is pinned to a single LoRA configuration**, and
there is no path for a user to choose and load a different LoRA adapter on top
of the same base model.

## Problem

1. **No runtime selection** — Even when a user has multiple LoRA adapters
   (e.g., per-domain or per-language fine-tunes), they must hand-edit
   `nntr_config.json` or clone the entire model directory for each one.
2. **No API / CLI support** — The public `loadModel(BackendType, ModelType,
   ModelQuantizationType)` entry point (`api/causal_lm_api.h`) and the
   `quick_dot_ai_run` runner (`main.cpp`) do not accept a LoRA file path.
3. **Limited flexibility** — On-device, the desirable pattern of sharing the
   base-model weights while swapping only the small LoRA adapter (saving
   memory/storage) is not supported.

## Proposal

Add a path for users to load a LoRA adapter file of their own choosing.

**(1) CLI level**

```bash
./build/quick_dot_ai_run ./res/qwen3/qwen3-4b/ --lora ./res/qwen3/loras/my_adapter.bin
```

**(2) API level** (ABI-aware)

- The existing `loadModel` signature in `api/causal_lm_api.h` is **frozen**, so
  it must not change. Instead, **append** a new entry point that takes a LoRA
  path/scale.
  - e.g., `loadModelWithLora(...)` or a separate
    `setLoraAdapter(const char *path, float scale)`.
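
As a rough illustration of the "append, don't modify" approach, the sketch below adds a separate C-linkage entry point and leaves `loadModel` untouched. The `LoraStatus` enum, the `PendingLora` struct, and the `pendingLora()` helper are hypothetical names invented for this example; the real error type and internal state of `api/causal_lm_api.h` are not shown in this issue. It assumes the adapter is only recorded here and consumed later by the loading path.

```cpp
#include <cmath>
#include <string>

// Hypothetical status codes; the real API's error convention may differ.
enum LoraStatus { LORA_OK = 0, LORA_INVALID_ARG = 1 };

// Hypothetical holder for the adapter requested by the caller.
struct PendingLora {
  std::string path;
  float scale = 1.0f;
  bool set = false;
};

// Single process-wide slot (assumption: one adapter at a time).
inline PendingLora &pendingLora() {
  static PendingLora pending;
  return pending;
}

// New symbol appended to the API surface; existing symbols are unchanged.
extern "C" LoraStatus setLoraAdapter(const char *path, float scale) {
  if (path == nullptr || path[0] == '\0') {
    return LORA_INVALID_ARG; // reject missing or empty paths
  }
  if (!std::isfinite(scale)) {
    return LORA_INVALID_ARG; // reject NaN / infinity
  }
  PendingLora &pending = pendingLora();
  pending.path = path;
  pending.scale = scale;
  pending.set = true;
  return LORA_OK;
}
```

Under this sketch a caller would invoke `setLoraAdapter(...)` before the existing `loadModel(...)`, so old binaries that never call the new symbol see no behavior change.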

**(3) Configuration precedence**

- An explicitly provided LoRA argument overrides the static LoRA config in
  `nntr_config.json`.
- When no LoRA is specified, behavior remains 100% backward compatible.
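
The precedence rule above could be expressed as a small resolution step. This is a minimal sketch: `LoraConfig` and `resolveLora` are hypothetical names, and it assumes the explicit argument replaces the static config wholesale rather than field by field.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of the LoRA settings.
struct LoraConfig {
  std::string path;                // empty means "no adapter file"
  int rank = 0;                    // mirrors "lora_rank"
  float alpha = 0.0f;              // mirrors "lora_alpha"
  std::vector<std::string> target; // mirrors "lora_target"
};

// An explicit CLI/API argument wins; otherwise fall back to the values
// parsed from nntr_config.json, which keeps today's behavior unchanged.
inline LoraConfig resolveLora(const LoraConfig &fromJson,
                              const std::optional<LoraConfig> &explicitArg) {
  return explicitArg.has_value() ? *explicitArg : fromJson;
}
```

With no explicit argument the function returns the static config untouched, which is the backward-compatible path.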

## Scope (draft)

- [ ] Add LoRA path/rank/alpha/target fields to `ModelRuntimeConfig`
      (`api/model_config_internal.h`) and parse them.
- [ ] Implement LoRA-adapter application in the
      `Transformer::setupParameters()` / `CausalLM` loading path.
- [ ] Add `--lora` (and, if needed, `--lora-scale`) CLI arguments to
      `main.cpp`.
- [ ] Append a LoRA entry point to the public API (call out the ABI impact in
      the PR description).
- [ ] Confirm and wire up LoRA-application support in the NNTrainer subproject.
- [ ] Update docs: `api/README.md`, `models/README.md`, `README.md`.
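
For the CLI item, argument handling might look roughly like the following. It is a sketch only: `CliOptions` and `parseCli` are hypothetical, the real `main.cpp` argument handling is not shown in this issue, and the default scale of `1.0` is an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed form of the runner's arguments.
struct CliOptions {
  std::string modelDir;
  std::optional<std::string> loraPath; // unset when --lora is absent
  float loraScale = 1.0f;              // assumed default
};

// Parses: <model_dir> [--lora <path>] [--lora-scale <float>]
inline CliOptions parseCli(const std::vector<std::string> &args) {
  CliOptions opts;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string &arg = args[i];
    if (arg == "--lora" || arg == "--lora-scale") {
      if (i + 1 >= args.size()) {
        throw std::invalid_argument(arg + " requires a value");
      }
      const std::string &value = args[++i];
      if (arg == "--lora") {
        opts.loraPath = value;
      } else {
        opts.loraScale = std::stof(value); // throws on non-numeric input
      }
    } else if (opts.modelDir.empty()) {
      opts.modelDir = arg; // first positional argument
    } else {
      throw std::invalid_argument("unexpected argument: " + arg);
    }
  }
  if (opts.modelDir.empty()) {
    throw std::invalid_argument("missing model directory");
  }
  return opts;
}
```

Leaving `loraPath` unset when `--lora` is absent lets the loader distinguish "no adapter requested" from "adapter with an empty path".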

## Open Questions

1. **Application strategy** — Should the adapter be **merged** into the base
   weights at load time, or kept as a **separate adapter path** applied at
   runtime?
2. **Multiple / hot-swap** — Should loading multiple LoRA adapters at once, or
   dynamic hot-swapping, be part of the first iteration?
3. **FSU interaction** — How does adapter loading interact with the Flash
   Storage Utilization (FSU) streaming path?
4. **Adapter format** — Support only the NNTrainer `.bin` format, or include a
   conversion tool for HF safetensors adapters?
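
To make the first question concrete, the toy sketch below contrasts the two strategies using the standard LoRA formulation, where the adapted weight is `W + s * B * A` with `s` typically `alpha / rank`. It uses plain `std::vector` matrices and hypothetical helper names; it does not reflect NNTrainer's tensor types or any existing code path.

```cpp
#include <cstddef>
#include <vector>

using Vector = std::vector<float>;
using Matrix = std::vector<Vector>; // row-major: Matrix[row][col]

// y = M * x
inline Vector matVec(const Matrix &m, const Vector &x) {
  Vector y(m.size(), 0.0f);
  for (std::size_t r = 0; r < m.size(); ++r)
    for (std::size_t c = 0; c < x.size(); ++c)
      y[r] += m[r][c] * x[c];
  return y;
}

// Strategy 1 (merge at load time): W' = W + s * B * A.
// W is out x in, B is out x rank, A is rank x in.
inline Matrix mergeLora(const Matrix &w, const Matrix &b, const Matrix &a,
                        float s) {
  Matrix merged = w;
  for (std::size_t r = 0; r < w.size(); ++r)
    for (std::size_t c = 0; c < w[r].size(); ++c)
      for (std::size_t k = 0; k < a.size(); ++k)
        merged[r][c] += s * b[r][k] * a[k][c];
  return merged;
}

// Strategy 2 (separate adapter path): y = W x + s * B (A x).
// The base weights stay untouched and can be shared across adapters.
inline Vector applyLoraRuntime(const Matrix &w, const Matrix &b,
                               const Matrix &a, float s, const Vector &x) {
  Vector y = matVec(w, x);
  Vector delta = matVec(b, matVec(a, x));
  for (std::size_t r = 0; r < y.size(); ++r)
    y[r] += s * delta[r];
  return y;
}
```

Both paths compute the same output up to floating-point rounding. Merging removes the extra matrix products at inference time but changes the stored weights, whereas the separate path leaves the base weights intact (relevant to hot-swapping and shared base weights) at the cost of extra work per forward pass.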
