Background
In Quick.AI, LoRA adapters are currently tied to the static values declared in
each model directory's nntr_config.json (lora_rank, lora_alpha,
lora_target).
This means a model directory is pinned to a single LoRA configuration, and
there is no path for a user to choose and load a different LoRA adapter on top
of the same base model.
Problem
- No runtime selection: Even when a user has multiple LoRA adapters
(e.g., per-domain or per-language fine-tunes), they must hand-edit
nntr_config.json or clone the entire model directory for each one.
- No API / CLI support: The public loadModel(BackendType, ModelType,
ModelQuantizationType) entry point (api/causal_lm_api.h) and the
quick_dot_ai_run runner (main.cpp) do not accept a LoRA file path.
- Limited flexibility: The desirable on-device pattern of sharing the
base-model weights while swapping only the small LoRA adapter (saving
memory and storage) is not supported.
Proposal
Add a path for users to load a LoRA adapter file of their own choosing.
(1) CLI level
./build/quick_dot_ai_run ./res/qwen3/qwen3-4b/ --lora ./res/qwen3/loras/my_adapter.bin
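A minimal sketch of how the runner might parse such a command line, to make the intended CLI shape concrete. RunnerArgs and parseRunnerArgs are hypothetical names, and both the --lora-scale flag and its default of 1.0 are assumptions. None of this is taken from the current main.cpp.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the parsed command line; not part of Quick.AI.
struct RunnerArgs {
  std::string model_dir;                // positional: model directory
  std::optional<std::string> lora_path; // set only when --lora is given
  float lora_scale = 1.0f;              // assumed default for --lora-scale
};

// Parses: <model_dir> [--lora <file>] [--lora-scale <float>]
// `args` excludes the program name (argv[0]).
inline RunnerArgs parseRunnerArgs(const std::vector<std::string> &args) {
  RunnerArgs out;
  bool have_model_dir = false;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string &a = args[i];
    if (a == "--lora" || a == "--lora-scale") {
      if (i + 1 >= args.size())
        throw std::invalid_argument(a + " requires a value");
      const std::string &value = args[++i];
      if (a == "--lora")
        out.lora_path = value;
      else
        out.lora_scale = std::stof(value); // throws on non-numeric input
    } else if (!have_model_dir) {
      out.model_dir = a; // first non-flag argument is the model directory
      have_model_dir = true;
    } else {
      throw std::invalid_argument("unexpected argument: " + a);
    }
  }
  if (!have_model_dir)
    throw std::invalid_argument("missing model directory");
  return out;
}
```

When --lora is absent, lora_path stays empty, so the runner can fall through to the existing nntr_config.json behavior unchanged.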
(2) API level (ABI-aware)
- The existing loadModel signature in api/causal_lm_api.h is frozen and must not
change. Instead, append a new entry point that takes a LoRA path and scale.
- For example, loadModelWithLora(...) or a separate
setLoraAdapter(const char *path, float scale).
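One possible shape for the setLoraAdapter variant, sketched under heavy assumptions: the enum members are placeholders so the snippet compiles on its own, the int return convention (0 on success) is assumed, and LoadedState plus the detail helpers are hypothetical. The point it illustrates is that loadModel keeps its three-parameter signature while the adapter is registered through a separately appended function.

```cpp
#include <string>

// Placeholder enums so the sketch compiles alone. The real types are declared
// in api/causal_lm_api.h; the members below are invented.
enum BackendType { BACKEND_TYPE_PLACEHOLDER = 0 };
enum ModelType { MODEL_TYPE_PLACEHOLDER = 0 };
enum ModelQuantizationType { MODEL_QUANTIZATION_TYPE_PLACEHOLDER = 0 };

// Hypothetical record of what the most recent loadModel() call used.
struct LoadedState {
  bool loaded = false;
  std::string lora_path; // empty means "use nntr_config.json as before"
  float lora_scale = 1.0f;
};

namespace detail {
inline std::string &pendingLoraPath() { static std::string p; return p; }
inline float &pendingLoraScale() { static float s = 1.0f; return s; }
inline LoadedState &state() { static LoadedState s; return s; }
} // namespace detail

// New, appended entry point. Records the adapter for the next loadModel().
// Returns 0 on success, nonzero on bad input (return convention assumed).
inline int setLoraAdapter(const char *path, float scale) {
  if (path == nullptr || *path == '\0')
    return 1;
  detail::pendingLoraPath() = path;
  detail::pendingLoraScale() = scale;
  return 0;
}

// Existing entry point: same three parameters, so current callers still build.
inline int loadModel(BackendType, ModelType, ModelQuantizationType) {
  LoadedState &s = detail::state();
  s.loaded = true;
  s.lora_path = detail::pendingLoraPath(); // empty if never set
  s.lora_scale = detail::pendingLoraScale();
  return 0;
}
```

A caller that never calls setLoraAdapter sees no difference. A loadModelWithLora(...) overload would be the alternative, trading the hidden pending state for one more exported symbol.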
(3) Configuration precedence
- An explicitly provided LoRA argument overrides the static LoRA config in
nntr_config.json.
- When no LoRA is specified, behavior remains 100% backward compatible.
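The precedence rule above can be sketched as a small resolution function. StaticLoraConfig, ExplicitLora, LoraSource, and resolveLoraSource are hypothetical names, and the field types for lora_rank, lora_alpha, and lora_target are assumptions, since the proposal does not state them.

```cpp
#include <optional>
#include <string>

// Static LoRA values from nntr_config.json. Field types are assumed.
struct StaticLoraConfig {
  int lora_rank;
  float lora_alpha;
  std::string lora_target;
};

// LoRA argument supplied explicitly through the CLI or API (hypothetical).
struct ExplicitLora {
  std::string path;
  float scale;
};

enum class LoraSource { None, StaticConfig, ExplicitArgument };

// Decides which LoRA configuration is in effect:
//   1. an explicit argument always wins;
//   2. otherwise the static config applies, exactly as today;
//   3. otherwise no LoRA is used.
inline LoraSource
resolveLoraSource(const std::optional<StaticLoraConfig> &from_json,
                  const std::optional<ExplicitLora> &from_caller) {
  if (from_caller)
    return LoraSource::ExplicitArgument;
  if (from_json)
    return LoraSource::StaticConfig;
  return LoraSource::None;
}
```

Cases 2 and 3 are the backward-compatible paths: with no explicit argument, the result depends only on what nntr_config.json already contains.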
Scope (draft)
- [...] ModelRuntimeConfig (api/model_config_internal.h) and parse them.
- [...] Transformer::setupParameters() / CausalLM loading path.
- Add --lora (and, if needed, --lora-scale) CLI arguments to main.cpp.
- [...] the PR description).
- [...] api/README.md, models/README.md, README.md.
Open Questions
- Application strategy: Should the adapter be merged into the base
weights at load time, or kept as a separate adapter path applied at
runtime?
- Multiple / hot-swap: Should loading multiple LoRA adapters at once, or
dynamic hot-swapping, be part of the first iteration?
- FSU interaction: How does adapter loading interact with the Flash
Storage Utilization (FSU) streaming path?
- Adapter format: Support only the NNTrainer .bin format, or include a
conversion tool for HF safetensors adapters?
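To make the application-strategy question concrete, here is a small sketch of the two options using the standard LoRA formulation, y = W x + scale * B (A x), where scale is commonly lora_alpha / lora_rank. Whether Quick.AI uses exactly this scale convention is an assumption, and the Mat/Vec types and function names are illustrative only, not NNTrainer APIs.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>; // row-major: Mat[row][col]

// Plain matrix-vector product y = m * x.
inline Vec matVec(const Mat &m, const Vec &x) {
  Vec y(m.size(), 0.0f);
  for (std::size_t i = 0; i < m.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      y[i] += m[i][j] * x[j];
  return y;
}

// Strategy 1: fold the adapter into the base weights once, at load time.
// w is d_out x d_in, a is rank x d_in, b is d_out x rank.
inline Mat mergeLora(const Mat &w, const Mat &a, const Mat &b, float scale) {
  Mat merged = w;
  for (std::size_t i = 0; i < b.size(); ++i)
    for (std::size_t k = 0; k < a.size(); ++k)
      for (std::size_t j = 0; j < a[k].size(); ++j)
        merged[i][j] += scale * b[i][k] * a[k][j];
  return merged;
}

// Strategy 2: leave w untouched and add the low-rank term on every forward
// pass, so the adapter can be swapped without reloading the base weights.
inline Vec forwardWithAdapter(const Mat &w, const Mat &a, const Mat &b,
                              float scale, const Vec &x) {
  Vec y = matVec(w, x);
  Vec delta = matVec(b, matVec(a, x));
  for (std::size_t i = 0; i < y.size(); ++i)
    y[i] += scale * delta[i];
  return y;
}
```

Both strategies produce the same output for a given input. Merging costs nothing extra per token but modifies the base weights, which makes swapping adapters harder. The runtime path keeps the base weights shareable at the cost of two small extra products per adapted layer.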