[WIP]for full-fine-tune-support by poryfly · Pull Request #3 · kvcache-ai/transformers

poryfly · 2026-05-22T09:09:32Z

What does this PR do?

Support full fine-tuning for KT.

Fixes # (issue)

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

gemini-code-assist

Code Review

This pull request enhances the KT integration by adding configuration for training modes and full weight gradients, and updating the Trainer to account for KT-specific trainable parameters that are not standard nn.Module members. The changes include logic to log these parameters and inject them into the optimizer. Review feedback recommends preventing double-counting of parameters in logs, logging exceptions during parameter inspection instead of silent failure, and using unwrap_peft_model to correctly identify the training mode when PEFT is used.

gemini-code-assist · 2026-05-22T09:11:56Z

+                kt_trainable = get_kt_trainable_params(kt_model_for_count)
+                kt_trainable_numel = sum(p.numel() for p in kt_trainable) if kt_trainable else 0


The current logic for counting KT trainable parameters can double count when some parameters returned by get_kt_trainable_params are already present in model.parameters(). Filtering those out, consistent with the check performed during parameter injection in _prepare_for_training, keeps the reported counts accurate.

kt_trainable = get_kt_trainable_params(kt_model_for_count)
model_param_ids = {id(p) for p in model.parameters()}
kt_trainable = [p for p in kt_trainable if id(p) not in model_param_ids] if kt_trainable else []
kt_trainable_numel = sum(p.numel() for p in kt_trainable)
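To illustrate why the identity check matters, here is a minimal sketch that uses stand-in objects rather than real torch tensors. `FakeParam`, `count_trainable`, and the sizes are hypothetical and not part of the PR; only the `id(p)`-based filter mirrors the suggestion above.

```python
class FakeParam:
    """Hypothetical stand-in for a tensor parameter; only numel() is modeled."""

    def __init__(self, n, requires_grad=True):
        self._n = n
        self.requires_grad = requires_grad

    def numel(self):
        return self._n


def count_trainable(model_params, kt_params):
    """Return (model_numel, kt_only_numel) without counting shared params twice."""
    model_params = list(model_params)
    model_param_ids = {id(p) for p in model_params}
    # Keep only the KT params that the model does not already expose.
    kt_only = [p for p in (kt_params or []) if id(p) not in model_param_ids]
    model_numel = sum(p.numel() for p in model_params if p.requires_grad)
    kt_numel = sum(p.numel() for p in kt_only)
    return model_numel, kt_numel


shared = FakeParam(10)                 # visible through both sources
model_params = [FakeParam(5), shared]
kt_params = [shared, FakeParam(7)]

# Naive sum counts `shared` twice; the filtered sum counts it once.
naive_total = sum(p.numel() for p in model_params) + sum(p.numel() for p in kt_params)
model_numel, kt_numel = count_trainable(model_params, kt_params)
deduped_total = model_numel + kt_numel
```

Comparing by `id()` rather than by membership in a list avoids relying on tensor equality, which is elementwise in PyTorch.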

gemini-code-assist · 2026-05-22T09:11:56Z

+            except Exception:
+                pass  # Non-critical: best-effort count


Catching a broad Exception and silently passing can make it difficult to diagnose issues if the parameter counting logic fails unexpectedly. It is recommended to at least log the exception at a debug level to aid in troubleshooting.

Suggested change

-            except Exception:
-                pass  # Non-critical: best-effort count
+            except Exception as e:
+                logger.debug(f"Could not count KT trainable parameters: {e}")
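As a rough, standalone sketch of the same idea using only the standard `logging` module: the helper `best_effort_numel`, the logger name, and `broken_getter` are hypothetical and exist only for illustration. The failure stays non-fatal, but a trace is left at debug level.

```python
import logging

logger = logging.getLogger("kt_param_count_demo")  # hypothetical logger name


def best_effort_numel(get_params):
    """Return the total element count, or None if counting fails."""
    try:
        return sum(p.numel() for p in get_params())
    except Exception as e:
        # Non-critical: keep going, but leave a trace for troubleshooting.
        logger.debug("Could not count KT trainable parameters: %s", e, exc_info=True)
        return None


def broken_getter():
    """Simulates a parameter lookup that fails unexpectedly."""
    raise RuntimeError("wrapper not initialized")


result = best_effort_numel(broken_getter)
```

Passing `exc_info=True` attaches the traceback to the debug record, and the `%s` placeholder defers string formatting until a handler actually emits the message. The f-string form in the suggestion works as well.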

gemini-code-assist · 2026-05-22T09:11:56Z

+                        has_full_weight_grad = any(
+                            getattr(w, "_full_weight_grad", False)
+                            for w in getattr(kt_model, "_kt_wrappers", [])
+                        )


If kt_model is a PeftModel, the _kt_wrappers attribute might be located on the base model rather than the wrapper itself. Using unwrap_peft_model ensures that the check for _full_weight_grad is performed on the correct module level where these attributes are typically attached.

Suggested change

-                        has_full_weight_grad = any(
-                            getattr(w, "_full_weight_grad", False)
-                            for w in getattr(kt_model, "_kt_wrappers", [])
-                        )
+                        has_full_weight_grad = any(
+                            getattr(w, "_full_weight_grad", False)
+                            for w in getattr(unwrap_peft_model(kt_model), "_kt_wrappers", [])
+                        )
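The failure mode described here can be sketched with plain stand-in classes. Everything below is hypothetical: `OuterWrapper` does not model the real `PeftModel`, the `inner` attribute is invented for the demo, and `unwrap` is only a simplified substitute for `unwrap_peft_model`, whose actual behavior is not shown in this PR. The point is that `getattr(..., "_kt_wrappers", [])` silently falls back to an empty list when the attribute lives one level down.

```python
class KTWrapper:
    """Hypothetical KT wrapper carrying the _full_weight_grad flag."""

    def __init__(self, full_weight_grad):
        self._full_weight_grad = full_weight_grad


class BaseModel:
    """Hypothetical base model that owns the _kt_wrappers list."""

    def __init__(self, wrappers):
        self._kt_wrappers = wrappers


class OuterWrapper:
    """Hypothetical adapter wrapper that does not forward unknown attributes."""

    def __init__(self, inner):
        self.inner = inner


def unwrap(model):
    """Simplified stand-in for unwrap_peft_model: peel off `inner` layers."""
    while hasattr(model, "inner"):
        model = model.inner
    return model


def has_full_weight_grad(model):
    return any(
        getattr(w, "_full_weight_grad", False)
        for w in getattr(model, "_kt_wrappers", [])
    )


wrapped = OuterWrapper(BaseModel([KTWrapper(False), KTWrapper(True)]))
direct = has_full_weight_grad(wrapped)             # attribute missing, falls back to []
unwrapped = has_full_weight_grad(unwrap(wrapped))  # finds the wrappers on the base model
```

In this sketch the direct check reports no full-weight gradients even though one wrapper has the flag set, which is the kind of silent misdetection the review comment warns about.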

for full-fine-tune-support

0e29461

gemini-code-assist Bot reviewed May 22, 2026

poryfly wants to merge 1 commit into kvcache-ai:sft-v5 from poryfly:full-fine-tune-support
