Skip to content

[WIP]for full-fine-tune-support#3

Open
poryfly wants to merge 1 commit into
kvcache-ai:sft-v5from
poryfly:full-fine-tune-support
Open

[WIP]for full-fine-tune-support#3
poryfly wants to merge 1 commit into
kvcache-ai:sft-v5from
poryfly:full-fine-tune-support

Conversation

@poryfly
Copy link
Copy Markdown

@poryfly poryfly commented May 22, 2026

What does this PR do?

support full finetune for KT

Fixes # (issue)

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the KT integration by adding configuration for training modes and full weight gradients, and updating the Trainer to account for KT-specific trainable parameters that are not standard nn.Module members. The changes include logic to log these parameters and inject them into the optimizer. Review feedback recommends preventing double-counting of parameters in logs, logging exceptions during parameter inspection instead of silent failure, and using unwrap_peft_model to correctly identify the training mode when PEFT is used.

Comment on lines +1527 to +1528
kt_trainable = get_kt_trainable_params(kt_model_for_count)
kt_trainable_numel = sum(p.numel() for p in kt_trainable) if kt_trainable else 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current logic for counting KT trainable parameters might lead to double counting if some parameters returned by get_kt_trainable_params are already present in model.parameters(). This is consistent with the check performed during parameter injection in _prepare_for_training. Filtering these out ensures the reported counts are accurate.

                kt_trainable = get_kt_trainable_params(kt_model_for_count)
                model_param_ids = {id(p) for p in model.parameters()}
                kt_trainable = [p for p in kt_trainable if id(p) not in model_param_ids] if kt_trainable else []
                kt_trainable_numel = sum(p.numel() for p in kt_trainable)

Comment on lines +1533 to +1534
except Exception:
pass # Non-critical: best-effort count
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Catching a broad Exception and silently passing can make it difficult to diagnose issues if the parameter counting logic fails unexpectedly. It is recommended to at least log the exception at a debug level to aid in troubleshooting.

Suggested change
except Exception:
pass # Non-critical: best-effort count
except Exception as e:
logger.debug(f"Could not count KT trainable parameters: {e}")

Comment on lines +1715 to +1718
has_full_weight_grad = any(
getattr(w, "_full_weight_grad", False)
for w in getattr(kt_model, "_kt_wrappers", [])
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If kt_model is a PeftModel, the _kt_wrappers attribute might be located on the base model rather than the wrapper itself. Using unwrap_peft_model ensures that the check for _full_weight_grad is performed on the correct module level where these attributes are typically attached.

Suggested change
has_full_weight_grad = any(
getattr(w, "_full_weight_grad", False)
for w in getattr(kt_model, "_kt_wrappers", [])
)
has_full_weight_grad = any(
getattr(w, "_full_weight_grad", False)
for w in getattr(unwrap_peft_model(kt_model), "_kt_wrappers", [])
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant