[WIP]for full-fine-tune-support#3
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances the KT integration by adding configuration for training modes and full weight gradients, and updating the Trainer to account for KT-specific trainable parameters that are not standard nn.Module members. The changes include logic to log these parameters and inject them into the optimizer. Review feedback recommends preventing double-counting of parameters in logs, logging exceptions during parameter inspection instead of silent failure, and using unwrap_peft_model to correctly identify the training mode when PEFT is used.
| kt_trainable = get_kt_trainable_params(kt_model_for_count) | ||
| kt_trainable_numel = sum(p.numel() for p in kt_trainable) if kt_trainable else 0 |
There was a problem hiding this comment.
The current logic for counting KT trainable parameters might lead to double counting if some parameters returned by get_kt_trainable_params are already present in model.parameters(). This is consistent with the check performed during parameter injection in _prepare_for_training. Filtering these out ensures the reported counts are accurate.
kt_trainable = get_kt_trainable_params(kt_model_for_count)
model_param_ids = {id(p) for p in model.parameters()}
kt_trainable = [p for p in kt_trainable if id(p) not in model_param_ids] if kt_trainable else []
kt_trainable_numel = sum(p.numel() for p in kt_trainable)| except Exception: | ||
| pass # Non-critical: best-effort count |
There was a problem hiding this comment.
Catching a broad Exception and silently passing can make it difficult to diagnose issues if the parameter counting logic fails unexpectedly. It is recommended to at least log the exception at a debug level to aid in troubleshooting.
| except Exception: | |
| pass # Non-critical: best-effort count | |
| except Exception as e: | |
| logger.debug(f"Could not count KT trainable parameters: {e}") |
| has_full_weight_grad = any( | ||
| getattr(w, "_full_weight_grad", False) | ||
| for w in getattr(kt_model, "_kt_wrappers", []) | ||
| ) |
There was a problem hiding this comment.
If kt_model is a PeftModel, the _kt_wrappers attribute might be located on the base model rather than the wrapper itself. Using unwrap_peft_model ensures that the check for _full_weight_grad is performed on the correct module level where these attributes are typically attached.
| has_full_weight_grad = any( | |
| getattr(w, "_full_weight_grad", False) | |
| for w in getattr(kt_model, "_kt_wrappers", []) | |
| ) | |
| has_full_weight_grad = any( | |
| getattr(w, "_full_weight_grad", False) | |
| for w in getattr(unwrap_peft_model(kt_model), "_kt_wrappers", []) | |
| ) |
What does this PR do?
support full finetune for KT
Fixes # (issue)
Code Agent Policy
The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.
PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.
This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read
CONTRIBUTING.md.Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.