
Add ModelParameters.enableSwaFull() for SWA KV cache control#275

Merged
bernardladenthin merged 1 commit into main from claude/swa-full-cache-binding-dmqw5h
Jun 28, 2026


@bernardladenthin

Owner

Summary

  • Add ModelParameters.enableSwaFull(), which sets the --swa-full flag. The flag keeps the full-size sliding-window-attention (SWA) KV cache so that prompt prefixes can be reused across requests (pairs with --cache-reuse).
  • Add ModelFlag.SWA_FULL enum entry with corresponding CLI flag and documentation.
  • Update unit tests and enum count to reflect the new flag.
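
The pattern summarized above (an enum entry that carries a CLI string, plus a fluent method that switches it on) can be sketched as follows. This is a minimal illustration and not the library's source: SketchFlag and SketchParameters are hypothetical stand-ins for ModelFlag and ModelParameters, and the toArgs() helper and the "--flash-attn" string are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ModelFlag; only two entries are shown.
enum SketchFlag {
    FLASH_ATTN("--flash-attn"), // assumed CLI string, for illustration only
    SWA_FULL("--swa-full");     // CLI string taken from the PR

    private final String cliFlag;

    SketchFlag(String cliFlag) {
        this.cliFlag = cliFlag;
    }

    String getCliFlag() {
        return cliFlag;
    }
}

// Hypothetical stand-in for ModelParameters; collects valueless flags.
class SketchParameters {
    private final List<String> args = new ArrayList<>();

    // Opt-in: nothing is emitted unless this method is called.
    SketchParameters enableSwaFull() {
        return setFlag(SketchFlag.SWA_FULL);
    }

    SketchParameters enableFlashAttn() {
        return setFlag(SketchFlag.FLASH_ATTN);
    }

    // Adds the flag once; repeated calls do not duplicate it.
    private SketchParameters setFlag(SketchFlag flag) {
        if (!args.contains(flag.getCliFlag())) {
            args.add(flag.getCliFlag());
        }
        return this;
    }

    // Returns a copy so callers cannot mutate the internal list.
    List<String> toArgs() {
        return new ArrayList<>(args);
    }
}
```

In this sketch, a fresh SketchParameters produces an empty argument list, which matches the "default off" behavior the PR describes.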

Test plan

  • Added unit tests in ModelParametersExtendedTest covering enableSwaFull() and the default (flag off) behavior
  • Added unit test in ModelFlagTest to verify the CLI flag mapping
  • Updated enum count assertion in ModelFlagTest from 34 to 35
  • CHANGELOG updated with the new feature

Related issues / PRs

Checklist

  • I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
  • My commits follow Conventional Commits
  • No security-sensitive changes

https://claude.ai/code/session_01SRXbL5RqW3B1XZRea3Rfc7

Add a valueless boolean model launch flag mirroring the existing
enableFlashAttn() / ModelFlag.FLASH_ATTN pattern. Default off.

--swa-full keeps the full-size sliding-window-attention KV cache so the
SWA layers' KV becomes reusable across requests, restoring cross-request
prompt-prefix cache reuse (pairs with setCacheReuse) at ~2x the SWA-layer
KV RAM. Beneficial only for multi-request sessions sharing a prompt
prefix, so it is opt-in.
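
How the valueless flag pairs with the valued cache-reuse option can be sketched like this. LaunchArgsSketch is a hypothetical class written for illustration, not the library's ModelParameters; the toArgs() helper and the value 256 used below are assumptions, and the meaning of the integer is whatever llama.cpp defines for --cache-reuse.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not the library's ModelParameters.
class LaunchArgsSketch {
    private Integer cacheReuse;  // null means the option was never set
    private boolean swaFull;     // default off, so the flag stays opt-in

    // Valued option: emitted as "--cache-reuse <n>".
    LaunchArgsSketch setCacheReuse(int n) {
        this.cacheReuse = n;
        return this;
    }

    // Valueless flag: emitted as "--swa-full" with no argument.
    LaunchArgsSketch enableSwaFull() {
        this.swaFull = true;
        return this;
    }

    // Builds the argument list from whatever has been configured.
    List<String> toArgs() {
        List<String> args = new ArrayList<>();
        if (cacheReuse != null) {
            args.add("--cache-reuse");
            args.add(String.valueOf(cacheReuse));
        }
        if (swaFull) {
            args.add("--swa-full");
        }
        return args;
    }
}
```

With this sketch, new LaunchArgsSketch().setCacheReuse(256).enableSwaFull().toArgs() yields the three elements --cache-reuse, 256, --swa-full, while calling setCacheReuse alone leaves --swa-full out.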

- ModelFlag.SWA_FULL("--swa-full")
- ModelParameters.enableSwaFull()
- ModelFlagTest: enum->string mapping row + enum count 34->35
- ModelParametersExtendedTest: enableSwaFull + not-by-default tests
- CHANGELOG entry

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SRXbL5RqW3B1XZRea3Rfc7
@bernardladenthin bernardladenthin merged commit c10a592 into main Jun 28, 2026
10 of 42 checks passed
@bernardladenthin bernardladenthin deleted the claude/swa-full-cache-binding-dmqw5h branch June 28, 2026 12:22
