Skip to content

fix compaction kept-tail mapping after prior summaries#2646

Open
rumpl wants to merge 1 commit intodocker:mainfrom
rumpl:fix-compaction-kept-entry
Open

fix compaction kept-tail mapping after prior summaries#2646
rumpl wants to merge 1 commit intodocker:mainfrom
rumpl:fix-compaction-kept-entry

Conversation

@rumpl
Copy link
Copy Markdown
Member

@rumpl rumpl commented May 5, 2026

Problem

When a session already contained a summary, the kept-tail boundary returned by mapToSessionIndex was wrong. It counted non-system items from sess.Messages[0], but the filtered chat-message list it was indexing into came from sess.GetMessages(a), which:

  1. Prepends a synthetic "Session Summary: ..." user message that doesn't exist in sess.Messages at all.
  2. Starts iterating items at the prior summary's FirstKeptEntry — skipping every item before that offset.

Both effects shift the count, so the new FirstKeptEntry landed far too early in the session — typically inside the conversation that the prior summary already replaced. Subsequent compactions could then resurrect already-summarized history.

Fix

Replace nonSystemMessages + mapToSessionIndex with a single gatherCompactionInput that walks sess.Messages directly (mirroring session.buildSessionSummaryMessages), surfaces the synthetic Session Summary message when a prior summary exists, and returns a parallel slice of origin indices.

firstKeptSessionIndex maps a split index back to a sess.Messages index via that parallel slice — correct by construction regardless of prior summaries. The synthetic message maps back to the prior summary's index, so when the kept-tail boundary lands on it, the prior summary item is preserved across the new compaction.

ComputeFirstKeptEntry no longer needs the *agent.Agent parameter; updated its single call site in session_compaction.go.

Tests

Added regression coverage in pkg/runtime/compactor/compactor_test.go:

  • TestGatherCompactionInput_NoPriorSummary — replaces the old TestMapToSessionIndex, pinning the no-summary path.
  • TestGatherCompactionInput_WithPriorSummary — the regression test: with a summary at index 10 (FirstKeptEntry=8), a split keeping the last 2 messages now correctly maps to items[13], where the old code returned items[5].
  • TestGatherCompactionInput_PriorSummaryWithoutFirstKeptEntry — covers the "compact everything" prior-summary case where iteration must start strictly after the summary item.

@rumpl rumpl requested a review from a team as a code owner May 5, 2026 21:59
docker-agent[bot]

This comment was marked as outdated.

@rumpl rumpl force-pushed the fix-compaction-kept-entry branch from e9d6efc to 56b7b6e Compare May 5, 2026 22:19
@rumpl
Copy link
Copy Markdown
Member Author

rumpl commented May 5, 2026

/review

derekmisler
derekmisler previously approved these changes May 5, 2026
Comment thread pkg/session/migrations.go Outdated
aheritier

This comment was marked as resolved.

Comment thread pkg/session/session.go Outdated
Comment thread pkg/runtime/compactor/compactor.go Outdated
@rumpl rumpl force-pushed the fix-compaction-kept-entry branch from 56b7b6e to b3ab4bf Compare May 6, 2026 15:08
@rumpl
Copy link
Copy Markdown
Member Author

rumpl commented May 6, 2026

/review

@rumpl
Copy link
Copy Markdown
Member Author

rumpl commented May 6, 2026

@aheritier Completely changed the implementation, please take a look

Copy link
Copy Markdown

@docker-agent docker-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🔴 CRITICAL

This PR correctly fixes the core kept-tail mapping bug. Two issues found in changed code: (1) Data race in gatherCompactionInput; (2) splitIdx==0 edge case drops prior kept-tail.

@docker-agent
Copy link
Copy Markdown

docker-agent Bot commented May 6, 2026

Assessment: 🔴 CRITICAL

This PR correctly identifies and fixes the core kept-tail mapping bug. The gatherCompactionInput + firstKeptSessionIndex approach is conceptually sound and the regression test coverage is well-structured. However, two bugs were found in the newly introduced code.


🔴 [HIGH] pkg/runtime/compactor/compactor.go:180 — Data race: gatherCompactionInput reads sess.Messages without holding sess.mu

func gatherCompactionInput(sess *session.Session) ([]chat.Message, []int) {
    items := sess.Messages   // ← direct access, no lock

items := sess.Messages reads the slice header without acquiring sess.mu. The session's mu sync.RWMutex protects Messages, and both ApplyCompaction and AddMessage acquire the write lock before appending:

func (s *Session) ApplyCompaction(...) {
    s.mu.Lock()
    s.Messages = append(s.Messages, item)  // concurrent writer
    s.mu.Unlock()
}

The previous code path went through sess.GetMessages(a)snapshotItems()s.mu.RLock(), which was race-safe. doCompact runs in a goroutine while the main loop can concurrently call AddMessage or ApplyCompaction. A concurrent append can reallocate the backing array, leaving the compactor iterating a stale or partially-written slice — a classic Go data race on a slice header.

Fix: expose snapshotItems as an exported method (e.g. SnapshotItems()) on *session.Session and call it here, mirroring what GetMessages / buildSessionSummaryMessages do internally.


🟡 [MEDIUM] pkg/runtime/compactor/compactor.go:205sessIndices[0] = lastSummaryIdx causes prior kept-tail messages to be silently dropped in subsequent session reconstruction

if lastSummaryIdx >= 0 {
    messages = append(messages, chat.Message{ /* synthetic summary */ })
    sessIndices = append(sessIndices, lastSummaryIdx)  // ← maps to summary item itself
}

When SplitIndexForKeep returns splitIdx = 0 (session is under the token budget — keep everything), firstKeptSessionIndex returns sessIndices[0] = lastSummaryIdx. This becomes the new FirstKeptEntry.

On the next call to buildSessionSummaryMessages with a new summary at index N:

  • kept = lastSummaryIdx, condition kept > 0 && kept < NstartIndex = lastSummaryIdx
  • Loop starts at items[lastSummaryIdx] — a non-message summary item (IsMessage() == false) → skipped
  • Iteration continues from lastSummaryIdx + 1, skipping the prior kept-tail (e.g. items[8..9] in TestGatherCompactionInput_WithPriorSummary)

The prior kept-tail messages are silently dropped from the reconstructed session on the next compaction cycle.

This is the test scenario from TestGatherCompactionInput_WithPriorSummary (items[0..7] compacted, items[8..9] kept-tail, items[10] = prior summary, items[11..14] post-compaction). If splitIdx = 0, FirstKeptEntry is set to 10 (the summary item), and items[8..9] are lost in the next reconstruction.

Fix: Map sessIndices[0] to kept (the prior FirstKeptEntry, i.e. the start of the prior kept-tail) when kept > 0, rather than to lastSummaryIdx. This ensures splitIdx = 0 correctly anchors at the beginning of the kept region, not the summary item.

When a session already contained a summary, the kept-tail boundary
returned by mapToSessionIndex was wrong. It counted non-system items
from sess.Messages[0], but the filtered chat-message list it was
indexing into came from sess.GetMessages(a), which prepends a
synthetic 'Session Summary: ...' user message and starts iterating
items at the prior summary's FirstKeptEntry. Both effects shifted the
count, so the new FirstKeptEntry landed far too early in the session
- typically inside the conversation the prior summary already
replaced.

Replace nonSystemMessages + mapToSessionIndex with a single
gatherCompactionInput that walks sess.Messages directly (mirroring
session.buildSessionSummaryMessages), surfaces the synthetic Session
Summary message when a prior summary exists, and returns a parallel
slice of origin indices. firstKeptSessionIndex maps a split index
back to a sess.Messages index via that parallel slice - correct by
construction regardless of prior summaries. The synthetic message
maps back to the prior summary's index so the prior summary item is
preserved across the new compaction when the boundary lands on it.

ComputeFirstKeptEntry no longer needs the *agent.Agent parameter;
updated its single call site in session_compaction.go.

Adds regression coverage in compactor_test.go for the prior-summary
case (with and without a non-zero FirstKeptEntry on the prior
summary), and replaces TestMapToSessionIndex with the equivalent
TestGatherCompactionInput_NoPriorSummary.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
@rumpl rumpl force-pushed the fix-compaction-kept-entry branch from b3ab4bf to abc7caa Compare May 6, 2026 15:30
@rumpl
Copy link
Copy Markdown
Member Author

rumpl commented May 6, 2026

/review

Copy link
Copy Markdown

@docker-agent docker-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: fix compaction kept-tail mapping after prior summaries

Assessment: 🟡 NEEDS ATTENTION

The core fix — replacing mapToSessionIndex + nonSystemMessages with gatherCompactionInput + CompactionInput() — is correct and well-tested. The parallel sessIndices slice approach cleanly solves the off-by-one shift caused by the synthetic "Session Summary" message and the FirstKeptEntry start offset.

⚠️ Note: Automatic verification was inconclusive (upstream API overloaded). The following finding is from draft analysis and has not been independently confirmed. The author should evaluate it.

@docker-agent
Copy link
Copy Markdown

docker-agent Bot commented May 6, 2026

🤖 Code Review — pkg/runtime/compactor/compactor.go

[MEDIUM] firstKeptSessionIndex reads sess.Messages without holding sess.mu — potential data race

In the new firstKeptSessionIndex function (line 247–252), the sentinel branch reads len(sess.Messages) without holding the session's mutex:

func firstKeptSessionIndex(sess *session.Session, sessIndices []int, splitIdx int) int {
    if splitIdx >= len(sessIndices) {
        return len(sess.Messages)  // ← reads slice header outside mu
    }
    return sessIndices[splitIdx]
}

AddMessage and ApplyCompaction both mutate sess.Messages (appending, potentially reallocating the backing array) under sess.mu.Lock(). Reading len(sess.Messages) — which reads the len field of the slice header — concurrently without the lock is a data race under Go's memory model, detectable by go test -race.

The new race test TestCompactionInputConcurrent covers CompactionInput() itself (which uses snapshotItems()s.mu.RLock()), but does not exercise the path through firstKeptSessionIndex where sess.Messages is read unlocked.

This branch is only reached when splitIdx >= len(sessIndices) — the "compact everything, keep nothing" sentinel — which happens for very short or empty sessions. It is unlikely to be triggered in production compaction, but the race is real and could fire under -race if exercised.

Suggested fix: pass the snapshot length from CompactionInput() so the sentinel can be computed without touching the live sess.Messages:

// In CompactionInput(), return snapshotLen alongside messages and sessIndices:
return messages, sessIndices, len(items)

// firstKeptSessionIndex uses snapshotLen instead of sess.Messages:
func firstKeptSessionIndex(snapshotLen int, sessIndices []int, splitIdx int) int {
    if splitIdx >= len(sessIndices) {
        return snapshotLen
    }
    return sessIndices[splitIdx]
}

Since len(items) (the snapshot) equals len(sess.Messages) at snapshot time and sess.Messages only grows, this preserves semantics while eliminating the unsynchronised read.


Note: Automatic verification was inconclusive (upstream API overloaded at review time). This finding is from draft analysis — please evaluate accordingly.

Copy link
Copy Markdown

@docker-agent docker-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

The fix correctly addresses the FirstKeptEntry mapping bug when prior summaries exist. The new gatherCompactionInput + firstKeptSessionIndex approach walks sess.Messages directly with a parallel origin-index slice, which is correct by construction regardless of prior summaries. Concurrency model (RLock in snapshotItems) is sound. Regression tests accurately cover the reported bug.

No high or medium severity issues were found in the changed code.

Low-severity observations (no action required)
  • pkg/session/session.go: Item.FirstKeptEntry uses omitempty in JSON serialization, making the value 0 indistinguishable from "unset". The struct comment mentions -1 as a sentinel but -1 is never used in practice — the actual sentinel is 0. This is a documentation inconsistency and a future maintenance hazard, not an active bug.
  • pkg/session/session.go: sessIndices is intentionally non-monotonic when a prior summary exists (the synthetic summary message maps to a higher index than the kept-tail entries that follow it). The current code handles this correctly, but future callers should not assume the slice is sorted.
  • pkg/runtime/compactor/compactor.go: extractMessages retains a _ *agent.Agent dead parameter. No functional impact since the function is unexported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants