feat(agent,timeline): subagent factory pattern + per-session timeline reload #12
Open
ngoclam9415 wants to merge 5 commits into
… reload

TaskResource dispatches to agent factories instead of shared instances. Each Task call constructs a fresh agent with a disjoint object graph (timeline, star loop count, session id, EventLog), eliminating both the concurrent-same-type state corruption and the sequential timeline-accumulation bug. Legacy instance registration still works but warns.

set_session_id is now a real context boundary: it flushes the outgoing session, rebuilds the timeline via the extracted _build_timeline() (which resets all session-scoped state including compaction-tracking fields), and rehydrates from disk via the new CompressedTimeline.rehydrate() wrapper over read_since. Unknown session yields an empty timeline; same id is a no-op.

- task_resource.py: factory registration, _descriptions cache, per-spawn construction, notifiable propagation at spawn; drop _get_agent_tools
- star_agent.py: extract _build_timeline(); rewrite set_session_id
- timeline_serializer.py: add CompressedTimeline.rehydrate()
- dana_coding_agent.py: register explore subagent as functools.partial
- tests: 9 TaskResource + 6 set_session_id unit tests
Previously a sub-agent whose aquery() raised left its session stuck at status='running', so task_output reported it running indefinitely. task() now catches the exception, records status='failed' plus the error, and re-raises so the caller still observes the failure. task_output surfaces the stored error for failed sessions. Addresses code-review finding N3.
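The failure handling described in this commit can be sketched roughly as below. This is a minimal illustration, not the PR's code: the Session record, the module-level sessions dict, the "completed" status, and the FailingAgent stand-in are all hypothetical names introduced for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Session:
    """Hypothetical session record; the real one likely carries more fields."""
    status: str = "running"
    result: Optional[str] = None
    error: Optional[str] = None


class FailingAgent:
    """Stand-in sub-agent whose aquery() always raises."""

    async def aquery(self, prompt: str) -> str:
        raise RuntimeError("boom")


# Hypothetical in-memory session table, keyed by session id.
sessions: Dict[str, Session] = {}


async def task(session_id: str, agent, prompt: str) -> str:
    session = sessions.setdefault(session_id, Session())
    try:
        session.result = await agent.aquery(prompt)
    except Exception as exc:
        # Record the failure so the session does not stay "running",
        # then re-raise so the caller still observes the error.
        session.status = "failed"
        session.error = f"{type(exc).__name__}: {exc}"
        raise
    session.status = "completed"  # assumed name for the success status
    return session.result


def task_output(session_id: str) -> str:
    session = sessions[session_id]
    if session.status == "failed":
        return f"failed: {session.error}"  # surface the stored error
    return session.status if session.result is None else session.result
```

The bare `raise` keeps the original traceback, so recording the status does not hide the failure from the caller.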
The persisted config flag sat one line above _timeline with a near-identical name, reading as if it held a timeline. Rename to match CompressedTimeline.compression_enabled. Constructor param unchanged.
…reload set_session_id gains reload_timeline (default True). When False it is a pure relabel: the current in-memory timeline is kept and carried into the new session id instead of being flushed, rebuilt, and rehydrated. This lets a subclass that seeds its own timeline (e.g. a persona entry before the STAR loop) keep that timeline across a session switch. Gating only the rehydrate() call is insufficient, because the _build_timeline() rebuild also discards the caller's timeline, so the flag gates the whole reload. Threaded through aquery, aquery_stream, and aconverse (-> Communicator.aconverse -> aquery). Default True everywhere preserves the session-boundary contract for subagents and existing callers; opt out with reload_timeline=False. Directed @agent messages keep the default.
…o True Flip the reload_timeline default to False across set_session_id, aquery, aquery_stream, and aconverse. Most callers — including subclasses that seed their own timeline before the STAR loop — manage their own context, so a pure relabel is the safer default. TaskResource.task() now passes reload_timeline=True explicitly: a sub-agent's session_id is a hard context boundary, so each spawn gets a disjoint, disk-accurate timeline (fresh for a new session, rehydrated on resume). Tests exercising the reload path updated to pass reload_timeline=True.
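A rough sketch of the two set_session_id modes, assuming a list-backed timeline and a dict standing in for disk persistence. SketchAgent, DISK, and _flush are hypothetical names for illustration; the real implementation rehydrates through CompressedTimeline.rehydrate(), which is not reproduced here.

```python
from typing import Dict, List, Optional

# Stand-in for on-disk persistence, keyed by session id (hypothetical).
DISK: Dict[str, List[str]] = {}


class SketchAgent:
    def __init__(self) -> None:
        self.session_id: Optional[str] = None
        self._timeline: List[str] = self._build_timeline()

    def _build_timeline(self) -> List[str]:
        # Single place that creates a fresh, empty timeline.
        return []

    def _flush(self) -> None:
        # Persist the outgoing session before switching away from it.
        if self.session_id is not None:
            DISK[self.session_id] = list(self._timeline)

    def set_session_id(self, session_id: str, reload_timeline: bool = False) -> None:
        if session_id == self.session_id:
            return  # same id: no-op
        if not reload_timeline:
            # Pure relabel: the in-memory timeline is carried over untouched.
            self.session_id = session_id
            return
        # Hard context boundary: flush, assign id, rebuild, rehydrate.
        self._flush()
        self.session_id = session_id
        self._timeline = self._build_timeline()
        self._timeline.extend(DISK.get(session_id, []))  # unknown id stays empty
```

With this shape, a sub-agent spawn would call set_session_id(sid, reload_timeline=True), while a caller that seeds its own timeline relies on the default and keeps its entries across the switch.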
Summary

Fixes two coupled correctness bugs in Dana subagent dispatch (plan: 260519-0848-subagent-factory-and-timeline-resume).

Phase 1: Shared-instance corruption. TaskResource held one long-lived agent instance per subagent_type. Concurrent same-type Task calls (already possible, because tool batches run through asyncio.gather) interleaved on shared mutable state; sequential calls accumulated into one ever-growing timeline. TaskResource now dispatches to agent factories (functools.partial): every spawn constructs a fresh agent with a disjoint object graph (timeline, star loop count, session id, EventLog).
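The factory dispatch in Phase 1 might look roughly like the sketch below. ExploreAgent, TaskDispatcher, register_factory, and the model parameter are hypothetical stand-ins, not the PR's actual classes or signatures; only functools.partial and asyncio.gather are taken from the description.

```python
import asyncio
import functools
from typing import Callable, Dict, List


class ExploreAgent:
    """Hypothetical sub-agent; only the per-instance state matters here."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.timeline: List[str] = []

    async def aquery(self, prompt: str) -> List[str]:
        self.timeline.append(prompt)
        await asyncio.sleep(0)  # yield so concurrent calls can interleave
        self.timeline.append(f"answer to {prompt}")
        return self.timeline


class TaskDispatcher:
    """Simplified stand-in for TaskResource: maps a type name to a factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], ExploreAgent]] = {}

    def register_factory(self, name: str, factory: Callable[[], ExploreAgent]) -> None:
        self._factories[name] = factory

    async def task(self, subagent_type: str, prompt: str) -> List[str]:
        agent = self._factories[subagent_type]()  # fresh agent per spawn
        return await agent.aquery(prompt)


async def demo() -> List[List[str]]:
    dispatcher = TaskDispatcher()
    dispatcher.register_factory(
        "explore", functools.partial(ExploreAgent, model="stub")
    )
    # Two same-type tasks run concurrently, each on its own agent instance.
    return await asyncio.gather(
        dispatcher.task("explore", "q1"),
        dispatcher.task("explore", "q2"),
    )
```

Because each call constructs its own agent, the two timelines returned by demo() are separate lists; with a single shared instance the interleaved appends would land in one list.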
Phase 2: session_id is now a real context boundary. aquery(session_id=...) previously only relabeled the save target; it never reloaded or reset the in-memory timeline, so resume across restart silently lost data and "fresh" sessions inherited prior context. set_session_id now flushes the outgoing session, rebuilds the timeline via the extracted _build_timeline() (resetting ALL session-scoped state, including compaction-tracking fields), and rehydrates from disk via the new CompressedTimeline.rehydrate() wrapper over read_since.

Changes
- task_resource.py: factory registration + .func contract validation, _descriptions cache, per-spawn construction, notifiable propagation at spawn; _get_agent_tools() dropped. Failed aquery marks the session "failed" instead of leaving it "running" forever.
- star_agent.py: extracted _build_timeline() (single source of truth); rewrote set_session_id (fast-path → flush → assign id → rebuild → rehydrate).
- timeline_serializer.py: added CompressedTimeline.rehydrate().
- dana_coding_agent.py: explore subagent registered as functools.partial.

Legacy register_agent(name, instance) still works (wrapped as a constant factory) but emits DeprecationWarning, so migration is incremental.

Testing
- … test_task_resource.py, 6 test_set_session_id.py)
- … test_llm_providers.py failures); regression suite incl. reasoning-replay green
- code-reviewer: 9/10, no critical issues; all findings (N1/N2/N3) resolved

Notes
- … TaskResource are one-line tree nodes).