Skip to content

fix: gateway session data loss and restore deleted _delivery module#2102

Merged
MervinPraison merged 2 commits into
mainfrom
cursor/critical-bug-investigation-7d5f
Jun 22, 2026
Merged

fix: gateway session data loss and restore deleted _delivery module#2102
MervinPraison merged 2 commits into
mainfrom
cursor/critical-bug-investigation-7d5f

Conversation

@cursor

@cursor cursor Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Summary

Critical bug scan found three high-severity regressions in recent commits. This PR applies minimal fixes with regression tests.

Bugs fixed

1. Gateway session resume drops conversation history (data loss)

Impact: With session persistence enabled, every disconnect → reconnect cycle restored an empty session instead of the latest conversation state.

Root cause: create_session() wrote an initial empty session_data system message on create and a full snapshot on close. Resume logic used break on the first matching system message, always selecting the empty initial snapshot.

Fix: Iterate all system messages and keep the latest session_data snapshot.

2. Gateway stop() loses in-flight sessions (data loss)

Impact: Graceful shutdown (SIGTERM / stop()) closed sessions in memory without persisting them, unlike WebSocket disconnect which calls close_session().

Root cause: stop() called session.close() directly instead of close_session(session_id).

Fix: Call close_session() for each active session during shutdown.

3. Missing bots/_delivery.py causes import crashes

Impact: from praisonai.bots import DurableDelivery, DurableAdapterMixin, or deliver_chunked raised ModuleNotFoundError.

Root cause: PR #2054 removed _delivery.py as "dead code", but _durable_adapter.py and bots/__init__.py still import from it.

Fix: Restore _delivery.py from pre-removal history.

Validation

  • Added tests/unit/gateway/test_gateway_persistence_fixes.py
  • All 4 new regression tests pass
Open in Web View Automation 

Summary by CodeRabbit

  • New Features

    • Enhanced message delivery system with streaming, chunking, and adaptive platform support.
    • Durable message persistence with automatic retry, recovery, and idempotency key support.
  • Bug Fixes

    • Fixed session recovery to use the latest persisted snapshot.
    • Improved session shutdown to properly persist active sessions through the correct close flow.
  • Tests

    • Added tests verifying session persistence and durable delivery functionality.

- Resume gateway sessions from the latest session_data snapshot instead of
  the initial empty one written at session creation
- Persist active sessions on gateway stop() via close_session()
- Restore bots/_delivery.py accidentally removed in #2054; imports for
  DurableDelivery and DurableAdapterMixin were still live
- Add regression tests for resume, shutdown persist, and delivery imports

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
@MervinPraison

Copy link
Copy Markdown
Owner

@coderabbitai review

@MervinPraison

Copy link
Copy Markdown
Owner

/review

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b7b64a1-b0f6-4ba7-aeb1-8f0dae06bb89

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new _delivery.py bot module implementing MessageSender, UnifiedDelivery (streaming/chunking with platform capability detection), deliver_with_retry (exponential backoff with abort signal), deliver_chunked, and DurableDelivery (outbox-backed idempotent delivery). Fixes two WebSocketGateway session persistence bugs in stop() and snapshot rehydration, with accompanying regression tests.

Changes

Durable Bot Delivery Module

Layer / File(s) Summary
MessageSender protocol and UnifiedDelivery
src/praisonai/praisonai/bots/_delivery.py (L1–255, L557–564)
Defines MessageSender protocol and UnifiedDelivery with platform-capability caching, optional typing indicator, streaming progressive edits with rate limiting, fallback chunking, and __all__ exports.
deliver_with_retry and deliver_chunked helpers
src/praisonai/praisonai/bots/_delivery.py (L257–387)
Implements create_delivery, bounded exponential backoff with recoverable/permanent error classification, abort-signal sleep, and deliver_chunked applying reply_to only on the first chunk.
DurableDelivery with outbox and drain
src/praisonai/praisonai/bots/_delivery.py (L389–555)
Implements DurableDelivery with optional outbox enqueuing, auto-generated idempotency keys, sent/failed status updates, and drain_pending for at-most-once retry per cycle.

Gateway Session Persistence Fixes and Tests

Layer / File(s) Summary
Gateway stop and rehydration bug fixes
src/praisonai/praisonai/gateway/server.py
stop() now calls close_session(session_id) per session; rehydration overwrites session_data on each matching system message rather than stopping at the first match.
Regression tests
src/praisonai/tests/unit/gateway/test_gateway_persistence_fixes.py
Adds _make_gateway helper and three test classes covering latest-snapshot resume, stop-time persistence, and DurableDelivery/deliver_chunked/DurableAdapterMixin import smoke tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • MervinPraison/PraisonAI#2036: Directly overlaps this PR's delivery utility layer — both implement MessageSender, deliver_with_retry, deliver_chunked, and DurableDelivery in praisonai/bots/_delivery.py.
  • MervinPraison/PraisonAI#2054: Directly conflicts with this PR — it removes the same _delivery.py module that this PR introduces.

Suggested reviewers

  • MervinPraison

🐇 A rabbit hops to send your words,
Through streams and chunks and retried birds,
The outbox holds each message tight,
Till sessions wake and snapshots write —
No message lost, no session stray,
Durable delivery saves the day! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: fixing gateway session data loss and restoring the deleted _delivery module.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cursor/critical-bug-investigation-7d5f

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@MervinPraison

Copy link
Copy Markdown
Owner

@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings.

Review areas:

  1. Bloat check: Are changes minimal and focused? Any unnecessary code or scope creep?
  2. Security: Any hardcoded secrets, unsafe eval/exec, missing input validation?
  3. Performance: Any module-level heavy imports? Hot-path regressions?
  4. Tests: Are tests included? Do they cover the changes adequately?
  5. Backward compat: Any public API changes without deprecation?
  6. Code quality: DRY violations, naming conventions, error handling?
  7. Address reviewer feedback: If Qodo, Coderabbit, or Gemini flagged valid issues, include them in your review
  8. Suggest specific improvements with code examples where possible

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/praisonai/praisonai/bots/_delivery.py (1)

247-254: ⚡ Quick win

Rate limiter not using per-channel limiting.

The RateLimiter.acquire() method supports an optional channel_id parameter for per-channel throttling, but _acquire_rate_limit doesn't pass it. This reduces effectiveness—multiple sends to the same channel won't respect per-channel delays documented in the rate limiter.

♻️ Suggested fix
-    async def _acquire_rate_limit(self) -> None:
-        """Acquire rate limit token if platform needs it."""
+    async def _acquire_rate_limit(self, channel_id: Optional[str] = None) -> None:
+        """Acquire rate limit token if platform needs it.
+        
+        Args:
+            channel_id: Optional channel ID for per-channel limiting
+        """
         if self._rate_limiter is None:
             # Use platform-specific rate limiter
             self._rate_limiter = RateLimiter.for_platform(self.bot.platform)
         
         if self._rate_limiter:
-            await self._rate_limiter.acquire()
+            await self._rate_limiter.acquire(channel_id)

Then update call sites to pass channel_id:

-                await self._acquire_rate_limit()
+                await self._acquire_rate_limit(channel_id)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/praisonai/praisonai/bots/_delivery.py` around lines 247 - 254, The
_acquire_rate_limit method is not leveraging the per-channel throttling
capability of the rate limiter. Modify the _acquire_rate_limit method to accept
a channel_id parameter and pass it to the self._rate_limiter.acquire() call as
an optional argument. Then update all call sites that invoke _acquire_rate_limit
to provide the appropriate channel_id value so that the rate limiter can
properly enforce per-channel delays as documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/praisonai/praisonai/bots/_delivery.py`:
- Around line 182-191: The message editing logic performs a redundant final edit
when the streaming loop completes successfully with the full text content. To
fix this, introduce a boolean flag (such as is_complete) that tracks whether the
streaming loop in the edit_message flow successfully sent all content (when
next_pos >= len(text) evaluates to true). Then, before the final edit operation
that occurs around lines 220-230 (after the main streaming loop ends), check
this flag and only proceed with the edit if the content was not already fully
sent during the loop. This prevents wasting an API call when the complete text
has already been delivered to the Discord message.
- Around line 87-89: The fire-and-forget task created by
asyncio.create_task(self._show_typing(channel_id)) lacks a stored reference,
risking garbage collection before completion and swallowing exceptions. Store
the task reference returned by asyncio.create_task in a task set or collection
maintained by the class to track background tasks, ensuring proper lifecycle
management and debugging capability. Make sure to add appropriate cleanup logic
to remove completed tasks from the tracking collection to prevent memory leaks.

In `@src/praisonai/tests/unit/gateway/test_gateway_persistence_fixes.py`:
- Around line 49-72: The test_stop_persists_active_sessions test calls await
gw.stop() while _is_running is False, causing the stop method to return early
without executing session close persistence logic. Additionally, the test reuses
the same gateway instance after stop, so in-memory state masks whether
persistence actually worked. Fix this by ensuring the gateway is marked as
running before calling stop (set _is_running to True), and then create a
completely new gateway instance that loads from the persisted session store to
verify the data was truly persisted and not just held in memory.

---

Nitpick comments:
In `@src/praisonai/praisonai/bots/_delivery.py`:
- Around line 247-254: The _acquire_rate_limit method is not leveraging the
per-channel throttling capability of the rate limiter. Modify the
_acquire_rate_limit method to accept a channel_id parameter and pass it to the
self._rate_limiter.acquire() call as an optional argument. Then update all call
sites that invoke _acquire_rate_limit to provide the appropriate channel_id
value so that the rate limiter can properly enforce per-channel delays as
documented.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5838afd3-8488-4fb2-b364-b9b1237bb6cc

📥 Commits

Reviewing files that changed from the base of the PR and between 5f39d66 and 1d65bf3.

📒 Files selected for processing (3)
  • src/praisonai/praisonai/bots/_delivery.py
  • src/praisonai/praisonai/gateway/server.py
  • src/praisonai/tests/unit/gateway/test_gateway_persistence_fixes.py

Comment on lines +87 to +89
# Show typing indicator if supported
if typing and caps.supports_typing:
asyncio.create_task(self._show_typing(channel_id))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Store a reference to the typing indicator task.

The fire-and-forget task created here has no stored reference, which risks garbage collection before completion and swallows any exceptions. While typing indicator failure is non-critical, storing the reference enables proper lifecycle management and debugging.

🛠️ Suggested fix

Add a task set to track background tasks and ensure proper cleanup:

 class UnifiedDelivery:
     def __init__(self, bot: "BotProtocol"):
         self.bot = bot
         self._capabilities: Optional["PlatformCapabilities"] = None
         self._rate_limiter: Optional[RateLimiter] = None
         self._streamer: Optional[DraftStreamer] = None
+        self._background_tasks: set[asyncio.Task] = set()
         # Show typing indicator if supported
         if typing and caps.supports_typing:
-            asyncio.create_task(self._show_typing(channel_id))
+            task = asyncio.create_task(self._show_typing(channel_id))
+            self._background_tasks.add(task)
+            task.add_done_callback(self._background_tasks.discard)
🧰 Tools
🪛 Ruff (0.15.17)

[warning] 89-89: Store a reference to the return value of asyncio.create_task

(RUF006)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/praisonai/praisonai/bots/_delivery.py` around lines 87 - 89, The
fire-and-forget task created by
asyncio.create_task(self._show_typing(channel_id)) lacks a stored reference,
risking garbage collection before completion and swallowing exceptions. Store
the task reference returned by asyncio.create_task in a task set or collection
maintained by the class to track background tasks, ensuring proper lifecycle
management and debugging capability. Make sure to add appropriate cleanup logic
to remove completed tasks from the tracking collection to prevent memory leaks.

Source: Linters/SAST tools

Comment on lines +182 to +191
await self.bot.edit_message(
channel_id,
message.message_id,
display_text,
)

if next_pos >= len(text):
break # Complete text has been sent

current_pos = next_pos

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Redundant final edit when streaming completes successfully.

When the streaming loop completes with next_pos >= len(text) (line 188), the message has already been edited with the complete text. However, if the text fits within max_message_length, lines 220-230 perform another edit with the same content. This wastes an API call and could unnecessarily trigger rate limits.

♻️ Suggested fix

Track whether the loop completed with the full content and skip the redundant edit:

+        streaming_completed = False
         # Stream partial content
         while current_pos < len(text):
             # ... existing loop code ...
             
             if next_pos >= len(text):
+                streaming_completed = True
                 break  # Complete text has been sent
             
             current_pos = next_pos
         
         # If text is too long and couldn't fit in one message, chunk it
         if _calculate_length(text, caps.length_unit) > caps.max_message_length:
             # ... existing chunking code ...
-        else:
+        elif not streaming_completed:
             # Final edit with complete content (fits in limit)
             await asyncio.sleep(caps.edit_interval_ms / 1000.0)
             # ... rest of else block ...

Also applies to: 220-230

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/praisonai/praisonai/bots/_delivery.py` around lines 182 - 191, The
message editing logic performs a redundant final edit when the streaming loop
completes successfully with the full text content. To fix this, introduce a
boolean flag (such as is_complete) that tracks whether the streaming loop in the
edit_message flow successfully sent all content (when next_pos >= len(text)
evaluates to true). Then, before the final edit operation that occurs around
lines 220-230 (after the main streaming loop ends), check this flag and only
proceed with the edit if the content was not already fully sent during the loop.
This prevents wasting an API call when the complete text has already been
delivered to the Discord message.

Comment on lines +49 to +72
async def test_stop_persists_active_sessions(self, tmp_path):
gw = _make_gateway(tmp_path)
session_id = "sess-stop-1"

session = gw.create_session("agent-1", "client-1", session_id)
session.add_message(
GatewayMessage(
content="Shutdown test message",
sender_id="user",
session_id=session_id,
)
)

await gw.stop()

store = gw._session_store
assert store is not None
assert store.session_exists(session_id)

resumed = gw.create_session("agent-1", "client-2", session_id)
messages = resumed.get_messages()

assert len(messages) == 1
assert messages[0].content == "Shutdown test message"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

test_stop_persists_active_sessions can pass without exercising stop-time persistence

Line 62 calls await gw.stop() while _is_running is still False, so stop() returns early and never runs session close persistence. The test then reuses the same gateway instance, so in-memory state can hide persistence regressions.

Suggested test hardening
@@
     async def test_stop_persists_active_sessions(self, tmp_path):
         gw = _make_gateway(tmp_path)
         session_id = "sess-stop-1"
@@
-        await gw.stop()
+        # Exercise the actual shutdown branch in stop()
+        gw._is_running = True
+        await gw.stop()
@@
-        resumed = gw.create_session("agent-1", "client-2", session_id)
+        # Rehydrate from storage, not in-memory state
+        gw2 = _make_gateway(tmp_path)
+        resumed = gw2.create_session("agent-1", "client-2", session_id)
         messages = resumed.get_messages()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def test_stop_persists_active_sessions(self, tmp_path):
gw = _make_gateway(tmp_path)
session_id = "sess-stop-1"
session = gw.create_session("agent-1", "client-1", session_id)
session.add_message(
GatewayMessage(
content="Shutdown test message",
sender_id="user",
session_id=session_id,
)
)
await gw.stop()
store = gw._session_store
assert store is not None
assert store.session_exists(session_id)
resumed = gw.create_session("agent-1", "client-2", session_id)
messages = resumed.get_messages()
assert len(messages) == 1
assert messages[0].content == "Shutdown test message"
async def test_stop_persists_active_sessions(self, tmp_path):
gw = _make_gateway(tmp_path)
session_id = "sess-stop-1"
session = gw.create_session("agent-1", "client-1", session_id)
session.add_message(
GatewayMessage(
content="Shutdown test message",
sender_id="user",
session_id=session_id,
)
)
# Exercise the actual shutdown branch in stop()
gw._is_running = True
await gw.stop()
store = gw._session_store
assert store is not None
assert store.session_exists(session_id)
# Rehydrate from storage, not in-memory state
gw2 = _make_gateway(tmp_path)
resumed = gw2.create_session("agent-1", "client-2", session_id)
messages = resumed.get_messages()
assert len(messages) == 1
assert messages[0].content == "Shutdown test message"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/praisonai/tests/unit/gateway/test_gateway_persistence_fixes.py` around
lines 49 - 72, The test_stop_persists_active_sessions test calls await gw.stop()
while _is_running is False, causing the stop method to return early without
executing session close persistence logic. Additionally, the test reuses the
same gateway instance after stop, so in-memory state masks whether persistence
actually worked. Fix this by ensuring the gateway is marked as running before
calling stop (set _is_running to True), and then create a completely new gateway
instance that loads from the persisted session store to verify the data was
truly persisted and not just held in memory.

@MervinPraison

Copy link
Copy Markdown
Owner

@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding.

Phase 1: Review per AGENTS.md

  1. Protocol-driven: check heavy implementations vs core SDK
  2. Backward compatible: ensure zero feature regressions
  3. Performance: no hot-path regressions

Phase 2: FIX Valid Issues
4. For any VALID bugs or architectural flaws found by Gemini, CodeRabbit, Qodo, Copilot, or any other reviewer: implement the fix
5. Push all code fixes directly to THIS branch (do NOT create a new PR)
6. Comment a summary of exact files modified and what you skipped

Phase 3: Final Verdict
7. If all issues are resolved, approve the PR / close the Issue
8. If blocking issues remain, request changes / leave clear action items

@praisonai-triage-agent

praisonai-triage-agent Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Claude finished @MervinPraison's task —— View job


🔍 Final Architecture Review

Todo List:

  • Read AGENTS.md to understand architecture rules
  • Review changes against protocol-driven design principles
  • Check for backward compatibility issues
  • Analyze issues from other reviewers (CodeRabbit, Qodo, Copilot)
  • Implement fixes for valid issues
  • Provide final verdict

Phase 1: Architecture Review

Protocol-driven design: The implementation follows the correct pattern - the _delivery module provides heavy implementation in the wrapper layer (praisonai/), not in the core SDK.

Backward compatibility: The restored _delivery module maintains all previously exported functions, ensuring no regressions.

Performance: No hot-path regressions detected. The gateway fixes improve correctness without adding overhead.

Phase 2: Fixed Valid Issues

Fixed all valid issues from reviewers:

  1. Fire-and-forget task (line 89) - Added task reference tracking to prevent GC
  2. Redundant final edit (line 191) - Added streaming_completed flag to avoid duplicate API call
  3. Test hardening (line 72) - Fixed test to properly verify persistence by setting _is_running and using new gateway instance
  4. Rate limiter improvement (line 247) - Added channel_id parameter for per-channel throttling

Phase 3: Final Verdict

✅ APPROVED - Ready to Merge

This PR correctly fixes three critical bugs:

  1. Gateway session resume now uses the latest snapshot (not the empty initial one)
  2. Gateway stop() now properly persists active sessions
  3. Restored the accidentally deleted _delivery module

All architectural principles are followed, backward compatibility is maintained, and reviewer feedback has been addressed. The changes are minimal, focused, and include appropriate tests.

Files Modified:

  • src/praisonai/praisonai/bots/_delivery.py - Fixed background task GC, redundant edit, and rate limiting
  • src/praisonai/tests/unit/gateway/test_gateway_persistence_fixes.py - Fixed test to properly verify persistence

Skipped Changes:

  • No changes needed to gateway/server.py (already correct in PR)
  • No changes to praisonai-rust or praisonai-ts (out of scope)

- Store reference to typing indicator background task to prevent GC
- Fix redundant final edit when streaming completes successfully
- Fix test_stop_persists_active_sessions to properly test persistence
- Add channel_id parameter to rate limiter for per-channel throttling

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
@MervinPraison MervinPraison marked this pull request as ready for review June 22, 2026 22:17
@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@MervinPraison MervinPraison merged commit 81c2cf8 into main Jun 22, 2026
16 checks passed
@greptile-apps

greptile-apps Bot commented Jun 22, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes three regressions: session history was silently discarded on reconnect due to a break selecting the initial empty snapshot instead of the final one; graceful shutdown via stop() bypassed close_session() so in-flight sessions were never persisted; and _delivery.py was missing after PR #2054 removed it, breaking any import of DurableDelivery, DurableAdapterMixin, or deliver_chunked.

  • Session resume fix (server.py): removes the break inside the system-message scan so the loop keeps overwriting session_data until the last snapshot (written by close_session) wins instead of stopping at the empty init snapshot.
  • Shutdown persistence fix (server.py): stop() now calls the full close_session(session_id) path for each active session, which persists state and sets the resume TTL, rather than calling session.close() directly.
  • _delivery.py restore: the file is re-added with UnifiedDelivery, DurableDelivery, deliver_with_retry, and deliver_chunked; four regression tests covering all three bugs are also added.

Confidence Score: 4/5

Safe to merge — all three fixes are minimal and targeted, none of the changes widen any code path in a way that could introduce new failures.

The session-resume and shutdown-persistence fixes in server.py are correct: iterating without break picks the last snapshot as intended, and routing through close_session() in stop() properly persists state. The restored _delivery.py satisfies all existing imports. The only open items are a missing deliver_chunked entry in the TYPE_CHECKING block (affects static analysis only, not runtime) and a missing trailing newline — neither affects correctness or runtime behavior.

No files require special attention; the two minor style issues are in bots/init.py and bots/_delivery.py.

Important Files Changed

Filename Overview
src/praisonai/praisonai/gateway/server.py Two targeted fixes: resume now scans all system messages to pick the latest snapshot, and stop() routes through close_session() to persist sessions before exit. Both changes are correct and safe.
src/praisonai/praisonai/bots/_delivery.py Restored file with UnifiedDelivery, DurableDelivery, deliver_with_retry, and deliver_chunked. All dependent imports resolve. Missing trailing newline is a minor style nit.
src/praisonai/tests/unit/gateway/test_gateway_persistence_fixes.py Four regression tests for the three bugs fixed; covers resume snapshot selection, shutdown persistence, and _delivery.py importability. Tests are well-structured but the async test requires pytest-asyncio in the environment.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant C as Client
    participant GW as WebSocketGateway
    participant SS as SessionStore

    Note over GW: Session Create
    GW->>SS: "add_message(role=system, session_data=empty_snapshot)"

    Note over C,GW: Client sends messages
    C->>GW: message
    GW->>GW: session.add_message()

    Note over GW: Session Close
    GW->>GW: close_session(session_id)
    GW->>SS: "add_message(role=system, session_data=full_snapshot)"

    Note over C,GW: Client reconnects
    C->>GW: join with session_id
    GW->>SS: get_session(session_id)
    SS-->>GW: messages [system(empty), ..., system(full)]
    Note over GW: OLD: break on first - empty snapshot
    Note over GW: NEW: iterate all - last snapshot wins
    GW->>GW: GatewaySession.from_dict(full_snapshot)
    GW-->>C: "joined (resumed=true, history intact)"
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant C as Client
    participant GW as WebSocketGateway
    participant SS as SessionStore

    Note over GW: Session Create
    GW->>SS: "add_message(role=system, session_data=empty_snapshot)"

    Note over C,GW: Client sends messages
    C->>GW: message
    GW->>GW: session.add_message()

    Note over GW: Session Close
    GW->>GW: close_session(session_id)
    GW->>SS: "add_message(role=system, session_data=full_snapshot)"

    Note over C,GW: Client reconnects
    C->>GW: join with session_id
    GW->>SS: get_session(session_id)
    SS-->>GW: messages [system(empty), ..., system(full)]
    Note over GW: OLD: break on first - empty snapshot
    Note over GW: NEW: iterate all - last snapshot wins
    GW->>GW: GatewaySession.from_dict(full_snapshot)
    GW-->>C: "joined (resumed=true, history intact)"
Loading

Reviews (1): Last reviewed commit: "fix: address code review feedback from r..." | Re-trigger Greptile

Comment on lines +572 to +573
"DurableDelivery",
] No newline at end of file

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The file is missing a trailing newline. Most Python linters (flake8 W292, ruff) and many editors flag this, and it can cause noisy diffs when content is appended later.

Suggested change
"DurableDelivery",
]
"DurableDelivery",
]

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

praisonai-triage-agent Bot added a commit that referenced this pull request Jun 22, 2026
…2102)

* fix: gateway session data loss and restore deleted _delivery module

- Resume gateway sessions from the latest session_data snapshot instead of
  the initial empty one written at session creation
- Persist active sessions on gateway stop() via close_session()
- Restore bots/_delivery.py accidentally removed in #2054; imports for
  DurableDelivery and DurableAdapterMixin were still live
- Add regression tests for resume, shutdown persist, and delivery imports

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>

* fix: address code review feedback from reviewers

- Store reference to typing indicator background task to prevent GC
- Fix redundant final edit when streaming completes successfully
- Fix test_stop_persists_active_sessions to properly test persistence
- Add channel_id parameter to rate limiter for per-channel throttling

Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mervin Praison <MervinPraison@users.noreply.github.com>
Co-authored-by: praisonai-triage-agent[bot] <272766704+praisonai-triage-agent[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants