Skip to content

feat(knowledge-base): add enabled flag for soft enable/disable#8068

Open
YonganZhang wants to merge 1 commit intoAstrBotDevs:masterfrom
YonganZhang:feat/kb-enable-disable
Open

feat(knowledge-base): add enabled flag for soft enable/disable#8068
YonganZhang wants to merge 1 commit intoAstrBotDevs:masterfrom
YonganZhang:feat/kb-enable-disable

Conversation

@YonganZhang
Copy link
Copy Markdown

@YonganZhang YonganZhang commented May 7, 2026

Use case

Currently the only way to "turn off" a knowledge base is to delete it, which loses all indexed documents and embeddings. Users frequently want to temporarily disable a KB (e.g. testing, debugging, A/B comparison) without paying the re-indexing cost later.

Change

Three small additions to the existing schema:

  1. KnowledgeBase model gets an enabled: bool = True field.
  2. kb_db_sqlite.migrate_to_v2() adds the column to existing DBs via ALTER TABLE ... ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1, wrapped in try/except so re-running the migration is a no-op.
  3. RetrievalManager.invalidate_sparse_cache(kb_id) exposes a small helper to clear the BM25 cache when a KB is enabled/disabled or its docs change. Useful for the dashboard toggle and any other consumer that mutates KB state.

Backwards compatibility

  • New rows default to enabled=True → no behavior change for existing KBs.
  • Existing rows get 1 (true) via the migration default → no behavior change.
  • Pure additive: no existing field renamed or removed.

Diff size

16 lines across 3 files. Pure model + migration + helper — no API contract change in this PR.

Follow-up

A separate PR can wire this up in the dashboard (toggle button + filter enabled=true in retrieval). Posting this as a small, self-contained schema change first to keep review focused.

Summary by Sourcery

Introduce a soft enable/disable mechanism for knowledge bases and wire supporting schema migration and cache invalidation helper methods.

New Features:

  • Add an enabled flag to knowledge bases to support soft enable/disable behavior.
  • Expose a retrieval manager helper to invalidate BM25 sparse cache for a specific knowledge base.

Enhancements:

  • Add a v2 SQLite migration to backfill the enabled column on existing knowledge base rows without impacting current behavior.

## Use case

Currently the only way to "turn off" a knowledge base is to delete it,
which loses all indexed documents and embeddings. Users frequently want
to temporarily disable a KB (e.g. testing, debugging, A/B comparison)
without re-indexing later.

## Change

Three small additions to the existing schema:

1. `KnowledgeBase` model gets an `enabled: bool = True` field.
2. `kb_db_sqlite.migrate_to_v2()` adds the column to existing DBs via
   `ALTER TABLE ... ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1`,
   wrapped in a try/except so re-running the migration is a no-op.
3. `RetrievalManager.invalidate_sparse_cache(kb_id)` exposes a small
   helper so the BM25 cache can be cleared when a KB is enabled/disabled
   or its docs change. Useful for the upcoming dashboard toggle and any
   other consumer that mutates KB state.

## Backwards compatibility

- New rows default to `enabled=True` → no behavior change for existing KBs.
- Existing rows get `1` (true) via the migration default → no behavior change.
- Pure additive: no existing field renamed or removed.

## Diff size

16 lines across 3 files. Pure model + migration + helper — no API
contract change in this PR.

## Follow-up

A separate PR can wire this up in the dashboard (toggle button +
filter `enabled=true` in retrieval). Posting this as a small,
self-contained schema change first to keep review focused.
@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. feature:knowledge-base The bug / feature is about knowledge base labels May 7, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In migrate_to_v2, catching a broad Exception and silently passing can mask real migration issues; consider catching a more specific DB exception (e.g., the 'duplicate column' error) or at least logging unexpected errors while still making re-runs idempotent.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `migrate_to_v2`, catching a broad `Exception` and silently passing can mask real migration issues; consider catching a more specific DB exception (e.g., the 'duplicate column' error) or at least logging unexpected errors while still making re-runs idempotent.

## Individual Comments

### Comment 1
<location path="astrbot/core/knowledge_base/kb_db_sqlite.py" line_range="170-178" />
<code_context>

                 await session.commit()

+    async def migrate_to_v2(self) -> None:
+        """Add enabled column to knowledge_bases table."""
+        async with self.get_db() as session:
+            try:
+                await session.execute(
+                    text("ALTER TABLE knowledge_bases ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1")
+                )
+                await session.commit()
+            except Exception:
+                pass  # Column already exists
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid swallowing all exceptions during migration without logging or narrowing the failure mode

Catching a bare `Exception` and then `pass`-ing will hide any migration failure (SQL errors, permission issues, etc.), potentially leaving the DB in an inconsistent state and making problems hard to trace.

Instead, either:
- Catch only the specific error that indicates the column already exists, or
- Log the exception and special-case the "column already exists" scenario.

This keeps the migration idempotent without silently masking real issues.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +170 to +178
async def migrate_to_v2(self) -> None:
"""Add enabled column to knowledge_bases table."""
async with self.get_db() as session:
try:
await session.execute(
text("ALTER TABLE knowledge_bases ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1")
)
await session.commit()
except Exception:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Avoid swallowing all exceptions during migration without logging or narrowing the failure mode

Catching a bare Exception and then pass-ing will hide any migration failure (SQL errors, permission issues, etc.), potentially leaving the DB in an inconsistent state and making problems hard to trace.

Instead, either:

  • Catch only the specific error that indicates the column already exists, or
  • Log the exception and special-case the "column already exists" scenario.

This keeps the migration idempotent without silently masking real issues.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an enabled field to the knowledge base model and adds a corresponding SQLite migration. It also implements a cache invalidation method for sparse retrievers. However, several critical issues were identified: the migration method is defined but never called, the new cache invalidation logic references a non-existent method in the SparseRetriever class which will lead to runtime errors, and the migration's error handling is overly broad and should be more specific or include logging.

Comment on lines +170 to +179
async def migrate_to_v2(self) -> None:
"""Add enabled column to knowledge_bases table."""
async with self.get_db() as session:
try:
await session.execute(
text("ALTER TABLE knowledge_bases ADD COLUMN enabled BOOLEAN NOT NULL DEFAULT 1")
)
await session.commit()
except Exception:
pass # Column already exists
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The migrate_to_v2 method is defined but never called. To ensure existing databases are updated with the new enabled column, you must invoke this method during the initialization process. Based on the current implementation of KnowledgeBaseManager._init_kb_database in astrbot/core/knowledge_base/kb_mgr.py, a call to await self.kb_db.migrate_to_v2() should be added after migrate_to_v1().

Comment on lines +293 to +295
def invalidate_sparse_cache(self, kb_id: str) -> None:
"""清除指定 KB 的 BM25 缓存"""
self.sparse_retriever.invalidate_cache(kb_id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The invalidate_sparse_cache method calls self.sparse_retriever.invalidate_cache(kb_id), but the invalidate_cache method is not defined in the SparseRetriever class (see astrbot/core/knowledge_base/retrieval/sparse_retriever.py). This will cause an AttributeError at runtime. Furthermore, the current SparseRetriever implementation does not seem to utilize its _index_cache for BM25 indexing, as it rebuilds the index on every call to _retrieve_with_bm25. You should implement the caching logic and the corresponding invalidation method in SparseRetriever before exposing this helper. Additionally, ensure this new functionality is covered by unit tests.

References
  1. New functionality, such as handling attachments, should be accompanied by corresponding unit tests.

Comment on lines +178 to +179
except Exception:
pass # Column already exists
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Catching a broad Exception and silently passing is generally discouraged as it can hide unexpected errors. While this is intended to handle the case where the enabled column already exists (since SQLite's ALTER TABLE doesn't support IF NOT EXISTS), it is safer to catch a more specific exception (e.g., sqlalchemy.exc.OperationalError) and verify the error message. At the very least, consider logging a warning if an exception occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature:knowledge-base The bug / feature is about knowledge base size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant