Python: model PyMongo read results as sources for `py/sql-injection` in second-order SQL construction flows #21775

`py/sql-injection` already appears to model the sink side correctly through the existing DB-API / `PEP249.qll` coverage for `execute(...)`. The gap seems to be on the source side for a common second-order pattern: values read from MongoDB with PyMongo are later reused in dynamically constructed SQL. I ran into this while triaging KBase Metrics (`CVE-2022-4860`), but the underlying issue is broader than that one project.

A reduced example looks like this:

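```python
from pymongo import MongoClient
import psycopg2

def sync_users(uri, dsn):
    users = []
    # Source: values read back from a PyMongo collection.
    for record in MongoClient(uri).auth.users.find({"role": "dev"}, {"user": 1, "_id": 0}):
        users.append(record["user"])

    # String-building step: persisted values are spliced into the SQL text.
    in_clause = "', '".join(users)
    sql = (
        "update user_info set active = true "
        "where username in ('" + in_clause + "')"
    )

    # Sink: DB-API `execute`, already covered by py/sql-injection.
    cur = psycopg2.connect(dsn).cursor()
    cur.execute(sql)
```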

My reading of the current modeling is that this flow falls between two existing pieces: `PyMongo.qll` models collection operations for NoSQL semantics, while `py/sql-injection` starts from active threat-model sources that do not seem to cover data read back from PyMongo collections. As a result, the query has the right sink and the right string-building path shape, but no source that can reach it.
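The string-building step in the reduced example is exploitable as soon as any stored `user` value contains a quote. The sketch below replays that step with the standard library only, assuming `sqlite3` as a stand-in for Postgres and a plain list in place of the PyMongo cursor; the helper names `build_update_sql` and `run_demo` are illustrative, not part of any existing code.

```python
import sqlite3


def build_update_sql(users):
    # Same construction as the reduced example: stored values are joined and
    # concatenated into the SQL text without quoting or escaping.
    in_clause = "', '".join(users)
    return (
        "update user_info set active = 1 "
        "where username in ('" + in_clause + "')"
    )


def run_demo(stored_users):
    # In-memory SQLite stands in for the relational store; `stored_users`
    # stands in for the `user` fields read back from the Mongo collection.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table user_info (username text, active integer)")
    conn.executemany(
        "insert into user_info values (?, 0)",
        [("alice",), ("bob",), ("carol",)],
    )
    conn.execute(build_update_sql(stored_users))
    activated = [
        row[0]
        for row in conn.execute(
            "select username from user_info where active = 1 order by username"
        )
    ]
    conn.close()
    return activated
```

With benign stored values, `run_demo(["alice"])` activates only `alice`. A single stored value of `x') or ('1'='1` turns the clause into `where username in ('x') or ('1'='1')`, which matches every row, even though nothing in this function reads request input directly.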

I do not think this needs a new query or wider sink modeling. The fix seems fairly contained: add source coverage for values obtained from common PyMongo read APIs such as `find`, `find_one`, and `find_one_and_*`, so that those results can participate in the existing `py/sql-injection` flow. If widening default behavior is a concern, this could also live behind an opt-in threat-model bucket for persisted database results rather than being treated as generic local input.
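A possible set of test-case shapes for the proposed sources is sketched below. These are hypothetical (not existing tests in the CodeQL repository), the function names are illustrative, and the "expected" comments describe what the proposal would aim for rather than current query behavior. `collection` is assumed to be a PyMongo `Collection` and `cursor` a PEP 249 cursor using psycopg2's `%s` placeholder style; neither library is imported so the sketch stays self-contained.

```python
# Hypothetical test shapes for PyMongo read results as py/sql-injection sources.


def sql_from_find(collection, cursor):
    for doc in collection.find({"role": "dev"}, {"user": 1, "_id": 0}):
        # Expected alert: a `find` result concatenated into SQL text.
        cursor.execute("select * from user_info where username = '" + doc["user"] + "'")


def sql_from_find_one(collection, cursor):
    doc = collection.find_one({"role": "admin"})
    # Expected alert: a `find_one` result interpolated via an f-string.
    cursor.execute(f"delete from user_info where username = '{doc['user']}'")


def sql_from_find_one_and_update(collection, cursor):
    # `find_one_and_update` returns a document (the pre-update one by default);
    # `find_one_and_replace` / `find_one_and_delete` would follow the same shape.
    doc = collection.find_one_and_update({"synced": False}, {"$set": {"synced": True}})
    # Expected alert: %-formatting a persisted value into SQL text.
    cursor.execute("update user_info set active = true where username = '%s'" % doc["user"])


def sql_parameterized(collection, cursor):
    doc = collection.find_one({"role": "dev"})
    # Expected: no alert, because the value is passed as a bound parameter.
    cursor.execute("update user_info set active = true where username = %s", (doc["user"],))
```

The parameterized variant is included as a negative case, so that the added sources do not introduce alerts where the driver already separates data from SQL.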

This pattern is common in real Python codebases, especially in cron jobs, reporting jobs, migration scripts, sync workers, and ETL-style code that bridges Mongo-backed application state into relational stores. Bandit's `B608` already flags the same family of patterns syntactically by recognizing SQL-shaped string construction passed to `execute()`, so there is at least external evidence that this is a practical and recurring pattern. CodeQL seems close to covering it already; the missing piece is dataflow-level source coverage for PyMongo-backed persisted data flowing into the existing SQL sinks.
