
docs: missing optional error_rate parameter in APPROX_COUNT_DISTINCT documentation #3296

@sundy-li

What's Missing

The APPROX_COUNT_DISTINCT aggregate function accepts an optional error rate parameter that controls the precision of the HyperLogLog estimation, but the current documentation only shows the single-argument form.

Source File

/workspace/databend/src/query/functions/src/aggregates/aggregate_approx_count_distinct.rs

What It Does

The function signature is:

APPROX_COUNT_DISTINCT(<expr> [, <error_rate>])
-- or equivalently:
APPROX_COUNT_DISTINCT(<error_rate>)(<expr>)

When error_rate is provided (a float64 value), the precision parameter p is computed as:

p = ceil(log2((1.04 / error_rate)^2))

and clamped to the range [4, 14]. The default precision is p = 14 (approximately 0.81% error rate). A higher error rate yields a smaller p, which means fewer HyperLogLog registers (2^p), lower memory use, and faster computation, at the cost of accuracy.
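
To make the mapping concrete, here is a minimal Python sketch of the formula and clamp described above. It is not the Databend implementation: the helper names `precision_for` and `standard_error` are hypothetical, and rejecting non-positive rates is an assumption made only so the logarithm is defined. The `1.04 / sqrt(2^p)` estimate is the usual HyperLogLog standard error, which is where the 0.81% figure for p = 14 comes from.

```python
import math

P_MIN, P_MAX = 4, 14  # clamp range stated in this issue


def precision_for(error_rate: float) -> int:
    """Map a target error rate to a HyperLogLog precision p (hypothetical helper)."""
    if error_rate <= 0:
        # Assumption: this sketch rejects non-positive rates;
        # Databend's actual handling may differ.
        raise ValueError("error_rate must be positive")
    p = math.ceil(math.log2((1.04 / error_rate) ** 2))
    return max(P_MIN, min(P_MAX, p))


def standard_error(p: int) -> float:
    """Usual HyperLogLog standard error for 2**p registers."""
    return 1.04 / math.sqrt(2 ** p)


if __name__ == "__main__":
    for rate in (0.5, 0.05, 0.01, 0.001):
        p = precision_for(rate)
        print(f"error_rate={rate}: p={p}, registers={2 ** p}, "
              f"approx error={standard_error(p):.4f}")
```

Under these assumptions, a requested rate of 0.05 maps to p = 9, while very loose or very tight requests (0.5 or 0.001) hit the clamp at 4 and 14. Because of the clamp, the achievable error cannot go below roughly 0.81% regardless of the value passed.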

Current Documentation

/workspace/databend-docs/docs/en/sql-reference/20-sql-functions/07-aggregate-functions/aggregate-approx-count-distinct.md

The current doc only documents:

APPROX_COUNT_DISTINCT(<expr>)

Suggested Doc Location

/workspace/databend-docs/docs/en/sql-reference/20-sql-functions/07-aggregate-functions/aggregate-approx-count-distinct.md

The doc should be updated to show the optional error_rate parameter, explain the precision/accuracy tradeoff, and include an example using a custom error rate such as APPROX_COUNT_DISTINCT(user_id, 0.05).


Labels: documentation
