refactor(optimizer): Improve histogram-based selectivity and join statistics estimation#19775

Draft
forsaken628 wants to merge 9 commits into databendlabs:main from forsaken628:histogram

Conversation

@forsaken628
Collaborator

@forsaken628 forsaken628 commented Apr 27, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Add typed histogram infrastructure, NDV-based histogram building, typed bounds/intersection logic, histogram join estimation, and histogram table-meta serde support.
  • Introduce expression-level statistics distribution utilities and typed comparison stat derivation for numeric, decimal, string, date, timestamp, nullable, and constant comparisons.
  • Rework optimizer selectivity estimation to use derived function statistics and structured column-stat updates for min/max, NDV, null count, and histogram constraints.
  • Improve join cardinality/NDV estimation with histogram overlap and interval intersection, and update affected plan expectations.
  • Add optimizer selectivity smoke/property tests and histogram serialization coverage.
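
To make the histogram-based estimation described above concrete, here is a minimal sketch in Rust. It is not the code in this PR: the `Bucket` layout and the functions `selectivity_lt` and `estimate_equi_join` are hypothetical names chosen for illustration, and the sketch assumes numeric buckets with values spread uniformly inside each bucket. It shows two ideas from the summary: range selectivity from bucket interpolation, and join cardinality from the intersection of overlapping bucket intervals.

```rust
/// One histogram bucket over a numeric column (hypothetical layout, not the PR's type).
#[derive(Debug, Clone, Copy)]
pub struct Bucket {
    pub lower: f64, // inclusive lower bound
    pub upper: f64, // inclusive upper bound
    pub rows: f64,  // rows falling in the bucket
    pub ndv: f64,   // distinct values in the bucket
}

/// Estimate the fraction of rows satisfying `col < x`.
/// Buckets entirely below `x` count fully; the bucket containing `x`
/// is interpolated linearly (uniform-distribution assumption).
pub fn selectivity_lt(buckets: &[Bucket], x: f64) -> f64 {
    let total: f64 = buckets.iter().map(|b| b.rows).sum();
    if total <= 0.0 {
        return 0.0;
    }
    let mut matched = 0.0;
    for b in buckets {
        if x > b.upper {
            matched += b.rows;
        } else if x > b.lower {
            // Here lower < x <= upper, so the width is strictly positive.
            matched += b.rows * (x - b.lower) / (b.upper - b.lower);
        }
    }
    (matched / total).clamp(0.0, 1.0)
}

/// Rows and NDV of the part of `b` that lies inside [lo, hi].
fn clip(b: &Bucket, lo: f64, hi: f64) -> (f64, f64) {
    let l = b.lower.max(lo);
    let h = b.upper.min(hi);
    if l > h {
        return (0.0, 0.0);
    }
    let width = b.upper - b.lower;
    // A zero-width bucket holds a single value, so it is kept whole.
    let frac = if width > 0.0 { (h - l) / width } else { 1.0 };
    (b.rows * frac, b.ndv * frac)
}

/// Estimate the output cardinality of an equi-join on the column.
/// For every pair of overlapping buckets, both sides are clipped to the
/// shared interval and contribute rows_l * rows_r / max(ndv_l, ndv_r).
pub fn estimate_equi_join(left: &[Bucket], right: &[Bucket]) -> f64 {
    let mut card = 0.0;
    for l in left {
        for r in right {
            let lo = l.lower.max(r.lower);
            let hi = l.upper.min(r.upper);
            if lo > hi {
                continue; // disjoint intervals cannot produce matches
            }
            let (l_rows, l_ndv) = clip(l, lo, hi);
            let (r_rows, r_ndv) = clip(r, lo, hi);
            let ndv = l_ndv.max(r_ndv);
            if ndv > 0.0 {
                card += l_rows * r_rows / ndv;
            }
        }
    }
    card
}
```

Under these assumptions, a left bucket covering [0, 10] and a right bucket covering [5, 15], each with 100 rows and 10 distinct values, overlap on [5, 10]; each side keeps half its rows and NDV, giving an estimate of 50 × 50 / 5 = 500 rows. Disjoint ranges yield 0, which a plain min/max-free NDV formula would miss.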

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions Bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Apr 27, 2026
@github-actions
Contributor

github-actions Bot commented Apr 27, 2026

🤖 CI Job Analysis

Workflow: 25155970661

📊 Summary

  • Total Jobs: 87
  • Failed Jobs: 4
  • Retryable: 0
  • Code Issues: 4

NO RETRY NEEDED

All failures appear to be code/test issues requiring manual fixes.

🔍 Job Details

  • linux / test_stateful_cluster: Not retryable (Code/Test)
  • linux / sqllogic / standalone_no_table_meta_cache (no_table_meta_cache, http): Not retryable (Code/Test)
  • linux / sqllogic / standalone (standalone, 2c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / standalone (standalone, 2c, http): Not retryable (Code/Test)

🤖 About

Automated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed).

@forsaken628 forsaken628 force-pushed the histogram branch 2 times, most recently from 458c7fa to 47b7068 Compare April 29, 2026 05:13
@forsaken628 forsaken628 changed the title from "refactor(optimizer): histogram" to "refactor(optimizer): Improve histogram-based selectivity and join statistics estimation" Apr 30, 2026
Labels

pr-refactor this PR changes the code base without new features or bugfix

1 participant