[Proposal] Native, vectorised, zero-copy execution path for Druid #19456

Description

@Shekharrajak

Motivation

Native, vectorised, zero-copy execution path for Druid via FFM and Arrow — staged from segment reader → filter/project → DataFusion (follow-up to #19039 item 1)

Druid today runs the entire query hot path on the JVM. Each row passes through a ColumnSelector interface, is materialised into Java objects, and is compared or aggregated by JIT-compiled bytecode.

Proposed changes

Phase 0 — Spike: native segment column reader via FFM
Phase 1 — L0, production-shape: native column reader for all common types
Phase 2 — L1: vectorised native filter + project over Arrow batches
Phase 3 — L2: DataFusion as candidate aggregate/join executor in MSQ stages
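
To make the Phase 0 to Phase 2 idea concrete, here is a minimal sketch of a batch-at-a-time filter and projection over a contiguous off-heap column, as opposed to row-at-a-time access through a selector. It is not the proposed implementation: it uses a direct `ByteBuffer` as a stand-in for an FFM `MemorySegment` so that it runs on older JDKs, and the class `NativeColumnSketch` and its methods (`writeColumn`, `filterGreaterThan`, `sumSelected`) are hypothetical names introduced only for illustration. The little-endian fixed-width `long` layout is also an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

final class NativeColumnSketch {

    // Hypothetical helper: lays out a long column contiguously in off-heap
    // memory, standing in for a segment column exposed by a native reader.
    static ByteBuffer writeColumn(long[] values) {
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Long.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (long v : values) {
            buf.putLong(v);
        }
        buf.flip(); // position = 0, limit = bytes written
        return buf;
    }

    // Batch filter: reads the column through a view over the same memory
    // (no copy, no per-row objects) and records matching row offsets in
    // `selection`. Returns the number of matches.
    static int filterGreaterThan(ByteBuffer column, int rowCount,
                                 long threshold, int[] selection) {
        // duplicate() shares the memory but resets byte order, so set it again.
        LongBuffer view = column.duplicate()
                                .order(ByteOrder.LITTLE_ENDIAN)
                                .asLongBuffer();
        int matches = 0;
        for (int row = 0; row < rowCount; row++) {
            if (view.get(row) > threshold) {
                selection[matches++] = row;
            }
        }
        return matches;
    }

    // Projection over the selection vector: sums only the selected rows.
    static long sumSelected(ByteBuffer column, int[] selection, int matches) {
        LongBuffer view = column.duplicate()
                                .order(ByteOrder.LITTLE_ENDIAN)
                                .asLongBuffer();
        long sum = 0;
        for (int i = 0; i < matches; i++) {
            sum += view.get(selection[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer column = writeColumn(new long[] {5, 42, 7, 100});
        int[] selection = new int[4];
        int matches = filterGreaterThan(column, 4, 6L, selection);
        System.out.println("matches=" + matches
                + " sum=" + sumSelected(column, selection, matches));
    }
}
```

The selection vector keeps the filter and the projection decoupled: the filter never copies values, and later operators touch only the rows that survived. An FFM-based reader would presumably follow the same shape, with the buffer replaced by a memory segment mapped over the segment file.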

Rationale

Operational impact

Performance

  • 2–3× speedup on cold-cache scan + filter for typical Druid datasources.
  • speedup on MSQ fact-fact joins.
  • speedup on external Parquet ingestion.
  • lower p99 for high-cardinality GroupBy.
  • faster execution under memory pressure (DataFusion graceful spill).

Operational

Test plan (optional)

Future work (optional)
