Skip to content

Feature Sort#102

Open
ascottDI wants to merge 6 commits into
mainfrom
feature-sort
Open

Feature Sort#102
ascottDI wants to merge 6 commits into
mainfrom
feature-sort

Conversation

@ascottDI

Copy link
Copy Markdown

di.sort — TorQ Modularisation PR

Summary

Extracts the .sort namespace from TorQ's dbwriteutils.q into a standalone kdb-x module: di.sort. The module sorts on-disk kdb+ tables and applies column attributes, driven by a sort.csv configuration file. It satisfies the di.* module contract: dependency injection via init, a clean exported API, and no hard dependencies beyond an injected logger.


Background

TorQ's dbwriteutils.q contains a .sort namespace used to sort HDB partitions and apply kdb+ attributes (p, s, g, u) to columns after data is written to disk. This is a common post-write step in kdb+ database maintenance — sorting by a symbol column and applying the parted attribute, for example, is standard practice for efficient data access.

This PR is part of the broader TorQ → kdb-x modularisation effort. The goal is to make each functional area independently loadable, testable, and injectable, removing the dependency on TorQ's global process framework.


Changes

New files

File Description
di/sort/sort.q Core implementation — init, getsortcsv, sorttab, getparams and internal helpers
di/sort/init.q Module entry point — loads sort.q and declares the export list
di/sort/test.csv k4unit test suite (63 tests)
di/sort/sort.md Module README

Differences from TorQ original

Aspect TorQ .sort di.sort
Logging Hard-coded .lg.o / .lg.e calls Injected log dependency ({[ctx;msg]} functions)
Config path .proc.getconfigfile["sort.csv"] global lookup savedir key in init deps; defaults to `:sort.csv
Global state Direct @[\.sort;`params;:;params]` amend .z.m.params mutable module state
Error handling Errors propagate unchecked in some paths @[f;args;handler] traps on xasc and applyattr; per-partition failures are logged and do not halt the run
Partial log dict Not applicable init validates all three keys (info, warn, error) are present and errors with a descriptive message if not
Module contract None kdb-x use singleton, init[deps] pattern, export: list

Exported API

srt:use`di.sort

srt.init[deps]              / wire dependencies; required before all other calls
srt.getsortcsv[file]        / load and validate sort config from CSV
srt.sorttab[d]              / sort on-disk partitions for one table
srt.getparams[]             / return currently loaded sort config table

sort.csv format

tabname,att,column,sort
trade,p,sym,1
trade,g,vol,0
default,p,sym,1
  • sort=1 — column is used as a sort key (xasc)
  • sort=0 — attribute applied without sorting
  • tabname=default — fallback row for tables not listed explicitly

Test coverage

Tests are in test.csv and run via k4unit:

k4unit.moduletest`di.sort

Areas covered

Area Tests
init — dependency validation Rejects ::, null log, empty dict, missing log key, non-dict log value, partial log dict (missing keys)
init — error message prefix "di.sort" prefix on all thrown errors
init — valid configs Accepts valid log dep with and without savedir
getparams — initial state Empty table with correct schema before any getsortcsv call
sorttab — auto-load Triggers getsortcsv from savedir when params is empty
getsortcsv — validation Rejects bad column names, unknown att values, missing required columns, non-existent file
getsortcsv — empty CSV Header-only file loads without error; params remains empty
getsortcsv — successful load Correct row count, column schema, tabname/att/sort values
sorttab — no-params path Returns () when no matching row and no default; emits warn log
sorttab — empty partition list Handles () partition list without error
Log verification — getsortcsv Correct info messages and getsortcsv context symbol on success; error message and context on file failure
Log verification — sorttab Logs sort start/completion, table-specific lookup, column list, default fallback, warn on no-params
Log verification — applyattr Info message with applyattr context symbol on attribute application
Log verification — error paths Error messages logged under correct context symbols for both xasc and applyattr failures

Total: 63 tests, all passing.

Test results screenshot

image image

Integration test

Verified against real on-disk HDB data with two trade partitions and one quote partition:

  • Unsorted sym column sorted ascending; p (parted) attribute applied correctly
  • vol column received g (grouped) attribute without being used as a sort key
  • price column (no sort config row) reordered in step with sym sort; no attribute applied
  • default row applied correctly to the quote table
  • Table with no matching params and no default row: warn logged, returns () without error

Notes

  • di.sort has no hard module dependencies. Only the injected log dict is required.
  • di.log satisfies the log contract out of the box: logdep:\info`warn`error!(log.info;log.warn;log.error)`.
  • A single xasc failure on one partition is logged as an error and the run continues — consistent with TorQ's original intent that a bad partition should not halt a full HDB sort.
  • The default row in sort.csv is looked up by tabname=\default. Any table name matching a row exactly takes precedence over default`.

@jonnypress jonnypress left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to modify the interface a little. We should be able to pass in a table for the sort params, which should be possible to read from a csv file, but isn’t required to be in a csv file. Does it do that? Don’t be afraid to rethink the interface to make it more user friendly - it looks quite complex at the minute.

ascottDI added 2 commits June 19, 2026 10:19
…nctionality. Reduced number of public functions and ordering issues to reduce complexity
Comment thread di/sort/sort.q
.z.m.log[`info][`readcsv;"reading sort params from ",string file];
:("SSSB";enlist",") 0: file;
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format string "SSSB" in readfile parses all four CSV columns as S S S B (symbol symbol symbol boolean). The att column is a symbol, but the column column is also a symbol — that is fine. However, the att column will read the empty string "" as the symbol ` which is the intended no-attribute sentinel, so that part is correct. The real bug is that a missing or mismatched column count in the CSV will silently misalign the parse (e.g. tmp_sort_3col.csv has only 3 columns but "SSSB" will still attempt to read 4 columns, producing a type error or wrong data rather than a clear config-validation error). This is a correctness/robustness issue: 0: with a type-list longer than the actual column count causes a length error instead of the clear di.sort: prefixed error that checkconfig would produce, so the user gets a confusing low-level error rather than the documented one.

Comment thread di/sort/sort.q
.z.m.log[`info][`sorttab;"sorting the ",st," table"];
sp:getsortparams[config;tabname;st];
if[not count sp; :()];
sortdir[sp] each distinct (),dirs;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getsortparams is called with [config;tabname;st] where tabname is the raw symbol and st is its string form. Inside getsortparams the parameter is named t, and select from config where tabname=t compares the config tabname column against the local t — but the local variable t shadows the column name tabname only if q resolves the column first. In q, inside select … where tabname=t, tabname refers to the column and t refers to the local variable; this works correctly. However, the variable name t is also the name of the function parameter for the table name, and the first line of the function body does select from config where tabname=tconfig is both the table and a parameter name used elsewhere: this is fine in isolation, but within getsortparams, config is passed as the first parameter and is used as a table in the select. The real risk: tabname is both a column name in config AND a column name in the result. The select from config where tabname=t statement uses t (the local symbol parameter), but if t were ever a table it would be silently misinterpreted. This is a latent naming-collision risk rather than an immediate bug, but worth noting. More concretely: getsortparams returns 0#config (the empty config table) on the no-match path, but sorttab checks if[not count sp; :()]count of a 0-row table is 0, so this returns () correctly. No bug here, but the 0#config shape is important and only works because count of an empty table is 0.

Comment thread di/sort/sort.q
badatts:at where not (at:distinct t`att) in validatts;
if[count badatts;
'"di.sort: unrecognised attribute(s) in att column: ",", " sv string badatts];
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkconfig checks for unrecognised columns with badcols:c where not c in \tabname`att`column`sortand then separately checks for missing columns. However, a table with all four required columns PLUS extra columns will trigger thebadcolserror. That is intentional. But a table with only a subset of the four columns and no extra columns will pass thebadcolscheck and only fail atmissingcols. The real bug is in the badcolscheck itself: it usesc where not c in `tabname`att`column`sort, which flags ANY column not in the allowed set — including legitimate columns. This means a config table with exactly `tabname`att`column`sortpasses, but one with those four in a different order also passes becausein` is order-independent. This is correct. No bug here on reflection — withdrawing this point.

Comment thread di/sort/sort.q
};

/ internal - sort columns and apply attributes for a single on-disk partition directory
sortdir:{[sp;dloc]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In sortdir, the attrcols filter is select column,att from sp where not null att but does NOT filter out the empty-symbol ` sentinel (which represents "no attribute"). In q, null of a symbol returns 0b for ` (the empty symbol is NOT null — null `` ` is `0b`). So rows with `att: `` `` (the no-attribute sentinel, which is the empty symbol) will pass thenot null attfilter and be included inattrcols, causing applyattrto be called withatt=`. applyattr does guard against this with if[null att; :()], but null`` ` is `0b` in q, so that guard also fails to catch the empty symbol. The `@[{@[x;y;z#]};(dloc;colname; ``);attrerr[…]] call will then attempt to apply `# to the column on disk, which will either be a no-op or error. The validatts list includes ` as a valid att value meaning "none", but it should be explicitly excluded from the attrcols selection in sortdir using where (not null att), att<>`` `` or equivalentlywhere att in `p`s`g`u`.

Comment thread di/sort/sort.q
};

/ internal - apply a single attribute to a specific column in an on-disk partition
applyattr:{[dloc;colname;att]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In applyattr, the null-att guard if[null att; :()] uses null, which in q returns 0b for the empty symbol `. So passing att:`` bypasses this guard. The comment says "sortdir only passes non-null atts", but as noted above, not null`` ` is `1b`, so sortdir WILL pass the empty symbol through. The guard here should be `if[(null att) or att= ``; :()] or simply if[not att in \p`s`g`u; :()]` to be safe.

Comment thread di/sort/sort.q
'"failed to read ",string[file],": ",e;
};

sorttab:{[config;tabname;dirs]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readerr constructs the rethrown error message by string-concatenating e directly: '"failed to read ",string[file],": ",e. In q, the error value e passed to a trap handler is a string (character vector). If e is not a string (e.g. it is a symbol or other type in some edge cases), this concatenation will fail with a type error, swallowing the original error. This is a low-probability edge case but could mask real failures.

Comment thread di/sort/sort.q
if[not count sp; :()];
sortdir[sp] each distinct (),dirs;
.z.m.log[`info][`sorttab;"finished sorting the ",st," table"];
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorttab does not validate that tabname is a symbol before passing it to getsortparams. The test CSV (test.csv line ~72) includes a test srt.sorttab[cfg;42;enlist\:/]that is expected tofail, but checkconfigonly validatesconfig— it does not validatetabname. The call getsortparams[config;42;st]wherest:string 42="42"will attemptselect from config where tabname=42, which in q produces a type error (comparing a symbol column to a long). This will signal an error, so the test passes, but the error message will be a raw q type error rather than the documented di.sort:-prefixed error. The API contract says errors are prefixed di.sort:` but this case breaks that contract.

@adamhamilton96

Copy link
Copy Markdown

DIReview Summary

3 critical | 4 warning(s) | 0 suggestion(s)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants