Reconcile API docs and examples with the real library surface #239

Open
dchud wants to merge 1 commit into main from docs/query-tutorial-use-dsl

dchud (Owner) commented May 29, 2026

Follow-up to #238 (bd-nj22). On the merged PR, @acdha noted the query-DSL sections still hand-rolled logic under headings named after the dedicated query types. Acting on that, I swept the recently-updated docs and everything nearby and found the same class of problem in several more places, plus reference tables and Python fences that did not match the compiled library.

Everything here was verified against the real surface: each corrected Rust snippet was compile-checked with a throwaway example, and each Python snippet was run against the built extension. The sweep also turned up several suspected errors, flagged by an agent or by intuition, that I confirmed are actually correct and left alone (e.g. ProducerConsumerPipeline.from_file exists; record.leader() is correctly a method; fields_by_tag does exist in Python; rust-api.md's Query DSL has_subfield is real).

Rust examples

  • querying-fields.md: the Subfield Pattern/Value sections now use SubfieldPatternQuery + fields_matching_pattern and SubfieldValueQuery::new/::partial + fields_matching_value, instead of raw regex and manual string comparison (the exact thing @acdha flagged).
  • reading-records.md: subfields_by_code; real MarcError struct variants (IoError { cause, .. }, InvalidLeader { message, .. }) — the old IoError(e)/InvalidRecord(msg) did not compile.
  • concurrency.md: RecordBoundaryScanner + parse_batch_parallel(&boundaries, &buffer) (the old call used an undefined split_records and the wrong arity).
  • encoding.md: record.leader.character_coding (there is no position_9()).
  • testing.md: MarcReader instead of the nonexistent Record::from_marc21.

Reference tables and specialized records (rust-api.md)

  • The Record and Field "Key Methods" tables now use real names (get_field/get_control_field/get_subfield), distinguish public fields (tag/indicator1/indicator2/leader/subfields) from methods, and show accurate return types (Option<&str>, iterators).
  • The AuthorityRecord/HoldingsRecord examples use the real ::builder(leader) API (the old AuthorityRecordBuilder::new().control_number(...) did not exist).

Python fences

  • get_fields for pymarc-style field["a"] access; record.leader() is a method; removed the nonexistent record.isbns() and field.ind1/ind2; leader.record_type.
  • writing-records.md: modifying a field now uses remove + re-add, because in-place field edits do not persist to the record.
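
For reviewers who want to see why remove + re-add is needed, here is a minimal toy model of the behaviour. It assumes one plausible mechanism (getters hand back copies rather than live references); the PR does not confirm that this is how the binding works. ToyField, ToyRecord, and the add_field/remove_fields names are hypothetical stand-ins, not the MRRC API.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyField:
    """Hypothetical stand-in for a MARC field; not the MRRC class."""
    tag: str
    subfields: dict = field(default_factory=dict)


class ToyRecord:
    """Hypothetical record whose getters return copies (an assumption)."""

    def __init__(self):
        self._fields = []

    def add_field(self, fld):
        # Store a private copy so callers cannot mutate internal state.
        self._fields.append(copy.deepcopy(fld))

    def get_fields(self, tag):
        # Return copies: edits to these objects never reach self._fields.
        return [copy.deepcopy(f) for f in self._fields if f.tag == tag]

    def remove_fields(self, tag):
        self._fields = [f for f in self._fields if f.tag != tag]


record = ToyRecord()
record.add_field(ToyField("245", {"a": "Old title"}))

# In-place edit: this mutates a copy, so the record is unchanged.
record.get_fields("245")[0].subfields["a"] = "New title"
after_in_place = record.get_fields("245")[0].subfields["a"]

# Remove + re-add: edit a copy, drop the stored field, add the edited one.
edited = record.get_fields("245")[0]
edited.subfields["a"] = "New title"
record.remove_fields("245")
record.add_field(edited)
after_re_add = record.get_fields("245")[0].subfields["a"]
```

In this toy model the re-added field is appended at the end of the record, so the pattern does not preserve field order by itself.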

Encoding

  • Removed documentation for MARC-8 output, which is not supported in either binding: MRRC writes UTF-8. No prior decision was recorded to make writing UTF-8-only; the MARC-8 encoder exists but is not wired into the writer.

Follow-up beads (code-level findings)

  • bd-cdey — wire MARC-8 encoding into the writer, or commit to UTF-8-only and remove the dead encoder.
  • bd-gmax — Python Record.fields_by_tag returns unwrapped (non-subscriptable) fields, unlike get_fields.
  • bd-blja — Python in-place field edits are silently not persisted to the record.

.cargo/check.sh is green (566 tests + doctests + mkdocs).

Bead: bd-du5n

Continues the doc-drift cleanup from #238 (bd-nj22). @acdha's review on
the merged PR noted the query-DSL sections still hand-rolled logic under
headings named after the dedicated query types. A full sweep of the
recently-updated docs and everything nearby turned up the same class of
problem in several more places, plus reference tables and Python fences
that did not match the compiled library.

Rust examples:
- querying-fields.md: the Subfield Pattern/Value sections now use
  SubfieldPatternQuery/fields_matching_pattern and SubfieldValueQuery
  ::new/::partial/fields_matching_value instead of raw regex and manual
  string comparison.
- reading-records.md: subfields_by_code; real MarcError struct variants.
- concurrency.md: RecordBoundaryScanner + parse_batch_parallel(&b,&buf).
- encoding.md: leader.character_coding (no position_9()).
- testing.md: MarcReader instead of nonexistent Record::from_marc21.

Reference tables / specialized records (rust-api.md):
- Key Methods tables corrected: get_field/get_control_field/get_subfield,
  fields vs methods (tag/indicator1/indicator2/leader/subfields), and
  return types (Option<&str>, iterators).
- AuthorityRecord/HoldingsRecord examples use the real ::builder(leader).

Python fences (verified against the compiled extension):
- get_fields for pymarc-style field["a"] access (fields_by_tag returns
  unwrapped fields); record.leader() is a method; removed nonexistent
  record.isbns() and field.ind1/ind2; leader.record_type.
- writing-records.md: modify-a-field now uses remove + re-add, since
  in-place field edits do not persist.

Encoding: removed documentation for MARC-8 output, which does not exist
in either binding; MRRC writes UTF-8.

All corrected Rust snippets were compile-checked against a throwaway
example; all Python snippets were run against the built extension.

Code-level issues found during the sweep are filed as bd-cdey (MARC-8
write unsupported), bd-gmax (fields_by_tag returns unwrapped fields),
and bd-blja (in-place field edits not persisted).

Bead: bd-du5n

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codspeed-hq Bot commented May 29, 2026

Merging this PR will degrade performance by 13.92%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

❌ 1 (👁 1) regressed benchmark
✅ 59 untouched benchmarks
⏩ 18 skipped benchmarks [1]

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 👁 | WallTime | test_file_parallel_4x_10k_with_extraction | 1.1 s | 1.3 s | -13.92% |

Comparing docs/query-tutorial-use-dsl (7d9e5af) with main (901af63)


Footnotes

  1. 18 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
