Reconcile API docs and examples with the real library surface #239
Open
dchud wants to merge 1 commit into
Continues the doc-drift cleanup from #238 (bd-nj22). @acdha's review on the merged PR noted the query-DSL sections still hand-rolled logic under headings named after the dedicated query types. A full sweep of the recently-updated docs and everything nearby turned up the same class of problem in several more places, plus reference tables and Python fences that did not match the compiled library.

Rust examples:
- querying-fields.md: the Subfield Pattern/Value sections now use SubfieldPatternQuery/fields_matching_pattern and SubfieldValueQuery::new/::partial/fields_matching_value instead of raw regex and manual string comparison.
- reading-records.md: subfields_by_code; real MarcError struct variants.
- concurrency.md: RecordBoundaryScanner + parse_batch_parallel(&b,&buf).
- encoding.md: leader.character_coding (no position_9()).
- testing.md: MarcReader instead of nonexistent Record::from_marc21.

Reference tables / specialized records (rust-api.md):
- Key Methods tables corrected: get_field/get_control_field/get_subfield, fields vs methods (tag/indicator1/indicator2/leader/subfields), and return types (Option<&str>, iterators).
- AuthorityRecord/HoldingsRecord examples use the real ::builder(leader).

Python fences (verified against the compiled extension):
- get_fields for pymarc-style field["a"] access (fields_by_tag returns unwrapped fields); record.leader() is a method; removed nonexistent record.isbns() and field.ind1/ind2; leader.record_type.
- writing-records.md: modify-a-field now uses remove + re-add, since in-place field edits do not persist.

Encoding: removed documentation for MARC-8 output, which does not exist in either binding; MRRC writes UTF-8.

All corrected Rust snippets were compile-checked against a throwaway example; all Python snippets were run against the built extension.

Code-level issues found during the sweep are filed as bd-cdey (MARC-8 write unsupported), bd-gmax (fields_by_tag returns unwrapped fields), and bd-blja (in-place field edits not persisted).

Bead: bd-du5n
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merging this PR will degrade performance by 13.92%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 👁 | WallTime | test_file_parallel_4x_10k_with_extraction | 1.1 s | 1.3 s | -13.92% |
Comparing docs/query-tutorial-use-dsl (7d9e5af) with main (901af63)
Footnote: 18 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
This was referenced May 29, 2026
Follow-up to #238 (bd-nj22). On the merged PR, @acdha noted that the query-DSL sections still hand-rolled logic under headings named after the dedicated query types. Acting on that, I swept the recently-updated docs and everything nearby and found the same class of problem in several more places, plus reference tables and Python fences that did not match the compiled library.

Everything here was verified against the real surface: each corrected Rust snippet was compile-checked with a throwaway example, and each Python snippet was run against the built extension. The sweep also surfaced agent/intuition false positives that I confirmed are actually correct and left alone (e.g. `ProducerConsumerPipeline.from_file` exists; `record.leader()` is correctly a method; `fields_by_tag` does exist in Python; rust-api.md's Query DSL `has_subfield` is real).

## Rust examples

- `querying-fields.md`: the Subfield Pattern/Value sections now use `SubfieldPatternQuery` + `fields_matching_pattern` and `SubfieldValueQuery::new`/`::partial` + `fields_matching_value`, instead of raw `regex` and manual string comparison (the exact thing @acdha flagged).
- `reading-records.md`: `subfields_by_code`; real `MarcError` struct variants (`IoError { cause, .. }`, `InvalidLeader { message, .. }`). The old `IoError(e)`/`InvalidRecord(msg)` did not compile.
- `concurrency.md`: `RecordBoundaryScanner` + `parse_batch_parallel(&boundaries, &buffer)` (the old call used an undefined `split_records` and the wrong arity).
- `encoding.md`: `record.leader.character_coding` (there is no `position_9()`).
- `testing.md`: `MarcReader` instead of the nonexistent `Record::from_marc21`.

## Reference tables and specialized records (`rust-api.md`)

- Key Methods tables corrected (`get_field`/`get_control_field`/`get_subfield`), distinguish public fields (`tag`/`indicator1`/`indicator2`/`leader`/`subfields`) from methods, and show accurate return types (`Option<&str>`, iterators).
- `AuthorityRecord`/`HoldingsRecord` examples use the real `::builder(leader)` API (the old `AuthorityRecordBuilder::new().control_number(...)` did not exist).

## Python fences

- `get_fields` for pymarc-style `field["a"]` access; `record.leader()` is a method; removed the nonexistent `record.isbns()` and `field.ind1`/`ind2`; `leader.record_type`.
- `writing-records.md`: modifying a field now uses remove + re-add, because in-place field edits do not persist to the record.

## Encoding

- Removed documentation for MARC-8 output, which does not exist in either binding; MRRC writes UTF-8.

## Follow-up beads (code-level findings)

- `bd-cdey`: wire MARC-8 encoding into the writer, or commit to UTF-8-only and remove the dead encoder.
- `bd-gmax`: Python `Record.fields_by_tag` returns unwrapped (non-subscriptable) fields, unlike `get_fields`.
- `bd-blja`: Python in-place field edits are silently not persisted to the record.

`.cargo/check.sh` is green (566 tests + doctests + mkdocs).

Bead: bd-du5n
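
For readers unfamiliar with the `bd-blja` symptom, here is a minimal, self-contained sketch of why an in-place edit can be silently lost and why remove + re-add works around it. It does not use the library: `ToyField`, `ToyRecord`, and their methods are hypothetical stand-ins, and the assumed mechanism (accessors handing out copies) is only one plausible cause, not a confirmed description of the binding.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyField:
    """Hypothetical field: a tag plus a subfield-code -> value mapping."""
    tag: str
    subfields: dict


class ToyRecord:
    """Hypothetical record whose accessors return copies (assumed behavior)."""

    def __init__(self):
        self._fields = []

    def add_field(self, new_field):
        # Store a private copy so callers cannot mutate record state directly.
        self._fields.append(copy.deepcopy(new_field))

    def get_fields(self, tag):
        # Return copies: edits made to these objects never reach the record.
        return [copy.deepcopy(f) for f in self._fields if f.tag == tag]

    def remove_fields(self, tag):
        self._fields = [f for f in self._fields if f.tag != tag]


record = ToyRecord()
record.add_field(ToyField("245", {"a": "Old title"}))

# In-place edit: mutates only the returned copy, so the record is unchanged.
edited = record.get_fields("245")[0]
edited.subfields["a"] = "New title"
after_in_place = record.get_fields("245")[0].subfields["a"]

# Remove + re-add: the edited field replaces the stored one.
record.remove_fields("245")
record.add_field(edited)
after_readd = record.get_fields("245")[0].subfields["a"]

print(after_in_place)
print(after_readd)
```

In this toy model the first read still returns the old value and the second returns the new one, which mirrors the pattern the updated `writing-records.md` now documents.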