
Preserve end_line/end_column/end_pos when copying or pickling a Token#1611

Open: gaoflow wants to merge 1 commit into lark-parser:master from gaoflow:fix-token-copy-end-position

gaoflow commented Jun 20, 2026


Summary

Token.__reduce__ (used by copy.copy and pickle) and Token.__deepcopy__ only forward the first five positional attributes to the constructor:

def __reduce__(self):
    return (self.__class__, (self.type, self.value, self.start_pos, self.line, self.column))

def __deepcopy__(self, memo):
    return Token(self.type, self.value, self.start_pos, self.line, self.column)

The constructor signature is (type, value, start_pos, line, column, end_line, end_column, end_pos), so end_line, end_column and end_pos are silently reset to None on every copy.copy, copy.deepcopy, and pickle round-trip:

>>> from lark.lexer import Token
>>> import copy
>>> t = Token("WORD", "hello", start_pos=0, line=1, column=1, end_line=2, end_column=6, end_pos=5)
>>> print(copy.deepcopy(t).end_pos)
None          # expected 5

This is inconsistent with Token.new_borrow_pos (and therefore update), which already carries all eight attributes. These two methods predate the end_* attributes and were never updated to match.

Fix

Forward the three trailing position attributes (end_line, end_column, end_pos) in both methods, matching new_borrow_pos and the constructor's positional order. The behavioral change is confined to two lines.
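A minimal sketch of the shape of this fix, using a hypothetical stand-in class (MiniToken) rather than lark's actual Token source. The helper _ctor_args is an assumption made for brevity; the real methods may simply spell out the eight attributes inline. The key point is that copy.copy (and pickle, which consults the same __reduce__) rebuilds the object as cls(*args), so whatever is left out of that tuple falls back to the constructor default.

```python
import copy


class MiniToken(str):
    """Hypothetical stand-in for lark's Token; not the library's actual source."""

    def __new__(cls, type, value, start_pos=None, line=None, column=None,
                end_line=None, end_column=None, end_pos=None):
        inst = super().__new__(cls, value)
        inst.type = type
        inst.value = value
        inst.start_pos = start_pos
        inst.line = line
        inst.column = column
        inst.end_line = end_line
        inst.end_column = end_column
        inst.end_pos = end_pos
        return inst

    def _ctor_args(self):
        # Illustrative helper (an assumption, not part of lark):
        # all eight arguments, in the constructor's positional order.
        return (self.type, self.value, self.start_pos, self.line, self.column,
                self.end_line, self.end_column, self.end_pos)

    def __reduce__(self):
        # copy.copy and pickle rebuild the object by calling cls(*args).
        return (self.__class__, self._ctor_args())

    def __deepcopy__(self, memo):
        # Every field is a str, an int or None, so no recursive copy is needed.
        return MiniToken(*self._ctor_args())


t = MiniToken("WORD", "hello", start_pos=0, line=1, column=1,
              end_line=2, end_column=6, end_pos=5)
shallow = copy.copy(t)
deep = copy.deepcopy(t)
```

With all eight values forwarded, both shallow and deep keep end_line, end_column and end_pos from t.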

Verification

  • Reproduced on master (c169b26): deepcopy/copy/pickle reset end_* to None; with the fix all eight attributes survive.
  • Added test_token_copy_preserves_end_position to tests/test_lexer.py; it fails before the fix and passes after.
  • Full suite: 1107 passed, 167 skipped, no regressions (baseline 1106 + the new test).
  • mypy lark/lexer.py is identical before and after (no new errors).
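The "fails before, passes after" check can be illustrated roughly as follows. This is a sketch under stated assumptions, not the test added in this PR: BaseToken, FiveArgToken, EightArgToken, lost_attributes and check are all hypothetical names, and the two subclasses only mimic the pre-fix and post-fix forwarding behaviour described in the summary.

```python
import copy

POSITION_ATTRS = ("start_pos", "line", "column",
                  "end_line", "end_column", "end_pos")


def lost_attributes(original, clone):
    """Return the position attributes whose values differ on the clone."""
    return [name for name in POSITION_ATTRS
            if getattr(clone, name) != getattr(original, name)]


class BaseToken(str):
    """Hypothetical stand-in with an eight-argument constructor."""

    def __new__(cls, type, value, start_pos=None, line=None, column=None,
                end_line=None, end_column=None, end_pos=None):
        inst = super().__new__(cls, value)
        inst.type, inst.value = type, value
        inst.start_pos, inst.line, inst.column = start_pos, line, column
        inst.end_line, inst.end_column, inst.end_pos = end_line, end_column, end_pos
        return inst


class FiveArgToken(BaseToken):
    """Mimics the pre-fix behaviour: only five arguments are forwarded."""

    def _args(self):
        return (self.type, self.value, self.start_pos, self.line, self.column)

    def __reduce__(self):
        return (self.__class__, self._args())

    def __deepcopy__(self, memo):
        return self.__class__(*self._args())


class EightArgToken(FiveArgToken):
    """Mimics the post-fix behaviour: all eight arguments are forwarded."""

    def _args(self):
        return super()._args() + (self.end_line, self.end_column, self.end_pos)


def check(cls):
    """Map each copy mechanism to the attributes it loses for `cls`."""
    t = cls("WORD", "hello", 0, 1, 1, 2, 6, 5)
    return {"copy": lost_attributes(t, copy.copy(t)),
            "deepcopy": lost_attributes(t, copy.deepcopy(t))}
```

Calling check(FiveArgToken) reports the three end_* attributes as lost for both mechanisms, while check(EightArgToken) reports none. A pickle round-trip could be added to check in the same way, since pickle consults the same __reduce__.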

This pull request was prepared with the assistance of AI, under my direction and review.

Token.__reduce__ and Token.__deepcopy__ passed only the first five
positional attributes (type, value, start_pos, line, column) to the
constructor, so copy.copy, copy.deepcopy and pickle round-trips silently
reset end_line, end_column and end_pos to None. new_borrow_pos already
carries all eight attributes; these two methods predate the end_* fields
and were never updated to match.