Support Cursor agent traces #8280

Open
armand0e wants to merge 2 commits into huggingface:main from armand0e:add-cursor-agent-traces

@armand0e armand0e commented Jun 19, 2026

Summary

  • detect Teich Cursor trace rows using Cursor-specific metadata and preserved raw_cursor data
  • normalize Cursor sessions into the shared agent traces schema with session id, timestamp, prompt, user count, tool-call count, and per-row raw trace preservation
  • update the Teich test dependency to 0.2.8 for Cursor trace conversion support
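
A rough sketch of the detection and normalization described above, not the actual diff. Every key name here (`metadata.source`, `messages`, `tool_calls`, and the output column names) is an assumption made for illustration; only `raw_cursor` and the list of normalized fields come from the summary.

```python
import json
from typing import Any

# Hypothetical marker value; the real loader may key on different metadata.
CURSOR_SOURCE = "cursor"


def is_cursor_trace_row(row: dict[str, Any]) -> bool:
    """Return True if the row looks like a Cursor trace (assumed shape)."""
    metadata = row.get("metadata") or {}
    return metadata.get("source") == CURSOR_SOURCE and "raw_cursor" in row


def normalize_cursor_session(row: dict[str, Any]) -> dict[str, Any]:
    """Map one Cursor session row onto a flat, shared-schema style record."""
    messages = row.get("messages") or []
    user_messages = [m for m in messages if m.get("role") == "user"]
    # Count every tool call attached to any message in the session.
    tool_call_count = sum(len(m.get("tool_calls") or []) for m in messages)
    return {
        "session_id": row.get("session_id"),
        "timestamp": row.get("timestamp"),
        # Treat the first user message as the session prompt (assumption).
        "prompt": user_messages[0].get("content") if user_messages else None,
        "num_user_messages": len(user_messages),
        "num_tool_calls": tool_call_count,
        # Preserve the untouched Cursor payload as a JSON string per row.
        "raw_trace": json.dumps(row["raw_cursor"], sort_keys=True),
    }
```

Rows that fail `is_cursor_trace_row` would fall through to ordinary JSON handling, which is the behavior the third test in the validation command appears to cover.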

Validation

  • PYTHONPATH=src uv run --python 3.12 --with pytest --with setuptools --with pyarrow --with pandas --with teich==0.2.6 --with fsspec --with aiohttp --with xxhash --with multiprocess --with dill --with huggingface-hub --with requests --with packaging --with tqdm pytest -s tests/packaged_modules/test_json.py::test_json_generate_tables_with_agent_trace_metadata tests/packaged_modules/test_json.py::test_json_load_dataset_with_agent_trace_metadata tests/packaged_modules/test_json.py::test_json_load_dataset_without_droid_marker_stays_ordinary_json -q
  • loaded armand0e/cursor-traces-example's cursor-sessions.jsonl through the patched JSON loader and confirmed 13 rows with the shared agent trace columns
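
The manual check in the second bullet could be scripted roughly as follows. This is a minimal sketch: the column names in `EXPECTED_COLUMNS` are placeholders I made up, not the real shared agent trace schema, and `check_trace_rows` is a hypothetical helper rather than anything in the PR.

```python
from typing import Any, Iterable

# Placeholder column names; substitute the real shared agent trace columns.
EXPECTED_COLUMNS = frozenset(
    {"session_id", "timestamp", "prompt", "num_user_messages", "num_tool_calls", "raw_trace"}
)


def check_trace_rows(
    rows: Iterable[dict[str, Any]],
    expected_columns: frozenset[str] = EXPECTED_COLUMNS,
) -> tuple[int, dict[int, list[str]]]:
    """Return the row count and a map of row index -> missing column names."""
    row_count = 0
    missing: dict[int, list[str]] = {}
    for index, row in enumerate(rows):
        absent = expected_columns - set(row)
        if absent:
            missing[index] = sorted(absent)
        row_count += 1
    return row_count, missing
```

With the patched loader installed, the rows could come from something like `load_dataset("armand0e/cursor-traces-example", data_files="cursor-sessions.jsonl", split="train")`, and the expectation from the run above would be a count of 13 with no missing columns.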

@armand0e

Author

Here's an SVG that should look good for the icon:

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 166.79 190" width="166.79" height="190" fill="none">
  <path
    fill="#EDECEC"
    transform="translate(-400 -395)"
    d="M563.463 439.971L487.344 396.057C484.899 394.646 481.883 394.646 479.439 396.057L403.323 439.971C401.269 441.156 400 443.349 400 445.723V534.276C400 536.647 401.269 538.843 403.323 540.029L479.443 583.943C481.887 585.353 484.903 585.353 487.347 583.943L563.466 540.029C565.521 538.843 566.79 536.651 566.79 534.276V445.723C566.79 443.352 565.521 441.156 563.466 439.971H563.463ZM558.681 449.273L485.199 576.451C484.703 577.308 483.391 576.958 483.391 575.966V492.691C483.391 491.027 482.501 489.488 481.058 488.652L408.887 447.016C408.03 446.52 408.38 445.209 409.373 445.209H556.337C558.424 445.209 559.728 447.47 558.685 449.276H558.681V449.273Z"
  />
</svg>

Also, let me know if the trace shape is acceptable. The hard part with Cursor traces is that everything is stored in a database, so there isn't really a standard trace format. If you would prefer a different shape for the dataset, let me know and I can update the Cursor extraction logic accordingly.

Test repository: armand0e/cursor-traces-example

@armand0e armand0e marked this pull request as ready for review June 19, 2026 03:30
@armand0e armand0e marked this pull request as draft June 19, 2026 06:18
@armand0e

Author

I'm switching the shape to be more native by following the event-based session files found in .cursor/projects.

By themselves these session files are useless, but in the name of keeping Hugging Face traces as "native" exports, I'm going to follow these transcript file structures.

If an end user uses teich to extract their Cursor traces (teich extract cursor), the internal databases will also be scanned, and the redacted transcripts will be populated with the raw, unredacted data.

@armand0e armand0e marked this pull request as ready for review June 19, 2026 07:43