Skip to content

Add read_aedes_neurons() with soma-aware rerooting#6

Open
jefferis wants to merge 10 commits into
mainfrom
read-aedes-neurons
Open

Add read_aedes_neurons() with soma-aware rerooting#6
jefferis wants to merge 10 commits into
mainfrom
read-aedes-neurons

Conversation

@jefferis
Copy link
Copy Markdown
Member

Summary

Three-level API for reading and rerooting Aedes L2 skeletons, plus a packaged neuropil mesh used as a fallback when no soma annotation is available.

  • aedes_neuropil_meshmesh3d of the Aedes brain neuropil (nm coordinates) constructed by Dana Sherman from the synapse cloud. Shipped under data/ with LazyData: true.
  • aedes_soma_position(ids, method, units) — one-row-per-id data.frame of soma positions. method = c("auto", "flytable", "nucleus") cascades FlyTable's soma_xyzflywire_nuclei(); units = c("nm", "raw"). Matches strictly by root_id so missing/duplicate FlyTable rows can't misalign. A private helper silently collapses bookkeeping duplicates in the nucleus table (rows with identical pt_root_id and pt_position, as seen on 648518347517945383) and warns only when a root_id has genuinely distinct nuclei — picking the largest by volume in that case.
  • read_aedes_neurons(ids, reroot, method, mesh) — wraps fafbseg::read_l2skel() and reroots each neuron via a per-neuron cascade: FlyTable soma → nucleus → packaged neuropil mesh (signed-distance fallback à la find_root_l2skel). Provenance recorded in a soma_source column on the neuronlist data slot. method = "none" or reroot = FALSE skip the rerooting step.

Rerooting is done via nat::reroot() (preserves metadata) rather than rebuilding the ngraph.

Commits:

  1. Add the packaged mesh (data + data-raw script + LazyData wiring).
  2. Add the three functions and their docs.
  3. Add tests.

Test plan

  • Offline unit tests for aedes_root_point() (point and mesh paths, auto cascade) against synthetic neurons + the packaged mesh.
  • Offline unit tests for .aedes_dedup_nuclei(): largest-by-volume + warning for real duplicates, silent collapse for positional duplicates, mixed case, no-volume-column fallback, no-op for unique input.
  • Live test against 648518347399768369: asserts the recorded FlyTable soma_xyz (16231,4747,3051 raw) round-trips between raw and nm units, and that read_aedes_neurons() reroots within 10 µm of that point with soma_source = "flytable".
  • Live test pinned by xyz inside 648518347517945383 (resolved to current root via aedes_xyz2id()): the nucleus-table positional duplicates are collapsed silently.
  • All 34 tests pass locally.
  • Eyeball one or two reroot results in 3D before merging if you want belt-and-braces confidence.

jefferis added 6 commits May 23, 2026 23:45
Ship aedes_neuropil_mesh, a mesh3d covering the Aedes brain neuropil
in nm coordinates, constructed by Dana Sherman from the synapse cloud.
Stored under data/ with LazyData: true (xz compression), so callers
access it as a bare symbol. data-raw/aedes_neuropil_mesh.R parses the
precomputed neuroglancer mesh fragment and regenerates the .rda.

Will be used by read_aedes_neurons() as a fallback for picking the soma
end of skeletons that lack a recorded soma_xyz / nucleus.
Three-level API for reading Aedes L2 skeletons:

* aedes_soma_position() (exported): returns a one-row-per-id data.frame
  of soma positions, with `method = c("auto", "flytable", "nucleus")`
  and `units = c("nm", "raw")`. "auto" tries FlyTable's soma_xyz first
  and falls back to flywire_nuclei() for any neuron without a recorded
  soma. Matches by root_id throughout so missing/duplicate FlyTable rows
  can't misalign the result. A private helper .aedes_dedup_nuclei()
  collapses bookkeeping duplicates in the nucleus table (rows that share
  pt_root_id and pt_position, like 648518347517945383) silently, and
  warns only when a root_id has genuinely distinct nuclei -- picking the
  largest by volume in that case.

* aedes_root_point() (private): given a neuron and either a point or a
  mesh, returns the chosen root vertex (idx / xyz / rerooted neuron).
  Reroots via nat::reroot() to preserve metadata.

* read_aedes_neurons() (exported): wraps fafbseg::read_l2skel() and,
  when reroot = TRUE, dispatches each neuron to the first usable source
  from a per-neuron cascade (FlyTable soma -> nucleus -> neuropil mesh).
  `mesh` defaults to the packaged aedes_neuropil_mesh. Each rerooted
  neuron carries its provenance in a `soma_source` column on the
  neuronlist data slot.
Offline:
* aedes_root_point() point-method picks nearest skeleton node, rval
  variants behave as documented.
* aedes_root_point() mesh-method picks the endpoint with the most
  negative signed distance against the packaged aedes_neuropil_mesh;
  auto cascade prefers a supplied point over the mesh.
* .aedes_dedup_nuclei() picks the largest nucleus per root_id (with a
  warning), silently collapses positional duplicates that share the
  same xyz, handles the mixed case, and falls back to first-row when
  the input lacks a volume column.

Live (skipped gracefully when services are unreachable):
* aedes_soma_position on 648518347399768369 returns the known FlyTable
  soma_xyz (16231,4747,3051 raw) in both raw and nm units.
* read_aedes_neurons on the same id reroots within 10 micron of the
  recorded soma and records soma_source = "flytable".
* aedes_soma_position with method = "nucleus" returns silently for a
  segment whose nucleus row has positional duplicates. The test pins
  the target by xyz inside the cell and resolves to the current root
  id via aedes_xyz2id() rather than hard-coding a root id that drifts
  with proofreading.
* otherwise check complains
Reduce a flywire_nuclei() result via dplyr pipeline:

* Trust the nuclei_v1_aedes schema; error on missing required columns
  (id, pt_root_id, pt_position, volume) rather than fall back through
  defensive column-name guessing.
* Dedup bookkeeping duplicates (rows with identical position for the
  same root id, e.g. 648518347517945383) silently using the
  position-as-string column as the natural key.
* `nuclei = c("largest", "all")` controls cardinality: "largest" keeps
  one row per input id picking the highest-volume nucleus; "all"
  returns every distinct-position candidate, sorted largest first.

Output now includes:

* `position`: an "x,y,z" string in the requested units (via
  nat::xyzmatrix2str(); use nat::xyzmatrix() to parse back). This
  matches how end users typically want soma coordinates (FlyTable,
  neuroglancer URLs).
* `source`: "flytable" / "nucleus" / NA -- single source of truth; no
  composite "nucleus-best" label. Ambiguity is conveyed by n_nuclei.
* `n_nuclei`: count of distinct-position nucleus candidates considered
  for this root id (NA when the soma came from FlyTable).
* `nucleus_id`: the nuclei_v1_aedes primary key of the chosen row
  (integer; NA for FlyTable / not-found sources).

aedes_soma_position() is now quiet by default -- callers detect
ambiguity programmatically via n_nuclei rather than via warnings.

read_aedes_neurons() propagates `soma_source` and `n_nuclei` onto the
neuronlist data slot so provenance survives subsetting.

Tests cover the largest/all modes, positional dedup, mixed
positional+real duplicates, missing-column errors, and nucleus_id
threading. Live tests pinned by stable handles (xyz inside the cell or
supervoxel id) instead of root ids that drift with proofreading.
@jefferis jefferis force-pushed the read-aedes-neurons branch from 4095c0f to 07223f2 Compare May 24, 2026 08:37
jefferis added 4 commits May 24, 2026 12:05
read_aedes_neurons() already propagates soma_source and n_nuclei onto
each rerooted neuron's data slot; nucleus_id was missing. Add it so the
provenance trio survives subsetting together: callers can identify the
exact nuclei_v1_aedes row chosen for any given neuron, and join back to
the table if they need volume/created/etc.

Live test extended to confirm all three columns exist; for the test id
(FlyTable source) nucleus_id is NA, as expected.
For neurons that have no FlyTable soma_xyz and no nucleus segmentation,
score each L2 chunk and pick the most soma-like one. The score sums
within-neuron z-scores of L2 cache attributes that are large for
somas and small for thin processes:

* area_nm2   (surface area)
* size_nm3   (volume)
* max_dt_nm  (max distance transform, ~ local radius)
* mean_dt_nm (mean distance transform)
* roundness  (pca_val_2 / pca_val_0, 1 = sphere, ~0 = line)
* mesh       (signed distance from the neuropil; more negative outside
              the neuropil = more soma-like)

Degenerate L2 chunks (pca_val_0 == 0, ~30% of typical neurons, mostly
single-voxel artefacts) are filtered out before scoring.

Three new aedes_soma_position() methods:

* "l2+mesh" -- all six features above; the most informative score and
  the new auto-cascade fallback when a mesh is supplied.
* "l2"      -- shape features only; no mesh required.
* "mesh"    -- signed distance only; errors if mesh is NULL.

The cascade in method = "auto" stays flytable > nucleus > l2+mesh:
each step only runs on the ids left over from the previous one.

Output schema gains three diagnostic columns (always present, NA for
non-L2 sources):

* soma_score    combined within-neuron z-score of the picked chunk
* dist_npil_nm  signed distance to the neuropil mesh (positive inside,
                negative outside -- soma rows are usually negative)
* l2_id         the selected L2 chunk

nucleus_id is now also populated for flytable-sourced rows from the
FlyTable `nucleus_id` annotation, so the provenance trio is meaningful
even when the cascade short-circuits on FlyTable. read_aedes_neurons()
propagates all six provenance columns (soma_source, n_nuclei,
nucleus_id, soma_score, dist_npil_nm, l2_id) onto the neuronlist data
slot.

L2 attribute fetches are batched (default 20 neurons per CAVE call)
via nat.utils::make_chunks() and chunks run under pbapply::pblapply
with optional parallelism via a `cl` argument -- worthwhile for
hundreds to thousands of neurons.

Empirical comparison on two real Aedes neurons, distance from the
chosen point to the ground-truth soma/nucleus:

  id ................................ FlyTable ........ mesh-on-endpoints ..... l2+mesh
  648518347399768369 (flytable soma)  2.1 micron      4.7 micron              2.1 micron
  648518347668922466 (nucleus only)   0.9 micron    303.6 micron(!)           0.9 micron

The previous endpoint-based mesh fallback was catastrophically wrong
for the second neuron (picks the distal axon tip furthest outside the
neuropil); the L2 score lands on the soma chunk both times.

Requires fafbseg (>= 0.15.9) for the pandas2df_inmem object-dtype fix
without which size_nm3 / area_nm2 overflow on soma-bearing L2 chunks.
A worked walkthrough of using read_aedes_neurons() / aedes_soma_position()
to recover and review KC somas, including the L2+mesh fallback path.
Adds knitr / rmarkdown to Suggests and VignetteBuilder: knitr so the
vignette builds via R CMD build.
* otherwise users will likely miss hits against flytable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant