Add read_aedes_neurons() with soma-aware rerooting#6
Open
jefferis wants to merge 10 commits into
Open
Conversation
Ship aedes_neuropil_mesh, a mesh3d covering the Aedes brain neuropil in nm coordinates, constructed by Dana Sherman from the synapse cloud. Stored under data/ with LazyData: true (xz compression), so callers access it as a bare symbol. data-raw/aedes_neuropil_mesh.R parses the precomputed neuroglancer mesh fragment and regenerates the .rda. Will be used by read_aedes_neurons() as a fallback for picking the soma end of skeletons that lack a recorded soma_xyz / nucleus.
Three-level API for reading Aedes L2 skeletons:
* aedes_soma_position() (exported): returns a one-row-per-id data.frame
of soma positions, with `method = c("auto", "flytable", "nucleus")`
and `units = c("nm", "raw")`. "auto" tries FlyTable's soma_xyz first
and falls back to flywire_nuclei() for any neuron without a recorded
soma. Matches by root_id throughout so missing/duplicate FlyTable rows
can't misalign the result. A private helper .aedes_dedup_nuclei()
collapses bookkeeping duplicates in the nucleus table (rows that share
pt_root_id and pt_position, like 648518347517945383) silently, and
warns only when a root_id has genuinely distinct nuclei -- picking the
largest by volume in that case.
* aedes_root_point() (private): given a neuron and either a point or a
mesh, returns the chosen root vertex (idx / xyz / rerooted neuron).
Reroots via nat::reroot() to preserve metadata.
* read_aedes_neurons() (exported): wraps fafbseg::read_l2skel() and,
when reroot = TRUE, dispatches each neuron to the first usable source
from a per-neuron cascade (FlyTable soma -> nucleus -> neuropil mesh).
`mesh` defaults to the packaged aedes_neuropil_mesh. Each rerooted
neuron carries its provenance in a `soma_source` column on the
neuronlist data slot.
Offline: * aedes_root_point() point-method picks nearest skeleton node, rval variants behave as documented. * aedes_root_point() mesh-method picks the endpoint with the most negative signed distance against the packaged aedes_neuropil_mesh; auto cascade prefers a supplied point over the mesh. * .aedes_dedup_nuclei() picks the largest nucleus per root_id (with a warning), silently collapses positional duplicates that share the same xyz, handles the mixed case, and falls back to first-row when the input lacks a volume column. Live (skipped gracefully when services are unreachable): * aedes_soma_position on 648518347399768369 returns the known FlyTable soma_xyz (16231,4747,3051 raw) in both raw and nm units. * read_aedes_neurons on the same id reroots within 10 micron of the recorded soma and records soma_source = "flytable". * aedes_soma_position with method = "nucleus" returns silently for a segment whose nucleus row has positional duplicates. The test pins the target by xyz inside the cell and resolves to the current root id via aedes_xyz2id() rather than hard-coding a root id that drifts with proofreading.
* otherwise check complains
Reduce a flywire_nuclei() result via dplyr pipeline:
* Trust the nuclei_v1_aedes schema; error on missing required columns
(id, pt_root_id, pt_position, volume) rather than fall back through
defensive column-name guessing.
* Dedup bookkeeping duplicates (rows with identical position for the
same root id, e.g. 648518347517945383) silently using the
position-as-string column as the natural key.
* `nuclei = c("largest", "all")` controls cardinality: "largest" keeps
one row per input id picking the highest-volume nucleus; "all"
returns every distinct-position candidate, sorted largest first.
Output now includes:
* `position`: an "x,y,z" string in the requested units (via
nat::xyzmatrix2str(); use nat::xyzmatrix() to parse back). This
matches how end users typically want soma coordinates (FlyTable,
neuroglancer URLs).
* `source`: "flytable" / "nucleus" / NA -- single source of truth; no
composite "nucleus-best" label. Ambiguity is conveyed by n_nuclei.
* `n_nuclei`: count of distinct-position nucleus candidates considered
for this root id (NA when the soma came from FlyTable).
* `nucleus_id`: the nuclei_v1_aedes primary key of the chosen row
(integer; NA for FlyTable / not-found sources).
aedes_soma_position() is now quiet by default -- callers detect
ambiguity programmatically via n_nuclei rather than via warnings.
read_aedes_neurons() propagates `soma_source` and `n_nuclei` onto the
neuronlist data slot so provenance survives subsetting.
Tests cover the largest/all modes, positional dedup, mixed
positional+real duplicates, missing-column errors, and nucleus_id
threading. Live tests pinned by stable handles (xyz inside the cell or
supervoxel id) instead of root ids that drift with proofreading.
4095c0f to
07223f2
Compare
read_aedes_neurons() already propagates soma_source and n_nuclei onto each rerooted neuron's data slot; nucleus_id was missing. Add it so the provenance trio survives subsetting together: callers can identify the exact nuclei_v1_aedes row chosen for any given neuron, and join back to the table if they need volume/created/etc. Live test extended to confirm all three columns exist; for the test id (FlyTable source) nucleus_id is NA, as expected.
For neurons that have no FlyTable soma_xyz and no nucleus segmentation,
score each L2 chunk and pick the most soma-like one. The score sums
within-neuron z-scores of L2 cache attributes that are large for
somas and small for thin processes:
* area_nm2 (surface area)
* size_nm3 (volume)
* max_dt_nm (max distance transform, ~ local radius)
* mean_dt_nm (mean distance transform)
* roundness (pca_val_2 / pca_val_0, 1 = sphere, ~0 = line)
* mesh (signed distance from the neuropil; more negative outside
the neuropil = more soma-like)
Degenerate L2 chunks (pca_val_0 == 0, ~30% of typical neurons, mostly
single-voxel artefacts) are filtered out before scoring.
Three new aedes_soma_position() methods:
* "l2+mesh" -- all six features above; the most informative score and
the new auto-cascade fallback when a mesh is supplied.
* "l2" -- shape features only; no mesh required.
* "mesh" -- signed distance only; errors if mesh is NULL.
The cascade in method = "auto" stays flytable > nucleus > l2+mesh:
each step only runs on the ids left over from the previous one.
Output schema gains three diagnostic columns (always present, NA for
non-L2 sources):
* soma_score combined within-neuron z-score of the picked chunk
* dist_npil_nm signed distance to the neuropil mesh (positive inside,
negative outside -- soma rows are usually negative)
* l2_id the selected L2 chunk
nucleus_id is now also populated for flytable-sourced rows from the
FlyTable `nucleus_id` annotation, so the provenance trio is meaningful
even when the cascade short-circuits on FlyTable. read_aedes_neurons()
propagates all six provenance columns (soma_source, n_nuclei,
nucleus_id, soma_score, dist_npil_nm, l2_id) onto the neuronlist data
slot.
L2 attribute fetches are batched (default 20 neurons per CAVE call)
via nat.utils::make_chunks() and chunks run under pbapply::pblapply
with optional parallelism via a `cl` argument -- worthwhile for
hundreds to thousands of neurons.
Empirical comparison on two real Aedes neurons, distance from the
chosen point to the ground-truth soma/nucleus:
id ................................ FlyTable ........ mesh-on-endpoints ..... l2+mesh
648518347399768369 (flytable soma) 2.1 micron 4.7 micron 2.1 micron
648518347668922466 (nucleus only) 0.9 micron 303.6 micron(!) 0.9 micron
The previous endpoint-based mesh fallback was catastrophically wrong
for the second neuron (picks the distal axon tip furthest outside the
neuropil); the L2 score lands on the soma chunk both times.
Requires fafbseg (>= 0.15.9) for the pandas2df_inmem object-dtype fix
without which size_nm3 / area_nm2 overflow on soma-bearing L2 chunks.
A worked walkthrough of using read_aedes_neurons() / aedes_soma_position() to recover and review KC somas, including the L2+mesh fallback path. Adds knitr / rmarkdown to Suggests and VignetteBuilder: knitr so the vignette builds via R CMD build.
* otherwise users will likely miss hits against flytable
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three-level API for reading and rerooting Aedes L2 skeletons, plus a packaged neuropil mesh used as a fallback when no soma annotation is available.
aedes_neuropil_mesh—mesh3dof the Aedes brain neuropil (nm coordinates) constructed by Dana Sherman from the synapse cloud. Shipped underdata/withLazyData: true.aedes_soma_position(ids, method, units)— one-row-per-id data.frame of soma positions.method = c("auto", "flytable", "nucleus")cascades FlyTable'ssoma_xyz→flywire_nuclei();units = c("nm", "raw"). Matches strictly byroot_idso missing/duplicate FlyTable rows can't misalign. A private helper silently collapses bookkeeping duplicates in the nucleus table (rows with identicalpt_root_idandpt_position, as seen on648518347517945383) and warns only when aroot_idhas genuinely distinct nuclei — picking the largest by volume in that case.read_aedes_neurons(ids, reroot, method, mesh)— wrapsfafbseg::read_l2skel()and reroots each neuron via a per-neuron cascade: FlyTable soma → nucleus → packaged neuropil mesh (signed-distance fallback à lafind_root_l2skel). Provenance recorded in asoma_sourcecolumn on the neuronlistdataslot.method = "none"orreroot = FALSEskip the rerooting step.Rerooting is done via
nat::reroot()(preserves metadata) rather than rebuilding the ngraph.Commits:
Test plan
aedes_root_point()(point and mesh paths, auto cascade) against synthetic neurons + the packaged mesh..aedes_dedup_nuclei(): largest-by-volume + warning for real duplicates, silent collapse for positional duplicates, mixed case, no-volume-column fallback, no-op for unique input.648518347399768369: asserts the recorded FlyTablesoma_xyz(16231,4747,3051raw) round-trips between raw and nm units, and thatread_aedes_neurons()reroots within 10 µm of that point withsoma_source = "flytable".648518347517945383(resolved to current root viaaedes_xyz2id()): the nucleus-table positional duplicates are collapsed silently.