research: evaluate dplyr removal feasibility (108 call sites) #86

Description

@chris-prener

Summary

Evaluate whether dplyr (108 call sites across 9 files) can realistically be replaced with base R. This is a research issue; any resulting implementation would be a separate epic-level effort.

Current usage (108 call sites)

| Function | Approx. count | Base R equivalent |
| --- | --- | --- |
| filter() | ~20 | subset() or [,] |
| select() | ~15 | [, cols] |
| mutate() | ~20 | transform() or $<- |
| left_join() | ~10 | merge(x, y, all.x=TRUE) |
| bind_rows() | ~8 | do.call(rbind, ...) |
| group_by()/summarise() | ~15 | aggregate() or tapply() |
| arrange() | ~5 | [order(...),] |
| rename()/rename_with() | ~8 | names()<- |
| everything()/all_of() | ~7 | column indexing |
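
To make the mapping concrete, here is a minimal sketch of one small pipeline written with the base R equivalents from the table. The data frames and column names (`zcta`, `labels`, `pop`, and so on) are invented for illustration and are not taken from the package's code.

```r
# Toy data; all names here are illustrative assumptions.
zcta <- data.frame(
  zcta  = c("63101", "63102", "63103", "63104"),
  state = c("MO", "MO", "IL", "IL"),
  pop   = c(100, 250, 50, 400),
  stringsAsFactors = FALSE
)
labels <- data.frame(
  state = c("MO", "IL"),
  name  = c("Missouri", "Illinois"),
  stringsAsFactors = FALSE
)

# filter(zcta, pop > 75)
big <- zcta[zcta$pop > 75, , drop = FALSE]

# select(big, zcta, state, pop)
sel <- big[, c("zcta", "state", "pop"), drop = FALSE]

# mutate(sel, pop_k = pop / 1000)
sel$pop_k <- sel$pop / 1000

# left_join(sel, labels, by = "state")
# merge() sorts on the key by default; with sort = FALSE the row order
# is unspecified, whereas left_join() keeps the row order of x.
joined <- merge(sel, labels, by = "state", all.x = TRUE, sort = FALSE)

# group_by(joined, state) followed by summarise(pop = sum(pop))
totals <- aggregate(pop ~ state, data = joined, FUN = sum)

# arrange(totals, desc(pop))
totals <- totals[order(-totals$pop), , drop = FALSE]
rownames(totals) <- NULL

# rename(totals, population = pop)
names(totals)[names(totals) == "pop"] <- "population"
```

Each step is a one-liner, but intermediate objects replace the pipe, and several defaults differ from dplyr (row order after `merge()`, row names after subsetting), which is where most of the per-call-site review effort would likely go.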

Files affected

zi_aggregate.R, zi_crosswalk.R, zi_get_demographics.R, zi_get_geometry.R, zi_label.R, zi_list_zctas.R, zi_load_crosswalk.R, zi_load_labels.R, zi_prep_hud.R

Considerations

  • Scale: 108 call sites is a major refactor with high regression risk
  • Readability: dplyr pipelines are significantly more readable than Base R equivalents for complex transformations
  • Ecosystem: tidycensus and tigris return tibbles and are designed for dplyr workflows
  • Transitive deps: dplyr brings in tibble, rlang, vctrs, pillar, etc. Removing it shrinks the dependency tree only to the extent that no remaining dependency pulls in the same packages
  • User expectation: R users in the tidyverse ecosystem expect tibble/dplyr patterns
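
Part of the regression risk comes from base R defaults that differ silently from dplyr and tibble behaviour. The sketch below shows three such differences on toy data (all object names are illustrative); it is not an exhaustive list, only the kind of case a migration would need tests for.

```r
df <- data.frame(id = 1:3, pop = c(10, NA, 30))

# 1. Single-column selection: `[` on a data.frame drops to a vector
#    unless drop = FALSE is given; a tibble always stays a tibble.
as_vector <- df[, "pop"]
as_frame  <- df[, "pop", drop = FALSE]

# 2. Logical filtering: an NA condition produces an all-NA row,
#    whereas dplyr::filter() drops rows where the condition is NA.
with_na_rows <- df[df$pop > 5, ]
filter_like  <- df[which(df$pop > 5), ]

# 3. Row binding: rbind() fails when columns do not match,
#    whereas dplyr::bind_rows() fills missing columns with NA.
a <- data.frame(id = 1, pop = 10)
b <- data.frame(id = 2)
rbind_result <- tryCatch(rbind(a, b), error = function(e) "error")
```

Here `with_na_rows` has three rows (one entirely `NA`), `filter_like` has two, and `rbind_result` is the string `"error"`.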

Context

Acceptance criteria

  • Document the full scope of a dplyr-to-Base-R migration
  • Benchmark key operations (grouped aggregation, joins) for performance comparison
  • Recommend GO / NO-GO with rationale
  • If GO: file a dedicated epic with scoped sub-issues
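
A minimal sketch of how the benchmark criterion could be approached, assuming synthetic data of roughly realistic shape. The sizes, column names, and helper functions `base_agg()` and `base_join()` are hypothetical; the dplyr timings run only if dplyr is installed, and a real benchmark would repeat each measurement several times rather than timing once.

```r
set.seed(86)
n <- 1e5
groups <- sprintf("g%03d", 1:500)

# Synthetic stand-ins for a data table and a lookup table (assumed shapes).
dat <- data.frame(
  grp = sample(groups, n, replace = TRUE),
  val = runif(n),
  stringsAsFactors = FALSE
)
lookup <- data.frame(grp = groups, weight = runif(500), stringsAsFactors = FALSE)

# Base R versions of the two operations named in the criterion.
base_agg <- function(d) {
  out <- aggregate(val ~ grp, data = d, FUN = sum)
  out[order(out$grp), , drop = FALSE]
}
base_join <- function(d, l) merge(d, l, by = "grp", all.x = TRUE)

timings <- c(
  base_agg  = system.time(res_base  <- base_agg(dat))[["elapsed"]],
  base_join = system.time(join_base <- base_join(dat, lookup))[["elapsed"]]
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  timings["dplyr_agg"] <- system.time(
    res_dplyr <- dplyr::summarise(dplyr::group_by(dat, grp), val = sum(val))
  )[["elapsed"]]
  timings["dplyr_join"] <- system.time(
    join_dplyr <- dplyr::left_join(dat, lookup, by = "grp")
  )[["elapsed"]]
  # Confirm both implementations agree before trusting the timings.
  stopifnot(isTRUE(all.equal(res_base$val, res_dplyr$val[order(res_dplyr$grp)])))
}

print(timings)
```

Checking that both implementations return the same result before comparing timings guards against benchmarking code paths that are not actually equivalent.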

Notes

Given that tidycensus and tigris are core dependencies that themselves use dplyr/tibble, removing dplyr may not yield meaningful install-time savings. The research should quantify this.
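
One possible way to quantify this is to compare the recursive dependency closure with and without dplyr using `tools::package_dependencies()`. In the sketch below, `deps_only_via()` is a hypothetical helper and the toy table uses placeholder package names; a real run would pass the output of `available.packages()` and list every remaining direct dependency in `keep`, not only the two named in this issue.

```r
# Packages that would leave the dependency closure if `drop` were removed
# while `keep` stays. `db` has the shape returned by available.packages().
deps_only_via <- function(drop, keep, db) {
  closure <- function(pkgs) {
    found <- tools::package_dependencies(
      pkgs,
      db = db,
      which = c("Depends", "Imports", "LinkingTo"),
      recursive = TRUE
    )
    unique(c(pkgs, unlist(found, use.names = FALSE)))
  }
  sort(setdiff(closure(c(drop, keep)), closure(keep)))
}

# Toy dependency table with placeholder names (not real packages).
toy_db <- matrix(
  c("pkgDrop",   NA, "pkgShared, pkgSolo", NA,
    "pkgKeep",   NA, "pkgShared",          NA,
    "pkgShared", NA, NA,                   NA,
    "pkgSolo",   NA, NA,                   NA),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("Package", "Depends", "Imports", "LinkingTo"))
)
rownames(toy_db) <- toy_db[, "Package"]

# Only pkgDrop and pkgSolo disappear; pkgShared is still needed by pkgKeep.
toy_saved <- deps_only_via("pkgDrop", "pkgKeep", toy_db)

# Against CRAN (needs network access):
# db <- available.packages(repos = "https://cloud.r-project.org")
# deps_only_via("dplyr", c("tidycensus", "tigris"), db)
```

If the real call returns little beyond dplyr itself, that would support the expectation that removal yields no meaningful install-time savings.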

Metadata

Assignees

No one assigned

Labels

  • enhancement: New feature or request
  • priority/medium: Moderate impact, address as capacity allows