regrake

regrake provides an interface for regularized raking in R.

The package follows the more general formulation of the weighting problem in Barratt et al. (2021). That formulation lets you choose how strictly each population target must be matched, regularizes the weights explicitly, and as a result yields more expressive and efficient survey weights.
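As a rough guide to what "regularized raking" means, the optimization problem in Barratt et al. (2021) can be written as below. The notation is theirs, not regrake's: F is the matrix of sample features, f^des the vector of population targets, ℓ a loss measuring distance from the targets, r a regularizer on the weights, and λ its strength. How the rr_*() helpers and the regularizer argument map onto ℓ and r is an assumption here, and the paper normalizes weights to sum to 1 rather than to the sample size.

```latex
\begin{aligned}
\text{minimize}   \quad & \ell(f, f^{\mathrm{des}}) + \lambda\, r(w) \\
\text{subject to} \quad & f = F w, \\
                        & \mathbf{1}^{\top} w = 1, \quad w \ge 0
\end{aligned}
```

Classical raking corresponds to requiring f = f^des exactly with an entropy-type regularizer; relaxing the loss or changing r gives the more flexible variants.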

Installation

# From r-universe (recommended — includes pre-built binaries)
install.packages("regrake", repos = "https://andytimm.r-universe.dev")

# Or from GitHub
remotes::install_github("andytimm/regrake")

Quick start

library(regrake)

set.seed(604)
n <- 500

sample_data <- data.frame(
  sex = sample(c("F", "M"), n, replace = TRUE, prob = c(0.55, 0.45)),
  age_group = sample(c("18-34", "35-54", "55+"), n, replace = TRUE,
                     prob = c(0.40, 0.35, 0.25)),
  income = rnorm(n, mean = 58000, sd = 14000)
)

# autumn-style target table: variable / level / target
pop_targets <- data.frame(
  variable = c("sex", "sex", "age_group", "age_group", "age_group", "income"),
  level = c("F", "M", "18-34", "35-54", "55+", "mean"),
  target = c(0.51, 0.49, 0.30, 0.40, 0.30, 62000)
)

fit <- regrake(
  data = sample_data,
  formula = ~ rr_exact(sex + age_group) + rr_mean(income),
  population_data = pop_targets,
  pop_type = "proportions",
  regularizer = "entropy",
  bounds = c(0.3, 3)
)

fit
head(fit$weights)
fit$balance
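
To sanity-check a set of weights by hand, one option is to recompute the weighted margins in base R and compare them to the targets. The sketch below does not call regrake: weighted_props() is a hypothetical helper, and w is a stand-in produced by simple post-stratification on sex. With the package installed, fit$weights would take its place, assuming it is a numeric vector with one entry per row.

```r
# Hypothetical helper (not part of regrake): weighted share of each level.
weighted_props <- function(x, w) {
  tapply(w, x, sum) / sum(w)
}

set.seed(604)
n <- 500
sex <- sample(c("F", "M"), n, replace = TRUE, prob = c(0.55, 0.45))

# Stand-in weights: post-stratify on sex to 51% / 49%.
# Each unit gets (target share) / (sample share) for its own level.
target <- c(F = 0.51, M = 0.49)
sample_share <- prop.table(table(sex))
w <- as.numeric(target[sex] / sample_share[sex])

weighted_props(sex, w)  # matches target up to floating-point error
sum(w)                  # equals n, i.e. weights sum to the sample size
```

The same helper can be applied to any categorical column to see how far a weighted margin sits from its target.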

Formula interface

Constraint helpers include:

  • rr_exact(): exact matching
  • rr_l2(): least-squares matching
  • rr_kl(): Kullback-Leibler (KL) divergence matching
  • rr_range() / rr_between(): bounded matching
  • rr_mean(): continuous mean matching
  • rr_var(): continuous variance matching
  • rr_quantile(x, p): quantile matching

Variables sharing a constraint type can be combined with + inside the wrapper (for example rr_exact(sex + age_group)). Interactions use : (for example rr_l2(sex:age_group)).
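
Because an R formula is stored unevaluated, base R can show how the wrappers split a specification into terms without regrake being loaded. The sketch below uses only terms() and all.vars(). It says nothing about how regrake parses the formula internally, and the particular mix of constraints is illustrative only.

```r
# A specification mixing three constraint types.
# Nothing is evaluated here, so the rr_*() functions need not exist.
spec <- ~ rr_exact(sex) + rr_l2(sex:age_group) + rr_mean(income)

# One term per wrapper call: the + and : inside a wrapper stay inside it.
term_labels <- attr(terms(spec), "term.labels")
term_labels

# The underlying data columns the specification refers to.
spec_vars <- all.vars(spec)
spec_vars
```

Each wrapper call shows up as a single term, which is why variables that share a constraint type can be grouped inside one wrapper.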

Population target formats

regrake() supports:

  • pop_type = "proportions": autumn-style table with variable, level, target
  • pop_type = "raw": one row per population unit
  • pop_type = "weighted": population microdata plus a weight column
  • pop_type = "anesrake": named list of numeric vectors
  • pop_type = "survey": margin/category/value table
  • pop_type = "survey_design": survey.design object
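
To make two of these formats concrete, the sketch below builds the categorical targets from the quick start as an anesrake-style list and converts them to the variable / level / target table in base R. list_to_target_table() is a hypothetical helper, not a regrake function, and naming each vector's elements by level is assumed from anesrake's own convention.

```r
# anesrake-style targets: a named list of numeric vectors,
# each vector named by the levels of its variable (assumed convention).
targets_list <- list(
  sex = c(F = 0.51, M = 0.49),
  age_group = c("18-34" = 0.30, "35-54" = 0.40, "55+" = 0.30)
)

# Hypothetical helper: flatten the list into one row per variable/level.
list_to_target_table <- function(targets) {
  data.frame(
    variable = rep(names(targets), lengths(targets)),
    level = unlist(lapply(targets, names), use.names = FALSE),
    target = unlist(targets, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

targets_table <- list_to_target_table(targets_list)
targets_table

# Proportions within each variable should sum to 1.
tapply(targets_table$target, targets_table$variable, sum)
```

Continuous targets such as the income mean in the quick start have no counterpart in the list shown here, so they are left out of this sketch.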

Output

A fitted object contains:

  • weights: calibrated weights (sum to sample size)
  • balance: achieved vs target values by constraint
  • diagnostics: convergence and weight-quality diagnostics
  • solution: solver internals
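
A common weight-quality summary is Kish's effective sample size, which can be computed from any weight vector in base R. The sketch below is independent of regrake: kish_ess() and design_effect() are hypothetical helpers, the weight vectors are made up for illustration, and the diagnostics element may already report similar quantities under its own names.

```r
# Hypothetical helpers (not part of regrake).
# Kish effective sample size: (sum of weights)^2 / (sum of squared weights).
kish_ess <- function(w) sum(w)^2 / sum(w^2)

# Approximate design effect from unequal weighting: n / ESS.
design_effect <- function(w) length(w) / kish_ess(w)

w_equal <- rep(1, 500)                     # no variation in weights
w_uneven <- rep(c(0.5, 1.5), each = 250)   # half down-weighted, half up-weighted

kish_ess(w_equal)        # 500: equal weights lose nothing
kish_ess(w_uneven)       # 400: 500^2 / 625
design_effect(w_uneven)  # 1.25
```

Tighter bounds or stronger regularization generally trade some accuracy on the targets for a higher effective sample size.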

Status

1.0.0 released. Available on r-universe.

License

Apache License 2.0.
