Commit bd72181

sorted out package deps and compat
1 parent 19e7bab commit bd72181

19 files changed

Lines changed: 1924 additions & 39 deletions

.github/workflows/CI.yml

Lines changed: 0 additions & 1 deletion
@@ -3,7 +3,6 @@ on:
   push:
     branches:
       - main
-      - original-paper
     tags: '*'
   pull_request:
 concurrency:
concurrency:

Project.toml

Lines changed: 7 additions & 7 deletions
@@ -1,6 +1,6 @@
 name = "AlgorithmicRecourseDynamics"
 uuid = "3d1ede72-abb8-4340-bf8e-2ae06849b5ec"
-authors = ["Anonymous"]
+authors = ["Patrick Altmeyer"]
 version = "0.1.0"

 [deps]
@@ -14,7 +14,7 @@ KernelFunctions = "ec8451be-7e33-11e9-00cf-bbf324bd1392"
 LazyArtifacts = "4af54fe1-eca0-43a8-85a7-787d91b784e3"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
-MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
+MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
 MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
 Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
 Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
@@ -26,19 +26,19 @@ Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

 [compat]
+CounterfactualExplanations = "0.1.4"
 CSV = "0.10"
 DataFrames = "1"
 Distances = "0.10"
 Flux = "0.13"
 Images = "0.25"
 KernelFunctions = "0.10"
-LaplaceRedux = "0.1"
-MLJ = "0.18, 0.19"
-MLUtils = "0.2, 0.3"
+MLJBase = "0.21.3"
+MLUtils = "0.3.1"
 Parameters = "0.12"
-Plots = "1"
+Plots = "1.37.2"
 ProgressMeter = "1"
-RCall = "0.13"
+RCall = "0.13.14"
 StatsBase = "0.33"
 julia = "1.6, 1.7, 1.8"

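The `[compat]` changes above tighten several lower bounds, for example `MLJBase = "0.21.3"`, `MLUtils = "0.3.1"` and `Plots = "1.37.2"`. Julia's Pkg reads a bare version in `[compat]` as a caret specifier: the lower bound is the version as written, and the exclusive upper bound bumps the first non-zero component (or the last written component if all written components are zero); comma-separated entries are unioned. The Python sketch below re-implements that rule only to make the resulting ranges explicit. `caret_bounds`, `parse_version` and `satisfies` are hypothetical helpers, not part of Pkg, and only plain caret entries are handled (no `~`, `=`, inequality or hyphen specifiers).

```python
def parse_version(version):
    # "0.21" -> (0, 21, 0): pad missing components with zeros.
    parts = [int(p) for p in version.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def caret_bounds(spec):
    """Half-open range [lower, upper) of one caret specifier such as "0.21.3"."""
    parts = [int(p) for p in spec.strip().split(".")]
    lower = parse_version(spec)
    # Bump the first non-zero written component; if all written components are
    # zero, bump the last one instead (so "0.0" gets upper bound 0.1.0).
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1] + [0] * (2 - idx)
    return lower, tuple(upper)


def satisfies(version, compat):
    """True if `version` lies in the union of the comma-separated caret ranges."""
    v = parse_version(version)
    return any(lo <= v < hi for lo, hi in map(caret_bounds, compat.split(",")))


if __name__ == "__main__":
    for name, spec in [("MLJBase", "0.21.3"), ("MLUtils", "0.3.1"),
                       ("Plots", "1.37.2"), ("RCall", "0.13.14")]:
        lo, hi = caret_bounds(spec)
        print(f"{name} = \"{spec}\" -> [{'.'.join(map(str, lo))}, {'.'.join(map(str, hi))})")
```

Under these rules `MLJBase = "0.21.3"` admits `[0.21.3, 0.22.0)` while `Plots = "1.37.2"` admits `[1.37.2, 2.0.0)`. Each entry in `julia = "1.6, 1.7, 1.8"` likewise extends to `2.0.0`, so that line already admits any 1.x release from 1.6 onward.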
README.qmd

Lines changed: 15 additions & 27 deletions
@@ -1,39 +1,27 @@
 ---
 format:
-  gfm:
+  commonmark:
+    variant: -raw_html
     wrap: none
-    html-math-method: webtex
+execute:
+  freeze: auto
+  echo: true
+  eval: true
+  output: false
+crossref:
+  fig-prefix: Figure
+  tbl-prefix: Table
+bibliography: bib.bib
+jupyter: julia-1.8
 ---

 [![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://pat-alt.github.io/CounterfactualExplanations.jl/stable)
 [![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://pat-alt.github.io/CounterfactualExplanations.jl/dev)
 [![Build Status](https://github.com/pat-alt/CounterfactualExplanations.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/pat-alt/CounterfactualExplanations.jl/actions/workflows/CI.yml?query=branch%3Amain)
 [![Coverage](https://codecov.io/gh/pat-alt/CounterfactualExplanations.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/pat-alt/CounterfactualExplanations.jl)
+[![Code Style: Blue](https://img.shields.io/badge/code%20style-blue-4495d1.svg)](https://github.com/invenia/BlueStyle)
+[![ColPrac: Contributor's Guide on Collaborative Practices for Community Packages](https://img.shields.io/badge/ColPrac-Contributor's%20Guide-blueviolet)](https://github.com/SciML/ColPrac)
+[![Twitter Badge](https://img.shields.io/twitter/url/https/twitter.com/paltmey.svg?style=social&label=Follow%20%40paltmey)](https://twitter.com/paltmey)

 # AlgorithmicRecourseDynamics

-`AlgorithmicRecourseDynamics.jl` is a Julia package for modelling Algorithmic Recourse Dynamics.
-
-## Research Paper 📝
-
-**Note** ⚠: You are browsing the (anonymised) [`#original-paper`](https://anonymous.4open.science/r/AlgorithmicRecourseDynamics/README.md) branch of `AlgorithmicRecourseDynamics.jl`. This branch is a static artifact corresponding to the state of the package at the time the paper was first published. It can be used to replicate the original findings of the paper. Only this branch is currently accessible as an anonymised git repository. The main repository is private and will be open-sourced after the review process.
-
-## At a Glance
-
-The paper titled **Endogenous Macrodynamics in Algorithmic Recourse** is currently under review and not yet published. You can find
-a preprint along with other resources right here on this branch of the
-repository:
-
-- [Paper](paper/paper.pdf)
-- [Notebooks](dev/notebooks/)
-- [Supplementary Appendix](build/dev/notebooks/appendix.html) generated from notebooks (download the HTML and view in browser)
-- [Artifacts]() (including data and experimental results; link currently excluded due to double-blind review process)
-
-In this work we investigate what happens if Algorithmic Recourse is actually implemented by a large number of individuals. The chart below illustrates what we mean by Endogenous Macrodynamics in Algorithmic Recourse: (a) we have a simple linear classifier trained for binary classification where samples from the negative class (y=0) are marked in blue and samples of the positive class (y=1) are marked in orange; (b) the implementation of AR for a random subset of individuals leads to a noticeable domain shift; (c) as the classifier is retrained we observe a corresponding model shift; (d) as this process is repeated, the decision boundary moves away from the target class.
-
-![](paper/www/poc.png)
-
-## Paper Abstract
-
-Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely been limited to the static setting and focused on single individuals: given some estimated model, the goal is to find valid counterfactuals for an individual instance that fulfill various desiderata. The ability of such counterfactuals to handle dynamics like data and model drift remains a largely unexplored research challenge at this point. There has also been surprisingly little work on the related question of how the actual implementation of recourse by one individual may affect other individuals. Through this work we aim to close that gap by systematizing and extending existing knowledge. We first show that many of the existing methodologies can be collectively described by a generalized framework. We then argue that the existing framework fails to account for a hidden external cost of recourse, that only reveals itself when studying the endogenous dynamics of recourse at the group level. Through simulation experiments involving various state-of-the-art counterfactual generators and several benchmark datasets, we generate large numbers of counterfactuals and study the resulting domain and model shifts. We find that the induced shifts are substantial enough to likely impede the applicability of Algorithmic Recourse in situations that involve competition for scarce resources. Fortunately, we find various potential mitigation strategies that can be used in combination with existing approaches. Our simulation framework for studying recourse dynamics is fast and open-sourced.
-

_quarto.yml

Lines changed: 13 additions & 2 deletions
@@ -6,8 +6,19 @@ filters:
   - lua/abstract-to-meta.lua
   - quarto
 bibliography: bib.bib
-execute:
-  freeze: auto # re-render only when source changes
+
+crossref:
+  fig-prefix: Figure
+  tbl-prefix: Table
+  fig-format: png
+
+execute:
+  freeze: auto
+  eval: true
+  echo: true
+  output: false
+
+jupyter: julia-1.8

docs/src/_intro.qmd

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+`AlgorithmicRecourseDynamics.jl` is a Julia package for modeling Algorithmic Recourse Dynamics.
+
+## Research Paper 📝
+
+**Note** ⚠: You are browsing the [`#original-paper`](https://github.com/pat-alt/AlgorithmicRecourseDynamics.jl/tree/original-paper) branch of `AlgorithmicRecourseDynamics.jl`. This branch is a static artifact corresponding to the state of the package at the time the paper was first published. It can be used to replicate the original findings of the paper.
+
+## At a Glance
+
+You can find resources relevant to the paper right here on this branch of the
+repository:
+
+- [Paper](paper/paper.pdf)
+- [Notebooks](dev/notebooks/)
+- [Supplementary Appendix](build/dev/notebooks/appendix.html) generated from notebooks (download the HTML and view in browser)
+- [Artifacts]() (including data and experimental results; link currently excluded due to double-blind review process)
+
+In this work we investigate what happens if Algorithmic Recourse is actually implemented by a large number of individuals. The chart below illustrates what we mean by Endogenous Macrodynamics in Algorithmic Recourse: (a) we have a simple linear classifier trained for binary classification where samples from the negative class (y=0) are marked in blue and samples of the positive class (y=1) are marked in orange; (b) the implementation of AR for a random subset of individuals leads to a noticeable domain shift; (c) as the classifier is retrained we observe a corresponding model shift; (d) as this process is repeated, the decision boundary moves away from the target class.
+
+![](paper/www/poc.png)
+
+## Paper Abstract
+
+Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment: given some estimated model, the goal is to find valid counterfactuals for an individual instance that fulfill various desiderata. The ability of such counterfactuals to handle dynamics like data and model drift remains a largely unexplored research challenge. There has also been surprisingly little work on the related question of how the actual implementation of recourse by one individual may affect other individuals. Through this work we aim to close that gap. We first show that many of the existing methodologies can be collectively described by a generalized framework. We then argue that the existing framework does not account for a hidden external cost of recourse, that only reveals itself when studying the endogenous dynamics of recourse at the group level. Through simulation experiments involving various state-of-the-art counterfactual generators and several benchmark datasets, we generate large numbers of counterfactuals and study the resulting domain and model shifts. We find that the induced shifts are substantial enough to likely impede the applicability of Algorithmic Recourse in some situations. Fortunately, we find various strategies to mitigate these concerns. Our simulation framework for studying recourse dynamics is fast and open-sourced.
+

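The four panels described in `_intro.qmd` form a loop: (a) train a classifier, (b) let a random share of rejected individuals implement recourse, (c) retrain, (d) repeat. The Python toy below sketches that loop under loudly simplifying assumptions: two Gaussian blobs, a logistic-regression classifier fitted by plain gradient descent, and a closed-form minimal-L2 counterfactual for a linear model standing in for the counterfactual generators studied in the paper. All function names and parameter values here are hypothetical; none of this is the API of `AlgorithmicRecourseDynamics.jl`.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(w, b, x):
    # Linear score w . x + b for a 2-feature point.
    return w[0] * x[0] + w[1] * x[1] + b


def fit_logreg(X, y, epochs=300, lr=0.5):
    # Batch gradient descent on the mean logistic loss.
    w, b, n = [0.0, 0.0], 0.0, len(X)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for x, t in zip(X, y):
            err = sigmoid(logit(w, b, x)) - t
            g0 += err * x[0]
            g1 += err * x[1]
            gb += err
        w = [w[0] - lr * g0 / n, w[1] - lr * g1 / n]
        b -= lr * gb / n
    return w, b


def apply_recourse(w, b, x, margin=0.5):
    # Minimal L2 change for a linear model: move along w until the logit equals `margin`.
    step = (margin - logit(w, b, x)) / (w[0] ** 2 + w[1] ** 2)
    return (x[0] + step * w[0], x[1] + step * w[1])


def simulate(rounds=5, share=0.2, n=100, seed=0):
    rng = random.Random(seed)
    # (a) negatives (y = 0) around (-1.5, -1.5), positives (y = 1) around (1.5, 1.5)
    X = [(rng.gauss(-1.5, 0.5), rng.gauss(-1.5, 0.5)) for _ in range(n)]
    X += [(rng.gauss(1.5, 0.5), rng.gauss(1.5, 0.5)) for _ in range(n)]
    y = [0] * n + [1] * n
    w, b = fit_logreg(X, y)
    history = [(w, b, sum(y))]
    for _ in range(rounds):
        # (b) a random share of the individuals currently rejected implement recourse
        rejected = [i for i, x in enumerate(X) if logit(w, b, x) < 0]
        for i in rng.sample(rejected, int(share * len(rejected))):
            X[i] = apply_recourse(w, b, X[i])
            y[i] = 1  # the moved individual is now recorded with the target label
        # (c) retraining on the shifted data produces a model shift
        w, b = fit_logreg(X, y)
        history.append((w, b, sum(y)))
    # (d) one (weights, bias, number of positive labels) entry per round
    return history


if __name__ == "__main__":
    for r, (w, b, pos) in enumerate(simulate()):
        print(f"round {r}: w=({w[0]:.2f}, {w[1]:.2f}) b={b:.2f} positives={pos}")
```

Tracking how the refitted `(w, b)` pairs change across rounds, for instance via the boundary offset `-b / ||w||`, is one simple way to watch the model shift of panels (c) and (d) in this toy setting; the numbers it prints say nothing about the paper's actual results.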
docs/src/_metadata.yml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+format:
+  commonmark:
+    variant: -raw_html
+    wrap: none

docs/src/index.qmd

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+```@meta
+CurrentModule = AlgorithmicRecourseDynamics
+```
+
+# AlgorithmicRecourseDynamics
+
+Documentation for [AlgorithmicRecourseDynamics.jl](https://github.com/pat-alt/AlgorithmicRecourseDynamics.jl).
+
+{{< include _intro.qmd >}}

docs/src/paper/appendix.qmd

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+---
+title: Supplementary Appendix
+format:
+  html:
+    self-contained: true
+    code-fold: true
+execute:
+  echo: true
+  eval: false
+  warning: false
+toc: true
+jupyter: julia-1.7
+---
+
+This is a supplementary appendix to the research paper **Endogenous Macrodynamics in Algorithmic Recourse**. It contains all of the experimental results, including those not highlighted in the actual paper. It also contains additional information about the proposed counterfactual generators.
+
+# Experimental Results {#sec-results}
+
+{{< include experiments/synthetic.qmd >}}
+
+{{< include experiments/real_world.qmd >}}
+
+{{< include experiments/mitigation_strategies.qmd >}}
+
+# Generators {#sec-generators}
+
+{{< include generators/gravitational_generator.qmd >}}
+
+{{< include generators/clap_roar_generator.qmd >}}
Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
+---
+title: Preprocessing Real-World Data
+jupyter: julia-1.7
+---
+
+```{julia}
+using Pkg; Pkg.activate("dev")
+```
+
+```{julia}
+include("dev/utils.jl")
+using AlgorithmicRecourseDynamics
+using CounterfactualExplanations, Flux, Plots, PlotThemes, Random, LaplaceRedux, LinearAlgebra
+theme(:wong)
+output_path = output_dir("real_world")
+www_path = www_dir("real_world")
+data_path = data_dir("real_world")
+```
+
+## California Housing Data
+
+Fetching the data using Python's `sklearn`:
+
+```{python}
+from sklearn.datasets import fetch_california_housing
+df, y = fetch_california_housing(return_X_y=True, as_frame=True)
+df["target"] = y.values
+data_path = "../../artifacts/upload/data/real_world"
+import os
+df.to_csv(os.path.join(data_path,"raw/cal_housing.csv"), index=False)
+```
+
+Loading the data into Julia session:
+
+```{julia}
+using CSV, DataFrames, Statistics, StatsBase
+df = CSV.read(joinpath(data_path, "raw/cal_housing.csv"), DataFrame)
+# Features:
+X = Matrix(df[:,Not(:target)])
+dt = fit(ZScoreTransform, X, dims=1)
+StatsBase.transform!(dt, X)
+# Target:
+y = df.target
+y = Float64.(y .>= median(y)); # binary target
+# Data:
+df = DataFrame(X,:auto)
+df.target = y
+```
+
+```{julia}
+using MLUtils: undersample
+# Make DataFrames.jl work
+MLUtils.getobs(data::DataFrame, i) = data[i,:]
+MLUtils.numobs(data::DataFrame) = nrow(data)
+df_balanced = getobs(undersample(df, df.target;shuffle=true))
+```
+
+```{julia}
+CSV.write(joinpath(data_path, "cal_housing.csv"), df)
+```
+
+
+## Give Me Some Credit
+
+```{julia}
+using CSV, DataFrames, Statistics, StatsBase
+df = CSV.read(joinpath(data_path, "raw/cs-training.csv"), DataFrame)
+select!(df, Not([:Column1]))
+rename!(df, :SeriousDlqin2yrs => :target)
+mapcols!(x -> [ifelse(x_=="NA", missing, x_) for x_ in x], df)
+dropmissing!(df)
+mapcols!(x -> eltype(x) <: AbstractString ? parse.(Int, x) : x, df)
+# Features:
+X = Matrix(df[:,Not(:target)])
+dt = fit(ZScoreTransform, X, dims=1)
+StatsBase.transform!(dt, X)
+# Target:
+y = df.target
+# Data:
+df = DataFrame(X,:auto)
+df.target = y
+```
+
+```{julia}
+using MLUtils
+using MLUtils: undersample
+# Make DataFrames.jl work
+MLUtils.getobs(data::DataFrame, i) = data[i,:]
+MLUtils.numobs(data::DataFrame) = nrow(data)
+df_balanced = getobs(undersample(df, df.target;shuffle=true))
+```
+
+```{julia}
+CSV.write(joinpath(data_path, "gmsc.csv"), df_balanced)
+```
+
+## UCI Credit Card Default
+
+```{julia}
+using CSV, DataFrames, Statistics, StatsBase
+df = CSV.read(joinpath(data_path, "raw/UCI_Credit_Card.csv"), DataFrame)
+select!(df, Not([:ID, :SEX, :EDUCATION, :MARRIAGE]))
+rename!(df, "default.payment.next.month" => :target)
+dropmissing!(df)
+mapcols!(x -> eltype(x) <: AbstractString ? parse.(Int, x) : x, df)
+# Features:
+X = Matrix(df[:,Not(:target)])
+dt = fit(ZScoreTransform, X, dims=1)
+StatsBase.transform!(dt, X)
+# Target:
+y = df.target
+# Data:
+df = DataFrame(X,:auto)
+df.target = y
+```
+
+```{julia}
+using MLUtils
+using MLUtils: undersample
+# Make DataFrames.jl work
+MLUtils.getobs(data::DataFrame, i) = data[i,:]
+MLUtils.numobs(data::DataFrame) = nrow(data)
+df_balanced = getobs(undersample(df, df.target;shuffle=true))
+```
+
+```{julia}
+CSV.write(joinpath(data_path, "credit_default.csv"), df_balanced)
+```
+
+
+
+
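The three real-world datasets in the preprocessing notebook go through much the same recipe: treat `"NA"` entries as missing and drop those rows (Give Me Some Credit), z-score every feature column (`ZScoreTransform` with `dims=1`), binarise a continuous target at its median (California housing), and randomly undersample the majority class. The dependency-free Python sketch below spells those steps out for readers who want to check the logic outside Julia. The helper names are hypothetical, the sketch uses the sample standard deviation, and its random undersampling only approximates what `MLUtils.undersample` does.

```python
import random
import statistics


def drop_na_rows(rows):
    # Treat the string "NA" as missing, drop incomplete rows, parse the rest as numbers.
    return [[float(v) for v in row] for row in rows if "NA" not in row]


def zscore_columns(X):
    # Standardise every feature (column) to mean 0 and unit sample standard deviation.
    stats = [(statistics.mean(col), statistics.stdev(col)) for col in zip(*X)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def binarize_at_median(y):
    # 1.0 at or above the median, 0.0 below it (cf. `Float64.(y .>= median(y))`).
    med = statistics.median(y)
    return [1.0 if v >= med else 0.0 for v in y]


def undersample(X, y, seed=0):
    # Keep a random subset of each class as large as the smallest class, then shuffle.
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    pairs = [(row, label) for label, rows in by_class.items()
             for row in rng.sample(rows, n_min)]
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


if __name__ == "__main__":
    raw = [["1", "10", "0"], ["NA", "20", "1"], ["2", "20", "0"],
           ["3", "30", "1"], ["4", "40", "0"]]
    rows = drop_na_rows(raw)
    X = zscore_columns([r[:-1] for r in rows])
    y = [r[-1] for r in rows]
    X_bal, y_bal = undersample(X, y)
    print(len(X_bal), sorted(y_bal))
```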
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+---
+title: Generating Synthetic Data
+jupyter: julia-1.7
+---
+
+```{julia}
+using Pkg; Pkg.activate("dev")
+```
+
+```{julia}
+include("dev/utils.jl")
+using AlgorithmicRecourseDynamics
+output_path = output_dir("synthetic")
+www_path = www_dir("synthetic")
+data_path = data_dir("synthetic")
+```
+
+
+```{julia}
+using MLJ, DataFrames, CSV
+n = 1000
+p = 2
+
+using Random
+Random.seed!(42)
+
+# Linearly separable:
+X, y = make_blobs(n, p; centers=2, center_box=(-2 => 2), cluster_std=0.1)
+df = DataFrame(X)
+df.target .= ifelse.(y.==1,0,1)
+CSV.write(joinpath(data_path, "linearly_separable.csv"),df)
+
+# Overlapping:
+X, y = make_blobs(n, p; centers=2, center_box=(-2 => 2), cluster_std=0.5)
+df = DataFrame(X)
+df.target .= ifelse.(y.==1,0,1)
+CSV.write(joinpath(data_path, "overlapping.csv"),df)
+
+# Circles:
+X, y = make_circles(n; noise=0.15, factor=0.01)
+df = DataFrame(X)
+df.target = y
+CSV.write(joinpath(data_path, "circles.csv"),df)
+
+# Moon:
+X, y = make_moons(n)
+df = DataFrame(X)
+df.target = y
+CSV.write(joinpath(data_path, "moons.csv"),df)
+```

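In the blob datasets of the synthetic-data notebook, `make_blobs` returns cluster labels 1 and 2, which `ifelse.(y.==1,0,1)` recodes to targets 0 and 1; the linearly separable and overlapping sets differ only in `cluster_std` (0.1 versus 0.5), with centres drawn inside `center_box = (-2 => 2)`. The standard-library Python sketch below mimics that setup. `make_two_blobs` is a hypothetical stand-in, not MLJ's generator: it assumes both centres are drawn uniformly from the box and it assigns clusters alternately instead of shuffling.

```python
import random


def make_two_blobs(n, p=2, center_box=(-2.0, 2.0), cluster_std=0.1, seed=42):
    # Hypothetical stand-in for a two-centre blob generator.
    rng = random.Random(seed)
    # Assumption: both centres are drawn uniformly from the box.
    centers = [[rng.uniform(*center_box) for _ in range(p)] for _ in range(2)]
    X, clusters = [], []
    for i in range(n):
        k = i % 2 + 1  # cluster ids 1 and 2, alternating for simplicity
        X.append([rng.gauss(c, cluster_std) for c in centers[k - 1]])
        clusters.append(k)
    # Same recoding as ifelse.(y.==1,0,1): cluster 1 -> target 0, cluster 2 -> target 1.
    target = [0 if k == 1 else 1 for k in clusters]
    return X, target


if __name__ == "__main__":
    for std in (0.1, 0.5):  # "linearly separable" vs "overlapping" settings
        X, y = make_two_blobs(1000, cluster_std=std)
        print(std, len(X), sum(y))
```

Whether the 0.5 setting actually overlaps depends on how far apart the two random centres land, so the notebook's dataset names describe the typical case rather than a guarantee.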