Commit 6050300
Commit message: cant get tables to work, charts will have to do
Parent: 6ecb592
14 files changed: 872 additions & 700 deletions

_freeze/dev/notebooks/appendix/execute-results/html.json
Lines changed: 1 addition & 1 deletion (large diff not rendered by default)

build/dev/notebooks/appendix.html
Lines changed: 740 additions & 669 deletions (large diff not rendered by default)

dev/Project.toml

Lines changed: 0 additions & 1 deletion
@@ -6,7 +6,6 @@ CounterfactualExplanations = "2f13d31b-18db-44c1-bc43-ebaf2cff0be0"
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
 Gadfly = "c91e804a-d5a3-530f-b6f0-dfbca275c004"
-Gumbo = "708ec375-b3d6-5a57-a7ce-8257bf98657a"
 LaplaceRedux = "c52c1a26-f7c5-402b-80be-ba1e638ad478"
 LibGit2 = "76f85450-5226-5b5a-8eaa-529ad045b433"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

dev/notebooks/experiments/mitigation_strategies.qmd

Lines changed: 84 additions & 0 deletions
@@ -132,6 +132,34 @@ for img in img_files
 end
 ```
 
+### Bootstrap
+
+```{julia}
+n_bootstrap = 1000
+using AlgorithmicRecourseDynamics.Evaluation: evaluate_system
+using DataFrames
+df = DataFrame()
+for (key, val) in results
+    n_folds = length(val.experiment.recourse_systems)
+    for fold in 1:n_folds
+        for i in length(val.experiment.system_identifiers)
+            rec_sys = val.experiment.recourse_systems[fold][i]
+            model_name, gen_name = collect(val.experiment.system_identifiers)[i]
+            df_ = evaluate_system(rec_sys, val.experiment; n=n_bootstrap)
+            df_.model .= model_name
+            df_.generator .= gen_name
+            df_.fold .= fold
+            df = vcat(df, df_)
+        end
+    end
+end
+df = mapcols(x -> typeof(x) == Vector{Symbol} ? string.(x) : x, df)
+using RCall
+save_path = joinpath(output_path, "bootstrap_synthetic.csv")
+using CSV
+CSV.write(save_path)
+```
+
 ### Chart in paper
 
 @fig-mit-paper shows the chart that went into the paper.

@@ -292,6 +320,34 @@ for img in img_files
 end
 ```
 
+### Bootstrap
+
+```{julia}
+n_bootstrap = 1000
+using AlgorithmicRecourseDynamics.Evaluation: evaluate_system
+using DataFrames
+df = DataFrame()
+for (key, val) in results
+    n_folds = length(val.experiment.recourse_systems)
+    for fold in 1:n_folds
+        for i in length(val.experiment.system_identifiers)
+            rec_sys = val.experiment.recourse_systems[fold][i]
+            model_name, gen_name = collect(val.experiment.system_identifiers)[i]
+            df_ = evaluate_system(rec_sys, val.experiment; n=n_bootstrap)
+            df_.model .= model_name
+            df_.generator .= gen_name
+            df_.fold .= fold
+            df = vcat(df, df_)
+        end
+    end
+end
+df = mapcols(x -> typeof(x) == Vector{Symbol} ? string.(x) : x, df)
+using RCall
+save_path = joinpath(output_path, "bootstrap_latent.csv")
+using CSV
+CSV.write(save_path)
+```
+
 ### Chart in paper
 
 @fig-mit-latent-paper shows the chart that went into the paper.

@@ -434,6 +490,34 @@ for (data_name, res) in results
 end
 ```
 
+### Bootstrap
+
+```{julia}
+n_bootstrap = 1000
+using AlgorithmicRecourseDynamics.Evaluation: evaluate_system
+using DataFrames
+df = DataFrame()
+for (key, val) in results
+    n_folds = length(val.experiment.recourse_systems)
+    for fold in 1:n_folds
+        for i in length(val.experiment.system_identifiers)
+            rec_sys = val.experiment.recourse_systems[fold][i]
+            model_name, gen_name = collect(val.experiment.system_identifiers)[i]
+            df_ = evaluate_system(rec_sys, val.experiment; n=n_bootstrap)
+            df_.model .= model_name
+            df_.generator .= gen_name
+            df_.fold .= fold
+            df = vcat(df, df_)
+        end
+    end
+end
+df = mapcols(x -> typeof(x) == Vector{Symbol} ? string.(x) : x, df)
+using RCall
+save_path = joinpath(output_path, "bootstrap_real_world.csv")
+using CSV
+CSV.write(save_path)
+```
+
 ### Chart in paper
 
 @fig-mit-latent-paper shows the chart that went into the paper.
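One caveat about the Bootstrap blocks added above: the inner loop reads `for i in length(val.experiment.system_identifiers)`. In Julia an integer is itself iterable, so that loop body runs exactly once, with `i` equal to the length, and only the last system per fold gets evaluated; the range form `for i in 1:length(...)` (or `eachindex`) visits every index. A minimal base-Julia sketch of the difference, with hypothetical names:

```julia
xs = ["model_a", "model_b", "model_c"]

visited_once = Int[]
for i in length(xs)      # an Int is iterable: the body runs exactly once, with i == 3
    push!(visited_once, i)
end

visited_all = Int[]
for i in 1:length(xs)    # a range visits every index
    push!(visited_all, i)
end

println(visited_once)  # [3]
println(visited_all)   # [1, 2, 3]
```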

dev/notebooks/experiments/real_world.qmd

Lines changed: 29 additions & 0 deletions
@@ -141,6 +141,35 @@ for img in img_files
 end
 ```
 
+### Bootstrap
+
+```{julia}
+n_bootstrap = 1000
+using AlgorithmicRecourseDynamics.Evaluation: evaluate_system
+using DataFrames
+df = DataFrame()
+for (key, val) in results
+    n_folds = length(val.experiment.recourse_systems)
+    for fold in 1:n_folds
+        for i in length(val.experiment.system_identifiers)
+            rec_sys = val.experiment.recourse_systems[fold][i]
+            model_name, gen_name = collect(val.experiment.system_identifiers)[i]
+            df_ = evaluate_system(rec_sys, val.experiment; n=n_bootstrap)
+            df_.model .= model_name
+            df_.generator .= gen_name
+            df_.fold .= fold
+            df = vcat(df, df_)
+        end
+    end
+end
+df = mapcols(x -> typeof(x) == Vector{Symbol} ? string.(x) : x, df)
+using RCall
+save_path = joinpath(output_path, "bootstrap.csv")
+using CSV
+CSV.write(save_path)
+```
+
+
 ### Chart in paper
 
 @fig-real-paper shows the chart that went into the paper.

dev/notebooks/experiments/synthetic.qmd

Lines changed: 4 additions & 14 deletions
@@ -283,7 +283,7 @@ end
 ### Bootstrap
 
 ```{julia}
-n_bootstrap = 1
+n_bootstrap = 1000
 using AlgorithmicRecourseDynamics.Evaluation: evaluate_system
 using DataFrames
 df = DataFrame()

@@ -303,19 +303,9 @@ for (key, val) in results
 end
 df = mapcols(x -> typeof(x) == Vector{Symbol} ? string.(x) : x, df)
 using RCall
-save_path = joinpath(output_path, "bootstrap.html")
-R"""
-dt <- DT::datatable($df) |>
-  DT::formatRound(columns=c("value"), digits=3)
-DT::saveWidget(dt, $save_path)
-"""
-```
-
-```{julia}
-#| eval: true
-using Gumbo
-save_path = joinpath(output_path, "bootstrap.html")
-parsehtml(read(save_path, String))
+save_path = joinpath(output_path, "bootstrap.csv")
+using CSV
+CSV.write(save_path)
 ```
 
 ### Chart in paper {#sec-app-synthetic-paper}
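Two notes on the replacement lines above. First, `CSV.write(save_path)` is called without a table; in CSV.jl the call that actually writes the frame is `CSV.write(save_path, df)` (the table-less form is the partially-applied variant meant for piping, `df |> CSV.write(save_path)`). Second, the `mapcols` line converts `Symbol` columns to strings via an exact concrete-type check; the conversion itself is just broadcasting `string`, and an `eltype`-based test would also catch columns that are not concrete `Vector{Symbol}`s, such as views. A base-Julia sketch with hypothetical column values:

```julia
col = [:LogisticRegression, :FluxModel]   # a hypothetical Symbol column

# Exact-type check, as in the diff: true only for a concrete Vector{Symbol}.
is_symbol_vector = typeof(col) == Vector{Symbol}

# A view has element type Symbol but is a SubArray, not a Vector{Symbol}.
sub = view(col, 1:1)
exact_misses_view = typeof(sub) == Vector{Symbol}   # false
eltype_catches_view = eltype(sub) <: Symbol         # true

# The conversion itself: broadcast `string` over the column.
as_strings = string.(col)

println(as_strings)  # ["LogisticRegression", "FluxModel"]
```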

dev/notebooks/generators/clap_roar_generator.qmd

Lines changed: 1 addition & 2 deletions
@@ -13,8 +13,6 @@ output_path = output_dir("generator")
 www_path = www_dir("generator")
 ```
 
-# `ClapROARGenerator`
-
 ```{julia}
 using MLJ
 N = 1000

@@ -129,4 +127,5 @@ for (name,ce) ∈ counterfactuals
     plts = vcat(plts..., plt)
 end
 plt = plot(plts..., size=(800,300), layout=(1,2))
+display(plt)
 ```

dev/notebooks/generators/gravitational_generator.qmd

Lines changed: 0 additions & 4 deletions
@@ -13,8 +13,6 @@ output_path = output_dir("generator")
 www_path = www_dir("generator")
 ```
 
-# `GravitationalGenerator`
-
 ```{julia}
 using MLJ
 N = 1000

@@ -139,5 +137,3 @@ plt = plot(plt1, plt2, size=(850,350), layout=(1,2))
 savefig(plt, joinpath(www_path,"gravitational_generator_comparison.png"))
 ```
 
-
-# References

paper/paper.Rmd

Lines changed: 4 additions & 0 deletions
@@ -115,3 +115,7 @@ knitr::opts_chunk$set(
 
 ::: {#refs}
 :::
+
+# Appendix
+
+

paper/sections/empirical.rmd

Lines changed: 3 additions & 3 deletions
@@ -55,15 +55,15 @@ We use four synthetic binary classification datasets consisting of 1000 samples
 knitr::include_graphics("www/synthetic_data.png")
 ```
 
-Ex-ante we expect to see that by construction Wachter will create a new cluster of counterfactual instances in the proximity of the initial decision boundary. Thus, the choice of a black-box model may have an impact on the paths of the recourse. For generators that use latent space search (REVISE @joshi2019towards, CLUE @antoran2020getting) or rely on (and have access to) probabilistic models (CLUE @antoran2020getting, Greedy @schut2021generating) we expect that counterfactuals will end up in regions of the target domain that are densely populated by training samples. Of course, this is expectation hinges on how effective said probabilistic models are at capturing predictive uncertainty. Finally, we expect to see the counterfactuals generated by DiCE to be uniformly spread around the feature space inside the target class^[As we mentioned earlier, the diversity constraint used by DiCE is only effective for when at least two counterfactuals are being generated. We have therefore decided to always generate 5 counterfactuals for each generator and randomly pick one of them.]. In summary, we expect that the endogenous shifts induced by Wachter outsize those induced by all other generators, since Wachter is the only approach that is not concered with generating what we have defined as meaningful counterfactuals.
+Ex-ante we expect to see that by construction Wachter will create a new cluster of counterfactual instances in the proximity of the initial decision boundary. Thus, the choice of a black-box model may have an impact on the paths of the recourse. For generators that use latent space search (REVISE @joshi2019towards, CLUE @antoran2020getting) or rely on (and have access to) probabilistic models (CLUE @antoran2020getting, Greedy @schut2021generating) we expect that counterfactuals will end up in regions of the target domain that are densely populated by training samples. Of course, this is expectation hinges on how effective said probabilistic models are at capturing predictive uncertainty. Finally, we expect to see the counterfactuals generated by DiCE to be uniformly spread around the feature space inside the target class^[As we mentioned earlier, the diversity constraint used by DiCE is only effective for when at least two counterfactuals are being generated. We have therefore decided to always generate 5 counterfactuals for each generator and randomly pick one of them.]. In summary, we expect that the endogenous shifts induced by Wachter outsize those induced by all other generators, since Wachter is the only approach that is not concerned with generating what we have defined as meaningful counterfactuals.
 
 ### Real-world data
 
-We use three different real-world datasets from the Finance and Economics domain, all of which are tabular and can be used for binary classification. Firstly, we use the **Give Me Some Credit** dataset which was open-sourced on Kaggle for the task to predict whether a borrower is likely to experience financial difficulties in the next two years [@gmsc_data]. Originally consisting of 250,000 instances with 11 numerical attributes. Secondly, we use the **UCI defaultCredit** dataset [@yeh2009comparisons], a benchmark dataset that can be used to train binary classifiers to predict the binary outcome variable, whether credit card clients default on their payment. In its raw form it consists of 23 explanatory variables - 4 categorical features relating to demographic attributes^[These have been ommitted from the analysis. See Section \@ref(limit-data) for details.] and 19 continuous features largely relating to individuals' payment histories and amount of credit outstanding. Both of these datasets have been used in the literature on Algorithmic Recourse before (see for example @pawelczyk2021carla, @joshi2019towards and @ustun2019actionable), presumably because they constitute real-world classification tasks involving individuals that compete for access to credit.
+We use three different real-world datasets from the Finance and Economics domain, all of which are tabular and can be used for binary classification. Firstly, we use the **Give Me Some Credit** dataset which was open-sourced on Kaggle for the task to predict whether a borrower is likely to experience financial difficulties in the next two years [@gmsc_data]. Originally consisting of 250,000 instances with 11 numerical attributes. Secondly, we use the **UCI defaultCredit** dataset [@yeh2009comparisons], a benchmark dataset that can be used to train binary classifiers to predict the binary outcome variable, whether credit card clients default on their payment. In its raw form it consists of 23 explanatory variables - 4 categorical features relating to demographic attributes^[These have been omitted from the analysis. See Section \@ref(limit-data) for details.] and 19 continuous features largely relating to individuals' payment histories and amount of credit outstanding. Both of these datasets have been used in the literature on Algorithmic Recourse before (see for example @pawelczyk2021carla, @joshi2019towards and @ustun2019actionable), presumably because they constitute real-world classification tasks involving individuals that compete for access to credit.
 
 As a third dataset we include the **California Housing** dataset derived from the 1990 U.S. census and sourced through scikit-learn [@pedregosa2011scikit, @pace1997sparse]. It consists of 8 continuous features that can be used to predict the median house price for California districts. The continuous outcome variable is binarized as $\tilde{y}=\mathbb{I}_{y>\text{median}(Y)}$ indicating whether or not the median house price of a given district is above or below the median of all districts. While we have not seen this dataset used in the previous literature on AR, others have used the Boston Housing dataset in a similar fashion (see for example @schut2021generating). While we initially also conducted experiments on that dataset, we eventually discarded this dataset, since it has been found to suffer from an ethical problem [@carlisle2019racist].
 
-Since the simulations involve generating counterfactuals for a significant proportion of the entire sample of individuals, we have randomly undersampled each dataset to yield balanced subsamples consisting of 2,500 individuals each. We have also standardized all explanatory features since our chosen classifiers are sensetive to scale.
+Since the simulations involve generating counterfactuals for a significant proportion of the entire sample of individuals, we have randomly undersampled each dataset to yield balanced subsamples consisting of 2,500 individuals each. We have also standardized all explanatory features since our chosen classifiers are sensitive to scale.
 
 ## $G$ -- Generators
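The binarization $\tilde{y}=\mathbb{I}_{y>\text{median}(Y)}$ described for the California Housing outcome is a one-liner in Julia; a sketch with made-up district prices, using only the Statistics standard library:

```julia
using Statistics

y = [1.2, 3.4, 2.1, 4.8, 2.9, 0.7]  # hypothetical median house prices per district
y_bin = Int.(y .> median(y))        # 1 if above the median of all districts, else 0

println(y_bin)  # median is 2.5, so [0, 1, 0, 1, 1, 0]
```

By construction this yields a roughly balanced binary outcome, which matches the paper's use of balanced subsamples.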
