
Commit 0d29b0b

after pull

2 parents abea5f6 + 6688a2f

17 files changed

Lines changed: 719 additions & 327 deletions

dev/build/paper/paper.html

Lines changed: 149 additions & 117 deletions
Large diffs are not rendered by default.

dev/build/paper/paper.pdf

681 KB
Binary file not shown.
Three further binary files changed (344 KB, 375 KB, 354 KB); previews not shown.

dev/paper/paper.qmd

Lines changed: 2 additions & 7 deletions
@@ -30,19 +30,14 @@ execute:
 
 {{< include sections/empirical_2.qmd >}}
 
+{{< include sections/mitigation.qmd >}}
+
 {{< include sections/discussion.qmd >}}
 
 {{< include sections/limitations.qmd >}}
 
 {{< include sections/conclusion.qmd >}}
 
-# Acknowledgment {.unnumbered}
-
-P. A. thanks ...
-
-\pagebreak
-\FloatBarrier
-
 # References {.unnumbered}
 
 ::: {#refs}

dev/paper/paper.tex

Lines changed: 375 additions & 188 deletions
Large diffs are not rendered by default.

dev/paper/sections/conclusion.qmd

Lines changed: 3 additions & 1 deletion
@@ -1 +1,3 @@
-# Concluding Remarks {#sec-conclusion}
+# Concluding Remarks {#sec-conclusion}
+
+This work has revisited and extended some of the most general and defining concepts underlying the literature on Counterfactual Explanations and, in particular, Algorithmic Recourse. We demonstrate that long-held beliefs about what defines optimality in AR are too short-sighted to serve as a foundation for applications of recourse in practice. Specifically, we run multiple experiments that simulate the application of recourse in practice using various popular counterfactual generators and find that all of them induce substantial domain and model shifts. We argue that these shifts should be considered an expected external cost of individual recourse and call for a paradigm shift from individual to collective recourse. By proposing an adapted counterfactual search objective that incorporates this cost, we make that paradigm shift explicit. We show that this modified objective lends itself to mitigation strategies that can be used to effectively decrease the magnitude of induced domain and model shifts. Through our work we hope to inspire future research on this important topic. To this end, we have open-sourced all of our code along with a Julia package, `AlgorithmicRecourseDynamics.jl`. The package is built on top of `CounterfactualExplanations.jl` and inherits its extensibility [@altmeyer2022CounterfactualExplanations]. Future researchers should therefore find it relatively easy to replicate, modify, and extend the simulation experiments presented here and apply them to their own custom counterfactual generators.
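
For orientation, the replication workflow this paragraph promises might look roughly as follows. This is a minimal sketch: `Experiment`, `run_experiment`, and the data and model helpers are assumed names for illustration, not the documented interface of either package.

```julia
# Hypothetical usage sketch; the names below are assumptions for
# illustration, not the documented API of either package.
using AlgorithmicRecourseDynamics
using CounterfactualExplanations

data = load_data()              # placeholder: any tabular training set
model = fit_model(data)         # placeholder: fitted classifier
generator = GenericGenerator()  # placeholder: a gradient-based generator

# Simulate recourse dynamics: 50 rounds, recourse for 5% of the
# non-target class per round, five repetitions (values as in the paper).
experiment = Experiment(data, model, generator;
                        n_rounds = 50, batch_share = 0.05, n_repeats = 5)
results = run_experiment(experiment)
```

Swapping `generator` for a custom counterfactual generator would be the extension point the paragraph alludes to.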

dev/paper/sections/discussion.qmd

Lines changed: 3 additions & 5 deletions
@@ -1,7 +1,5 @@
 # Discussion {#sec-discussion}
 
-1. Shift of focus from individual to group of individuals (related: https://www.researchgate.net/publication/353073138_Generating_Collective_Counterfactual_Explanations_in_Score-Based_Classification_via_Mathematical_Optimization)
-2. Convergence criterium matters: terminating once threshold probability is reached may not be optimal (see e.g. REVISE)
-3. Optimizer choice matters: dimensionality is typically low, so no obvious benefit to using ADAM.
-   - This might be better placed in JuliaCon proceedings, perhaps backed by small blog post on the matter.
-4. Mitigating strategy: penaliye distance from centroid.
+Our results in @sec-empirical-2 indicate that state-of-the-art approaches to Algorithmic Recourse induce substantial domain and model shifts if implemented at scale in practice. These induced shifts can and should be considered an expected external cost of individual recourse. While they do not affect the individual directly as long as we look at the individual in isolation, they can be seen to affect the broader group of stakeholders in automated data-driven decision-making. We have seen, for example, that out-of-sample model performance generally deteriorates in our simulation experiments. In practice, this is a cost to model owners, that is, the group of stakeholders using the model as a decision-making tool. As we set out in the introduction, these model owners will generally be unwilling to carry that cost and can hence be expected to stop offering recourse to individuals altogether. This in turn is costly to those individuals that would otherwise derive utility from being offered recourse.
+
+So, where does this leave us? We would argue that the expected external costs of individual recourse should be shared by all stakeholders. The most straightforward way to achieve this is to introduce a penalty for external costs in the counterfactual search objective function, as we set out in @eq-collective. This will on average lead to more costly counterfactual outcomes, but it may help to avoid extreme scenarios in which minimal-cost recourse is reserved for a tiny minority of individuals. We have presented various types of shift-mitigating strategies that can be used to this end. Since all of these strategies can be seen simply as specific adaptations of @eq-collective, they can be applied to any of the counterfactual generators studied here.
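
@eq-collective itself is not shown in this diff. Assuming it extends the standard gradient-based counterfactual search objective with a penalty term for the external cost discussed above, it plausibly takes a form along these lines, where $\ell$ measures the distance of the prediction $M(\mathbf{s}^\prime)$ from the target $y^*$, $\mathrm{cost}(\cdot)$ is the private cost to the individual, and $\mathrm{extcost}(\cdot)$ is the assumed external-cost penalty:

```latex
\mathbf{s}^\prime = \arg\min_{\mathbf{s}^\prime \in \mathcal{S}}
  \left\{ \ell\left(M(\mathbf{s}^\prime), y^*\right)
  + \lambda_1 \, \mathrm{cost}(\mathbf{s}^\prime)
  + \lambda_2 \, \mathrm{extcost}(\mathbf{s}^\prime) \right\}
```

On this reading, $\lambda_2$ governs how much of the external cost is internalized: $\lambda_2 = 0$ recovers purely individual recourse, while larger values trade higher private cost for smaller induced domain and model shifts.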

dev/paper/sections/empirical.qmd

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Experiment Setup {#sec-empirical}
 
-This section presents the exact ingredients and parameter choices describing the simulation experiments we ran to produce the findings presented in the next section (@sec-empirical-2). For convenience, we use Algorithm \ref{algo-experiment} as a template to guide us through this section. A few high-level details upfront: each experiment is run for a total of $T=50$ rounds, where in each round we provide recourse to five percent of all individuals in the non-target class, so $B_t=0.05 * N_t^{\mathcal{D}_0}$^[As mentioned in the previous section, we end up providing recourse to a total of $\approx50\%$ by the end of round $T=50$.]. All classifiers and generative models are retrained for 10 epochs in each round $t$ of the experiment. Rather than retraining models from scratch, we initialize all parameters at their previous levels ($t-1$) and compute backpropagate for 10 epochs using the new training data as inputs into the existing model. Evaluation metrics are computed and stored every 10 rounds.
+This section presents the exact ingredients and parameter choices describing the simulation experiments we ran to produce the findings presented in the next section (@sec-empirical-2). For convenience, we use Algorithm \ref{algo-experiment} as a template to guide us through this section. A few high-level details upfront: each experiment is run for a total of $T=50$ rounds, where in each round we provide recourse to five percent of all individuals in the non-target class, so $B_t = 0.05 \cdot N_t^{\mathcal{D}_0}$^[As mentioned in the previous section, we end up providing recourse to a total of $\approx50\%$ of individuals by the end of round $T=50$.]. All classifiers and generative models are retrained for 10 epochs in each round $t$ of the experiment. Rather than retraining models from scratch, we initialize all parameters at their previous levels ($t-1$) and backpropagate for 10 epochs using the new training data as inputs into the existing model. Evaluation metrics are computed and stored every 10 rounds. To account for noise, each individual experiment is repeated five times.
 
 ## $M$ -- Classifiers and Generative Models {#sec-empirical-classifiers}
 
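
Schematically, the protocol described in the changed paragraph amounts to the following loop, written here as a Julia sketch; every helper function is a placeholder introduced for illustration, not a function from the paper's code base:

```julia
# Placeholder sketch of the simulation protocol described above.
# All helpers (init_model, init_data, sample_non_target, provide_recourse!,
# retrain!, evaluate) are hypothetical, introduced only for illustration.
T, batch_share, n_epochs, n_repeats = 50, 0.05, 10, 5

metrics = []
for rep in 1:n_repeats                            # repeat to account for noise
    model, data = init_model(), init_data()       # fresh state per repetition
    for t in 1:T
        batch = sample_non_target(data, batch_share)   # B_t = 5% of non-target class
        provide_recourse!(data, batch, model)          # factuals -> counterfactuals
        retrain!(model, data; epochs = n_epochs)       # warm start from round t-1
        t % 10 == 0 && push!(metrics, evaluate(model, data))  # metrics every 10 rounds
    end
end
```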