Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely been limited to the static setting and focused on single individuals: given some estimated model, the goal is to find valid counterfactuals for individual instances that fulfill various desiderata. The ability of such counterfactuals to handle dynamics like data and model drift remains a largely unexplored research challenge at this point. There has also been surprisingly little work on the related question of how the actual implementation of recourse by one individual may affect other individuals. Through this work we aim to close that gap by systematizing and extending existing knowledge. We first show that many of the existing methodologies can be collectively described by a generalized framework. We then argue that the existing framework fails to account for a hidden external cost of recourse that only reveals itself when studying the endogenous dynamics of recourse at the group level. Through simulation experiments involving various popular counterfactual generators and several benchmark datasets, we generate a total of XX million Counterfactual Explanations and study the resulting domain and model shifts. We find that the induced shifts are substantial enough to likely impede the applicability of Algorithmic Recourse in practice. Fortunately, we find various potential mitigation strategies that can be used in combination with existing approaches. Our simulation framework for studying recourse dynamics is fast and open-sourced.
In the case of the baseline counterfactual generator [@wachter2017counterfactual], $f$ is just the identity function and the number of counterfactuals $K$ is equal to one. This generator, which we shall refer to as **Wachter** in the following, serves as the baseline against which all other gradient-based methodologies will be compared. In particular, we include the following generators in our benchmarking exercises: REVISE [@joshi2019towards], CLUE [@antoran2020getting], DiCE [@mothilal2020explaining] and a greedy approach that relies on probabilistic models [@schut2021generating].
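To make the baseline concrete, the following is a minimal sketch of Wachter-style counterfactual search for a toy linear classifier. It is written in Python purely for illustration (our implementation is in Julia), and all names and parameter values are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wachter_counterfactual(x, w, b, target=1.0, lam=0.1, lr=0.1, steps=500):
    """Gradient descent on loss(M(x'), t) + lam * ||x' - x||^2.
    Here f is the identity and K = 1, as in the baseline generator."""
    x_cf = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf + b)
        grad_loss = (p - target) * w          # gradient of cross-entropy wrt x'
        grad_dist = 2.0 * lam * (x_cf - x)    # gradient of the distance penalty
        x_cf -= lr * (grad_loss + grad_dist)
    return x_cf

w, b = np.array([1.0, -1.0]), 0.0   # toy linear classifier M
x = np.array([-2.0, 2.0])           # factual, classified as negative
x_cf = wachter_counterfactual(x, w, b)
p_cf = sigmoid(w @ x_cf + b)        # predicted probability for x_cf
```

The distance penalty keeps the counterfactual close to the factual, which is why Wachter-style counterfactuals tend to settle just beyond the decision boundary.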
Both **REVISE** and **CLUE** search counterfactuals in some latent embedding $S \subset \mathcal{S}$ instead of the feature space directly. The latent embedding is learned by a separate generative model that is tasked with learning the data generating process (DGP) of $X$. In this case $f$ in @eq-general corresponds to the decoder part of the generative model, in other words the deterministic function that maps back from the latent embedding to the feature space. Provided the generative model is well-specified, traversing the latent embedding typically results in realistic and plausible counterfactuals, because they are implicitly generated by the (learned) DGP [@joshi2019towards]. CLUE distinguishes itself from REVISE and other counterfactual generators in that it aims to minimize the predictive uncertainty of the model in question, $M$. To quantify predictive uncertainty the authors rely on entropy estimates for probabilistic models. The **Greedy** approach proposed by @schut2021generating also works with the subclass of models $\tilde{\mathcal{M}}\subset\mathcal{M}$ that can produce predictive uncertainty estimates. The authors show that in this setting the complexity penalty $h(\cdot)$ in @eq-general is redundant and meaningful counterfactuals can be generated in a fast and efficient manner through a modified Jacobian-based Saliency Map Attack (JSMA). Finally, **DiCE** distinguishes itself from all other generators considered here in that it aims to generate a diverse set of $K>1$ counterfactuals. To this end the authors use a complexity penalty $h(\mathbf{s}^\prime)$ that favours diverse outcomes, in the sense that $\mathbf{s}_1^\prime, \dots, \mathbf{s}_K^\prime$ look as different from each other as possible.
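The latent-space search used by REVISE and CLUE can be sketched along the same lines as the baseline: gradients are taken with respect to the latent code and chained through the decoder $f$. In the illustrative Python below, a made-up linear decoder stands in for a trained VAE decoder; none of this reflects our actual Julia implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D = np.array([[1.0], [-1.0]])        # toy decoder f: 1-d latent -> 2-d features
w, b = np.array([1.0, -1.0]), 0.0    # toy black-box classifier M

def revise_counterfactual(x, target=1.0, lam=0.05, lr=0.1, steps=500):
    z = np.zeros(1)                   # latent code s'
    for _ in range(steps):
        x_dec = (D @ z).ravel()       # f(s'): map back to feature space
        p = sigmoid(w @ x_dec + b)
        # chain rule: gradient wrt z passes through the decoder
        grad = D.T @ ((p - target) * w + 2 * lam * (x_dec - x))
        z -= lr * grad
    return (D @ z).ravel()

x = np.array([-2.0, 2.0])             # factual
x_cf = revise_counterfactual(x)       # counterfactual decoded from latent space
```

Because every candidate is decoded from the latent embedding, the search can only ever produce points the generative model considers plausible.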
Our motivation for including these different generators in our analysis is that they all offer slightly different approaches to generating meaningful counterfactuals for differentiable black-box models. We hypothesize that generating more **meaningful** counterfactuals should mitigate the endogenous dynamics illustrated in @fig-poc in @sec-intro. This intuition stems from the underlying idea that more meaningful counterfactuals are generated by the same, or at least a very similar, data generating process as the training data. All else equal, counterfactuals that fulfill this basic requirement should be less prone to trigger domain and model shifts.
## Data {#sec-empirical-data}
We have chosen to work with both synthetic and real-world datasets. Using synthetic data allows us to impose distributional properties that may affect the resulting recourse dynamics. Following @upadhyay2021towards, we generate synthetic data in $\mathbb{R}^2$ to also allow for a visual interpretation of the results. Real-world data is used in order to assess if endogenous dynamics also occur in higher-dimensional settings.
We use four synthetic binary classification datasets consisting of 1000 samples each.^[To see how the data is generated see here: [https://github.com/pat-alt/AlgorithmicRecourseDynamics.jl/blob/main/notebooks/synthetic_datasets.ipynb](https://github.com/pat-alt/AlgorithmicRecourseDynamics.jl/blob/main/notebooks/synthetic_datasets.ipynb)] The datasets are presented in @fig-synthetic-data (see also Appendix A for a formal description). Samples from the negative class are marked in blue while samples of the positive class are marked in orange.
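Purely for illustration, a dataset with these high-level properties (two labeled clusters in $\mathbb{R}^2$, 1000 samples in total) could be sampled as follows in Python; the cluster locations and scales below are made up and do not match the actual datasets produced by our Julia notebook:

```python
import numpy as np

rng = np.random.default_rng(42)
n_per_class = 500                     # 1000 samples in total

# negative class (y = 0, blue) and positive class (y = 1, orange)
X_neg = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(n_per_class, 2))
X_pos = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n_per_class, 2))

X = np.vstack([X_neg, X_pos])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
```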
{#fig-synthetic-data fig.pos="h" width="8cm" height="2cm"}
Ex-ante we expect that by construction Wachter will create a new cluster of counterfactual instances in the proximity of the initial decision boundary. Thus, the choice of the black-box model may have an impact on the paths of recourse. For generators that use latent space search (REVISE [@joshi2019towards], CLUE [@antoran2020getting]) or rely on (and have access to) probabilistic models (CLUE [@antoran2020getting], Greedy [@schut2021generating]) we expect that counterfactuals will end up in regions of the target domain that are densely populated by training samples. Of course, this expectation hinges on how effective said probabilistic models are at capturing predictive uncertainty. Finally, we expect the counterfactuals generated by DiCE to be spread more uniformly around the feature space inside the target class^[As we mentioned earlier, the diversity constraint used by DiCE is only effective when at least two counterfactuals are being generated. We have therefore decided to always generate 5 counterfactuals for each generator and randomly pick one of them.]. In summary, we expect that the endogenous shifts induced by Wachter will outsize those induced by all other generators, since Wachter is the only approach that is not concerned with generating what we have defined as meaningful counterfactuals.
### Real-world data
We use three different real-world datasets from the Finance and Economics domain, all of which are tabular and can be used for binary classification. Firstly, we use the **Give Me Some Credit** dataset, which was open-sourced on Kaggle for the task of predicting whether a borrower is likely to experience financial difficulties in the next two years [@gmsc_data]. It originally consists of 250,000 instances with 11 numerical attributes. Secondly, we use the **UCI defaultCredit** dataset [@yeh2009comparisons], a benchmark dataset that can be used to train binary classifiers to predict whether credit card clients default on their payments. In its raw form it consists of 23 explanatory variables: 4 categorical features relating to demographic attributes^[These have been omitted from the analysis. See @sec-limit-data for details.] and 19 continuous features largely relating to individuals' payment histories and the amount of credit outstanding. Both of these datasets have been used in the literature on Algorithmic Recourse before (see for example @pawelczyk2021carla, @joshi2019towards and @ustun2019actionable), presumably because they constitute real-world classification tasks involving individuals that compete for access to credit.
As a third dataset we include the **California Housing** dataset, derived from the 1990 U.S. census and sourced through scikit-learn [@pedregosa2011scikit; @pace1997sparse]. It consists of 8 continuous features that can be used to predict the median house price for California districts. The continuous outcome variable is binarized as $\tilde{y}=\mathbb{I}_{y>\text{median}(Y)}$, indicating whether the median house price of a given district is above or below the median of all districts. While we have not seen this dataset used in the previous literature on AR, others have used the Boston Housing dataset in a similar fashion (see for example @schut2021generating). We initially also conducted experiments on that dataset, but eventually discarded it since it has been found to suffer from an ethical problem [@carlisle2019racist].
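The binarization rule $\tilde{y}=\mathbb{I}_{y>\text{median}(Y)}$ amounts to a one-liner; the toy outcome vector below is illustrative Python, not the actual house-price data:

```python
import numpy as np

y = np.array([1.2, 3.4, 2.1, 5.0, 0.7, 2.9])   # toy continuous outcome
y_tilde = (y > np.median(y)).astype(int)        # 1 iff strictly above the median
```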
Since the simulations involve generating counterfactuals for a significant proportion of the entire sample of individuals, we have randomly undersampled each dataset to yield balanced subsamples consisting of 10,000 individuals each. We have also standardized all explanatory features, since our chosen classifiers are sensitive to scale.
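A sketch of this preprocessing step (balanced undersampling followed by standardization), using made-up data and Python in place of our Julia pipeline:

```python
import numpy as np

def balanced_subsample(X, y, n_total, rng):
    """Draw n_total // 2 instances per class, without replacement."""
    idx = []
    for cls in (0, 1):
        cls_idx = np.flatnonzero(y == cls)
        idx.append(rng.choice(cls_idx, size=n_total // 2, replace=False))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))                  # toy features
y = (rng.random(5000) < 0.7).astype(int)        # imbalanced toy labels

X_sub, y_sub = balanced_subsample(X, y, n_total=1000, rng=rng)
X_std = (X_sub - X_sub.mean(axis=0)) / X_sub.std(axis=0)   # standardize
```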
## Classifiers and Generative Models {#sec-empirical-classifiers}
For each dataset and generator we look at three different types of classifiers, all of them built and trained using `Flux.jl` [@innes2018fashionable]: firstly, a simple linear classifier - **Logistic Regression** - implemented as a single linear layer with sigmoid activation; secondly, a multilayer perceptron (**MLP**); and finally, a **Deep Ensemble** composed of five MLPs following @lakshminarayanan2016simple, which serves as our only probabilistic classifier. We have chosen to work with deep ensembles both for their simplicity and their effectiveness at modelling predictive uncertainty. They are also the model of choice in @schut2021generating. The actual neural network architectures are kept simple (@tbl-mlp), since we are only marginally concerned with achieving good initial classifier performance. For the real-world datasets we use mini-batch training and dropout regularization.
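A minimal sketch of how a deep ensemble produces an averaged predictive probability and an entropy-based uncertainty estimate; toy random linear members stand in for the five trained MLPs, and everything here is illustrative Python rather than our `Flux.jl` models:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
members = [rng.normal(size=2) for _ in range(5)]   # 5 toy "MLPs"

def ensemble_proba(x):
    """Ensemble prediction: average the members' predicted probabilities."""
    return np.mean([sigmoid(w @ x) for w in members])

def predictive_entropy(p):
    """Entropy of a Bernoulli prediction; maximal at p = 0.5."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

p = ensemble_proba(np.array([0.5, -0.5]))
H = predictive_entropy(p)
```

Disagreement between the members pushes the averaged probability towards 0.5 and hence the entropy up, which is exactly the signal CLUE and the Greedy generator exploit.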
The Latent Space generators rely on separate generative models. Following the authors of both REVISE and CLUE, we use Variational Autoencoders (**VAE**) for this purpose. As with the classifiers, we deliberately choose to work with fairly simple architectures (@tbl-vae). More expressive generative models generally lead to more meaningful counterfactuals produced by Latent Space generators. But in our view this dependence should simply be considered a vulnerability of counterfactual generators that rely on surrogate models to learn realistic representations of the underlying data.
All classifiers and generative models are retrained for 10 epochs in each round $t$ of the experiment. Rather than retraining models from scratch, we initialize all parameters at their previous levels ($t-1$) and backpropagate for 10 epochs using the new training data as inputs into the existing model.
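The warm-start scheme can be sketched as follows, with a plain-numpy logistic regression standing in for our `Flux.jl` models; the recourse step that would update the data between rounds is elided, and all parameter values are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, X, y, epochs=10, lr=0.5):
    """10 epochs of full-batch gradient descent on mean cross-entropy."""
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w = w - lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(2)
shift = np.where(rng.random(200) < 0.5, -1.0, 1.0)[:, None]
X = rng.normal(size=(200, 2)) + shift           # toy two-cluster data
y = (X[:, 0] + X[:, 1] > 0).astype(float)       # linearly separable labels

w = np.zeros(2)                  # round t = 0: fresh initialization
for t in range(3):               # rounds t = 1, 2, 3
    # ... here recourse would be implemented and (X, y) updated ...
    w = train(w, X, y)           # warm start: reuse previous parameters
```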
A straightforward choice simply extends the baseline approach by @wachter2017counterfactual: instead of only penalizing the distance of the individual's counterfactual to its factual, we propose to additionally penalize its distance to some sensible point in the target domain, for example the sample average $\bar{\mathbf{x}}$. For such a recourse objective, higher choices of $\lambda_2$ relative to $\lambda_1$ will lead counterfactuals to gravitate towards the specified point in the target domain. In the remainder of this paper we will therefore refer to this approach as the **Gravitational** generator when we investigate its potential usefulness for mitigating endogenous macrodynamics^[Note that despite the naming convention our goal here is not to provide yet another counterfactual generator, but merely to investigate the simplest penalty we can think of with respect to its effectiveness.].
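A sketch of the resulting objective and its gradient, again in illustrative Python with a made-up linear classifier and made-up penalty weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gravitational_cf(x, x_bar, w, b, lam1=0.1, lam2=0.5, lr=0.1, steps=1000):
    """Wachter's objective plus a second penalty pulling towards x_bar."""
    x_cf = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf + b)
        grad = ((p - 1.0) * w                  # classifier loss (target class 1)
                + 2 * lam1 * (x_cf - x)        # lambda_1: proximity to factual
                + 2 * lam2 * (x_cf - x_bar))   # lambda_2: pull towards x_bar
        x_cf -= lr * grad
    return x_cf

w, b = np.array([1.0, 0.0]), 0.0
x = np.array([-2.0, 0.0])                      # factual
x_bar = np.array([3.0, 0.0])                   # target-class sample average
x_cf = gravitational_cf(x, x_bar, w, b)
```

With $\lambda_2 > \lambda_1$ as above, the counterfactual settles well past the decision boundary, between the factual and $\bar{\mathbf{x}}$.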
#### A note on convergence
For this simple mitigation strategy underlying the Gravitational generator to work as expected, one needs to ensure that counterfactual search continues even after a predetermined threshold probability $\gamma$ has been reached. @fig-convergence illustrates this distinction: if one chooses to terminate the search once the desired threshold is reached (left panel), the gravitational pull towards $\bar{\mathbf{x}}$ never takes full effect (compare to the right panel). More generally, if convergence is defined simply in terms of flipping the predicted label with some desired degree of confidence, this corresponds to essentially ignoring any parts of the counterfactual search objective that do not involve $\ell(M(f(\mathbf{s}_k^\prime)),t)$ beyond that point. While this may be appropriate for some applications, in general it seems like an odd convention. Since we have nonetheless seen convergence specified simply in terms of reaching the threshold probability in some places^[@joshi2019towards define convergence of Algorithm 1 in this way. The implementation of @wachter2017counterfactual in CARLA is also defined in this way.], we thought it worth making this distinction explicit.
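The distinction is easy to make explicit in code: only the stopping rule differs between the two calls below. This is a toy Python sketch with a made-up linear classifier and the same kind of gravitational objective as above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def search(x, x_bar, w, gamma=0.5, stop_at_gamma=True,
           lam1=0.1, lam2=0.5, lr=0.1, steps=1000):
    x_cf = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf)
        if stop_at_gamma and p > gamma:
            break                               # terminate at threshold gamma
        grad = ((p - 1.0) * w + 2 * lam1 * (x_cf - x)
                + 2 * lam2 * (x_cf - x_bar))
        x_cf -= lr * grad
    return x_cf

w = np.array([1.0, 0.0])
x, x_bar = np.array([-2.0, 0.0]), np.array([3.0, 0.0])
early = search(x, x_bar, w, stop_at_gamma=True)    # left panel: stop at gamma
full = search(x, x_bar, w, stop_at_gamma=False)    # right panel: full objective
```

The early-stopped counterfactual halts just past the decision boundary, while the full search continues until the gravitational term is balanced and ends up much closer to $\bar{\mathbf{x}}$.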
{#fig-convergence fig.pos="h" width=45%}