The code applies StandardScaler before imputation, and it fits the scaler on the entire dataset. That leaks the global distribution (including rows that should be held out) into preprocessing, so the results look better than they actually are. We should fix the order to get realistic metrics.
I think we should use sample-level normalization instead, so each sample is processed independently.
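To make the suggestion concrete, here is a rough sketch of what I mean by sample-level normalization. It assumes "sample-level" means z-scoring each row with its own mean and standard deviation, and that missing values arrive as NaN. The helper names `normalize_sample` and `normalize_samples` are hypothetical and not from the current code.

```python
import math
from statistics import fmean, pstdev


def normalize_sample(row):
    """Standardize one sample using only its own observed values.

    Hypothetical helper: NaN entries are skipped when computing the
    statistics and are passed through unchanged for a later imputation step.
    """
    observed = [v for v in row if not math.isnan(v)]
    if not observed:
        # Nothing to compute statistics from; return the row as-is.
        return list(row)
    mean = fmean(observed)
    std = pstdev(observed)
    if std == 0.0:
        # Constant row: avoid division by zero, result is all zeros.
        std = 1.0
    return [v if math.isnan(v) else (v - mean) / std for v in row]


def normalize_samples(rows):
    """Apply normalize_sample to each row; no statistic is shared across rows."""
    return [normalize_sample(row) for row in rows]
```

Since nothing is computed across rows, there is nothing to fit and nothing that can leak between train and test. The trade-off is that this removes each sample's overall scale, which may or may not be acceptable for our features. If we would rather stay inside scikit-learn, `sklearn.preprocessing.Normalizer` also works per sample, but it rescales each row to unit norm rather than z-scoring it.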