
Data leakage for imputation task #819

@thongtruongpbc

Description


The code applies StandardScaler to the entire dataset before imputation. The scaler's mean and standard deviation are computed from the global distribution, so they can carry information from values that the imputation is later evaluated on. This is data leakage, and it makes the results look better than they actually are. We should fix the order of these steps to get realistic metrics.
I think we should use sample-level normalization instead, so that each sample is normalized independently using only its own values.
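One possible reading of "fix the order", as a minimal sketch. It assumes the evaluation hides a random subset of entries in a held-out split. The helpers `fit_scaler` and `transform`, the synthetic data, and the 80/20 split are all hypothetical, and the mean-fill "imputer" is a placeholder rather than the project's model. The key point is that the scaling statistics come only from training data that the evaluation never scores:

```python
import numpy as np

def fit_scaler(X):
    """Per-feature mean and std computed from observed (non-NaN) entries only."""
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return mean, std

def transform(X, mean, std):
    # Missing entries stay NaN and are left for the imputer.
    return (X - mean) / std

rng = np.random.default_rng(0)
X_full = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic stand-in data

# Split by sample first, then hide ~20% of test entries as imputation targets.
X_train, X_test_true = X_full[:80], X_full[80:]
eval_mask = rng.random(X_test_true.shape) < 0.2
X_test = X_test_true.copy()
X_test[eval_mask] = np.nan

# Leaky order (shown for contrast, not used below): statistics see every
# value, including the hidden targets.
leaky_mean, leaky_std = fit_scaler(X_full)

# Leak-free order: fit on training data only, then apply to both splits.
mean, std = fit_scaler(X_train)
X_train_scaled = transform(X_train, mean, std)
X_test_scaled = transform(X_test, mean, std)

# Placeholder imputer: fill with the training mean, which is 0 in scaled space.
X_test_imputed = np.where(np.isnan(X_test_scaled), 0.0, X_test_scaled)

# Score only the hidden entries, back in the original units.
X_test_restored = X_test_imputed * std + mean
mae = np.abs(X_test_restored[eval_mask] - X_test_true[eval_mask]).mean()
```

The real imputation model would replace the mean fill. The important part is that `fit_scaler` never sees `X_test_true`.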
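The sample-level alternative could look roughly like this minimal sketch. It assumes each row is one sample and that missing values are NaN. The function names are hypothetical, and filling with 0 (each row's own observed mean) stands in for the real imputer:

```python
import numpy as np

def normalize_per_sample(X):
    """Standardize each row with the mean/std of that row's observed values."""
    mean = np.nanmean(X, axis=1, keepdims=True)
    std = np.nanstd(X, axis=1, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # guard against constant rows
    return (X - mean) / std, mean, std

def denormalize_per_sample(Z, mean, std):
    """Map normalized values back to each row's original scale."""
    return Z * std + mean

# Two samples with one missing value each; no statistics are shared between rows.
X = np.array([[1.0, 2.0, np.nan, 3.0],
              [10.0, np.nan, 30.0, 20.0]])
Z, mean, std = normalize_per_sample(X)

# Placeholder imputer: fill with 0, i.e. each row's own observed mean.
Z_filled = np.nan_to_num(Z, nan=0.0)
X_imputed = denormalize_per_sample(Z_filled, mean, std)
```

Every statistic comes from a single row, so nothing from other samples (including held-out ones) enters the transform. The trade-off is that per-row estimates can be noisy when a row has few observed values. A row with no observed values gets NaN statistics (with a RuntimeWarning from `np.nanmean`) and would need separate handling.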

Metadata


Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
