This repo contains our work for the Data Challenge (https://www.kaggle.com/competitions/data-challenge-kernel-methods-2024-2025/overview) proposed as part of the MVA Kernel Methods course. The goal of the challenge is to classify DNA sequences according to whether or not they are transcription factor (TF) binding sites.
The file kernels.py contains most of the kernels we've experimented with. The file svc.py contains our KernelSVC implementation. The file start.py is a script for reproducing the results in our final submission.
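
For readers unfamiliar with kernels on sequences, here is a minimal sketch of a k-mer spectrum kernel, a common choice for DNA classification. It is an illustration only: the names kmer_counts and spectrum_kernel are hypothetical, and nothing here is claimed to match the code in kernels.py or the kernel used in the final submission.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(X, Y, k=3):
    """Return the Gram matrix between two lists of sequences.

    Entry [i][j] is the dot product of the k-mer count vectors
    of X[i] and Y[j]. (Hypothetical helper, not the repo's API.)
    """
    counts_x = [kmer_counts(s, k) for s in X]
    counts_y = [kmer_counts(s, k) for s in Y]
    gram = []
    for cx in counts_x:
        row = []
        for cy in counts_y:
            # A Counter returns 0 for k-mers it has not seen,
            # so only k-mers shared by both sequences contribute.
            row.append(sum(c * cy[kmer] for kmer, c in cx.items()))
        gram.append(row)
    return gram


if __name__ == "__main__":
    seqs = ["ACGTAC", "ACGTTT", "GGGGGG"]
    for row in spectrum_kernel(seqs, seqs, k=3):
        print(row)
```

A Gram matrix like this one is what a kernel SVM consumes in place of explicit feature vectors, which is why the kernel and the classifier can live in separate files.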
We obtained an overall accuracy of 0.70733 on the private data, ranking 10th out of 37 participants.