Computational and Machine Learning Scientist in Genomics
MRC Weatherall Institute of Molecular Medicine (WIMM)
University of Oxford
Milne Group – Epigenetics & Gene Regulation in Leukaemia
Oxford, UK | Computational & Experimental Epigenomics | Enhancer Biology in Cancer
I investigate enhancer dynamics in cancer, with a particular focus on MLL-AF4 Acute Lymphoblastic Leukaemia (ALL). My work sits at the interface of experimental epigenomics and computational biology, combining multi-modal high-throughput datasets generated from patient-derived samples and disease models to understand chromatin-driven transcriptional dysregulation in leukaemia.
To address these questions, I generate and integrate diverse high-throughput datasets using:
- Epigenomic profiling: ChIP-seq, low-input ChIPmentation/TOPmentation, ATAC-seq, CUT&Tag, CUT&RUN
- Transcriptomics: Bulk RNA-seq, TT-seq, POINT-seq
- Chromatin conformation: NG Capture-C, Tiled-C, Micro-Capture-C
- Epigenetic modifications: DNA methylation profiling (TAPS and bisulphite-based methods)
- Single-cell multiomics: 10x scMultiome, scCUT&Tag
These data are analysed using custom workflows, statistical modelling, and machine learning. Examples include a CatBoost classifier with SHAP feature importance to identify features of patient-specific enhancers using ChIP-seq datasets, and a custom PyTorch autoencoder trained on patient ALL blast H3K27ac profiles to detect sample-specific regulatory elements at scale.
I build open-source tools designed for performance, reproducibility, and ease of use in large-scale genomics. All projects ship with Docker and Apptainer containers; publication-ready tools include full CI/CD workflows (automated testing, build, and release).
- SeqNado (docs): Modular Snakemake pipelines for processing ChIP-seq, ATAC-seq, RNA-seq, CUT&Tag, WGS, CRISPR screens, and Micro-Capture-C (MCC) data (PyPI, Conda)
- CapCruncher (docs): Toolkit for analysing NG Capture-C, Tri-C, and Tiled-C chromatin interaction data (PyPI, Conda)
- QuantNado: High-performance genomic signal quantification platform. Per-bp coverage stored in Zarr v3 + Dask for large-scale multi-omics access (BAM, methylation, variants). Peak calling via five methods, including a custom 1D ResNet and U-Net (PyTorch, semi-supervised learning, Optuna hyperparameter optimisation). (PyPI)
- ScNado: Rust + Python + Snakemake pipeline for single-cell CUT&Tag and RNA-seq, integrating Scanpy, SnapATAC2, and Muon for multimodal single-cell analysis
- Greyhound: Fine-tuning framework for genomics foundation models (Borzoi, Enformer). LoRA/LoCon parameter-efficient fine-tuning, custom chromatin prediction heads, HuggingFace Transformers-compatible. PyTorch Lightning + W&B + Hydra.
- BamNado: A fast, Rust-based utility for efficient BAM file manipulation (crates.io, PyPI)
- PlotNado: Command-line genomic visualisation in a genome browser-like style (PyPI)
- TrackNado: A Python package for efficient generation of UCSC Genome Browser hubs for rapid distribution and visualisation of genomic datasets (PyPI, Conda)
- Smith AL et al. (2025) – Enhancer heterogeneity in acute lymphoblastic leukemia drives differential gene expression in patients – Blood
- Lau I-J et al. (2026) – MYB activity drives emergent enhancer activation and enhancer-promoter interactions in acute lymphoblastic leukemia – Blood
- Crump NT†, Smith AL† et al. (2023) – MLL-AF4 cooperates with PAF1 and FACT to drive high-density enhancer interactions in leukemia – Nature Communications
- Downes DJ†, Smith AL† et al. (2022) – Capture-C: a modular and flexible approach for high-resolution chromosome conformation capture – Nature Protocols
- Godfrey L et al. (2019) – DOT1L inhibition reveals a distinct subset of enhancers dependent on H3K79 methylation – Nature Communications
- Gao Z†, Smith AL† et al. (2023) – Temporal analyses reveal a pivotal role for sense and antisense enhancer RNAs in coordinate immunoglobulin lambda locus activation – Nucleic Acids Research
- Languages: Python, Rust, R, Bash, Snakemake
- ML Frameworks: PyTorch, PyTorch Lightning, TensorFlow/Keras, HuggingFace Transformers, CatBoost, SHAP, Optuna
- ML Practices: LoRA/LoCon fine-tuning, semi-supervised learning, experiment tracking (W&B), config management (Hydra)
- Single-cell: Scanpy, ArchR, SnapATAC2, Muon, Signac/Seurat
- Bioinformatics: DESeq2/pyDESeq2, MACS2/3, SEACR, deepTools, featureCounts, MultiQC, Ensembl VEP, GATK, bcftools
- Data Infrastructure: Dask, Zarr, Polars, pyranges
- Workflow & Infra: Snakemake, Conda, Docker, Apptainer, Git, GitHub Actions (CI/CD), HPC (SLURM), AWS
- Domains: Epigenomics, transcriptomics, chromatin architecture, multi-omics integration, single-cell epigenomics, foundation model fine-tuning, variant effect prediction
- Milne Group @ Oxford
- ORCID
- Email
- LinkedIn




