
🧬 Alastair Smith

Computational and Machine Learning Scientist in Genomics
MRC Weatherall Institute of Molecular Medicine (WIMM)
University of Oxford

🔬 Milne Group – Epigenetics & Gene Regulation in Leukaemia

📍 Oxford, UK | 🧪 Computational & Experimental Epigenomics | 💡 Enhancer Biology in Cancer


🔬 Research Focus

I investigate enhancer dynamics in cancer, with a particular focus on MLL-AF4 Acute Lymphoblastic Leukaemia (ALL). My work sits at the interface of experimental epigenomics and computational biology, combining multi-modal high-throughput datasets generated from patient-derived samples and disease models to understand chromatin-driven transcriptional dysregulation in leukaemia.

To study this, I generate and integrate diverse high-throughput datasets using:

  • Epigenomic profiling: ChIP-seq, low-input ChIPmentation/TOPmentation, ATAC-seq, CUT&Tag, CUT&RUN
  • Transcriptomics: Bulk RNA-seq, TT-seq, POINT-seq
  • Chromatin conformation: NG Capture-C, Tiled-C, Micro-Capture-C
  • Epigenetic modifications: DNA methylation profiling (TAPS and bisulphite-based methods)
  • Single-cell multiomics: 10x scMultiome, scCUT&Tag

These data are analysed using custom workflows, statistical modelling, and machine learning. Examples include a CatBoost classifier with SHAP feature importance to identify features of patient-specific enhancers using ChIP-seq datasets, and a custom PyTorch autoencoder trained on patient ALL blast H3K27ac profiles to detect sample-specific regulatory elements at scale.


🛠️ Software & Tools

I build open-source tools designed for performance, reproducibility, and ease of use in large-scale genomics. All projects ship with Docker and Apptainer containers; publication-ready tools include full CI/CD workflows (automated testing, build, and release).

  • 🔄 SeqNado (docs): Modular Snakemake pipelines for processing ChIP-seq, ATAC-seq, RNA-seq, CUT&Tag, WGS, CRISPR screens, and Micro-Capture-C (MCC) data (PyPI, Conda)
  • 🧬 CapCruncher (docs): Toolkit for analysing NG Capture-C, Tri-C, and Tiled-C chromatin interaction data (PyPI, Conda)
  • 📦 QuantNado: High-performance genomic signal quantification platform. Per-bp coverage stored in Zarr v3 + Dask for large-scale multi-omics access (BAM, methylation, variants). Peak calling via 5 methods including a custom 1D ResNet and U-Net (PyTorch, semi-supervised learning, Optuna hyperparameter optimisation). (PyPI)
  • 🔬 ScNado: Rust + Python + Snakemake pipeline for single-cell CUT&Tag and RNA-seq, integrating Scanpy, SnapATAC2, and Muon for multimodal single-cell analysis
  • 🤖 Greyhound: Fine-tuning framework for genomics foundation models (Borzoi, Enformer). LoRA/LoCon parameter-efficient fine-tuning, custom chromatin prediction heads, HuggingFace transformers-compatible. PyTorch Lightning + W&B + Hydra.
  • ⚡ BamNado: A fast, Rust-based utility for efficient BAM file manipulation (crates.io, PyPI)
  • 📊 PlotNado: Command-line genomic visualisation in a genome browser-like style (PyPI)
  • 🌐 TrackNado: A Python package for efficient generation of UCSC Genome Browser hubs for rapid distribution and visualisation of genomic datasets (PyPI, Conda)

📄 Selected Publications

  1. Smith AL et al. (2025). Enhancer heterogeneity in acute lymphoblastic leukemia drives differential gene expression in patients. Blood.
  2. Lau I-J et al. (2026). MYB activity drives emergent enhancer activation and enhancer-promoter interactions in acute lymphoblastic leukemia. Blood.
  3. Crump NT†, Smith AL† et al. (2023). MLL-AF4 cooperates with PAF1 and FACT to drive high-density enhancer interactions in leukemia. Nature Communications.
  4. Downes DJ†, Smith AL† et al. (2022). Capture-C: a modular and flexible approach for high-resolution chromosome conformation capture. Nature Protocols.
  5. Godfrey L et al. (2019). DOT1L inhibition reveals a distinct subset of enhancers dependent on H3K79 methylation. Nature Communications.
  6. Gao Z†, Smith AL† et al. (2023). Temporal analyses reveal a pivotal role for sense and antisense enhancer RNAs in coordinate immunoglobulin lambda locus activation. Nucleic Acids Research.

🧠 Technical Expertise

  • Languages: Python, Rust, R, Bash, Snakemake
  • ML Frameworks: PyTorch, PyTorch Lightning, TensorFlow/Keras, HuggingFace Transformers, CatBoost, SHAP, Optuna
  • ML Practices: LoRA/LoCon fine-tuning, semi-supervised learning, experiment tracking (W&B), config management (Hydra)
  • Single-cell: Scanpy, ArchR, SnapATAC2, Muon, Signac/Seurat
  • Bioinformatics: DESeq2/pyDESeq2, MACS2/3, SEACR, deepTools, featureCounts, MultiQC, Ensembl VEP, GATK, bcftools
  • Data Infrastructure: Dask, Zarr, Polars, pyranges
  • Workflow & Infra: Snakemake, Conda, Docker, Apptainer, Git, GitHub Actions (CI/CD), HPC (SLURM), AWS
  • Domains: Epigenomics, transcriptomics, chromatin architecture, multi-omics integration, single-cell epigenomics, foundation model fine-tuning, variant effect prediction

🔗 Contact & Profiles

Pinned

  1. Milne-Group/SeqNado (Public)

    A unified and user-friendly collection of pipelines for: ATAC-seq, ChIP-seq, CUT&RUN/TAG, RNA-seq, WGS, Methylation (Bisulphite/TAPS), CRISPR screens and Micro-Capture-C.

    Python 8 3

  2. sims-lab/CapCruncher (Public)

    Analysis tool for NG-Capture-C, Tri-C and Tiled-C data

    Python 10 4

  3. BamNado (Public)

    High-performance BAM file processing for genomics: Rust core with Python bindings via PyO3. Parallel coverage/pileup, flexible read filtering (strand, MAPQ, fragment length, barcodes, tags), and s…

    Rust 1 1

  4. plotnado (Public)

    PlotNado is a lightweight Python package for creating beautiful, publication-ready genome browser-style plots from genomic data files or YAML templates. It offers a simple API, fast rendering, and …

    Python 1 2

  5. TrackNado (Public)

    Command line utility to generate UCSC hubs from a set of files (e.g. bigWig, bigBed etc)

    Python 1 1

  6. Milne-Group/QuantNado (Public)

    QuantNado provides efficient Zarr-backed storage and analysis of genomic signal from BAM and bigWig files, with support for signal reduction, feature counting, dimensionality reduction, and quantil…

    Python 2 1