Add proteinbox workflow #15
Open
MSiggel wants to merge 14 commits into
…Config

New models for the proteinbox simulation type:
- ProteinSpecies: PDB-path-based species with disulfide/protonation annotations
- ProteinBoxComposition: protein + box_padding + ionization config
- Pdb2gmxConfig: force field and water model config for pdb2gmx
- GromacsProteinParameterSet: output paths from pdb2gmx
- BuildInput extended with proteinbox type and pdb2gmx parametrization

Closes: relates to #13
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
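A minimal sketch of how two of the new models might fit together, using plain dataclasses rather than whatever model base class the project actually uses. The keys `protonation_states`, `forcefield`, `water_model` and `ignh` appear in the example YAML; `pdb_path`, `disulfide_bonds` and the `to_args` helper are hypothetical. `to_args` only illustrates how the config could map onto real `gmx pdb2gmx` flags (`-f`, `-o`, `-p`, `-ff`, `-water`, `-ignh`); in the PR the friendly force field name is first translated by `resolve_forcefield()`.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ProteinSpecies:
    """Hypothetical sketch; only protonation_states is a key confirmed by the example YAML."""
    pdb_path: Path
    disulfide_bonds: list[tuple[int, int]] = field(default_factory=list)
    protonation_states: dict[str, str] = field(default_factory=dict)
    # Defaults set at field level, as described in the unit-test commit.
    count: int = 1
    fraction: float = 1.0


@dataclass
class Pdb2gmxConfig:
    forcefield: str = "charmm36m"
    water_model: str = "tip3p"
    ignh: bool = True  # maps to pdb2gmx -ignh: ignore hydrogens present in the input PDB

    def to_args(self, pdb: Path, gro: Path, top: Path) -> list[str]:
        """Hypothetical helper: build a gmx pdb2gmx argument vector from this config."""
        args = [
            "gmx", "pdb2gmx",
            "-f", str(pdb), "-o", str(gro), "-p", str(top),
            "-ff", self.forcefield, "-water", self.water_model,
        ]
        if self.ignh:
            args.append("-ignh")
        return args
```

Keeping the flag mapping on the config object means the subprocess wrapper only has to resolve paths and run the command.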
Functions for the proteinbox build pipeline:
- check_gmx_available: verify gmx binary is on PATH
- clean_pdb: PDBFixer-based PDB standardization
- run_pdb2gmx: subprocess wrapper for gmx pdb2gmx
- extract_charge_from_topology: parse net charge from .top/.itp
- update_topology_molecules: append water/ion entries to [ molecules ]
- validate_with_grompp: dry-run topology validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
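The `update_topology_molecules` step can be sketched as plain text manipulation. This is a minimal version, assuming `[ molecules ]` is the last section of the .top file (the usual layout of a pdb2gmx topology); the signature and the skipping of zero counts are assumptions, not the PR's implementation.

```python
def update_topology_molecules(top_text: str, entries: list[tuple[str, int]]) -> str:
    """Append (molecule name, count) rows to [ molecules ], assumed to be the final section."""
    lines = top_text.rstrip("\n").split("\n")
    has_section = any(line.replace(" ", "").lower() == "[molecules]" for line in lines)
    if not has_section:
        raise ValueError("topology has no [ molecules ] section")
    for name, count in entries:
        if count > 0:  # skip ion species that were not added
            lines.append(f"{name:<18}{count:>8}")
    return "\n".join(lines) + "\n"
```

A call such as `update_topology_molecules(top, [("SOL", n_water), ("NA", n_na), ("CL", n_cl)])` would follow solvation and ionization; the molecule names must match the `[ moleculetype ]` names in the force field's water and ion .itp files.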
Brief NPT equilibration with protein heavy atoms position-restrained via CustomExternalForce. Allows water/ions to relax around the fixed protein structure before GROMACS production runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
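For reference, a common harmonic form for such a restraint is shown below. This is an assumption about the functional form; the commit does not state the expression or the force constant $k$. In OpenMM it is typically written as a `CustomExternalForce` expression like `0.5*k*periodicdistance(x, y, z, x0, y0, z0)^2`, with per-particle parameters `x0, y0, z0` holding each heavy atom's reference position and `k` as a global parameter.

```latex
E_{\mathrm{restraint}} = \sum_{i \in \mathrm{heavy}} \frac{k}{2}\,
  \bigl\lVert \mathbf{r}_i - \mathbf{r}_i^{0} \bigr\rVert^{2}
```

Using `periodicdistance` rather than a plain coordinate difference keeps the restraint correct if an atom is wrapped across the periodic box during the NPT run.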
Build pipeline: clean PDB -> pdb2gmx -> center in cubic box -> solvate (reuses existing solvate()) -> ionize (reuses ionize_solvated_system()) -> update topology -> OpenMM relax with protein restraints -> validate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
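The "center in cubic box" step can be sketched from the protein's bounding box and `box_padding`. A minimal version, assuming the padding is the minimum gap between the protein extent and each box face, in whatever length unit the coordinates use (the example YAML's `box_padding: 12.0` does not state a unit). Both function names are hypothetical.

```python
def cubic_box_edge(coords: list[tuple[float, float, float]], padding: float) -> float:
    """Edge of a cubic box leaving `padding` between the largest axis-aligned extent and each face."""
    if not coords:
        raise ValueError("no coordinates")
    spans = [max(c[a] for c in coords) - min(c[a] for c in coords) for a in range(3)]
    return max(spans) + 2.0 * padding


def center_in_box(coords: list[tuple[float, float, float]], edge: float) -> list[tuple[float, ...]]:
    """Translate coordinates so the bounding-box center sits at the box center."""
    lo = [min(c[a] for c in coords) for a in range(3)]
    hi = [max(c[a] for c in coords) for a in range(3)]
    shift = [edge / 2.0 - (lo[a] + hi[a]) / 2.0 for a in range(3)]
    return [tuple(c[a] + shift[a] for a in range(3)) for c in coords]
```

An axis-aligned extent depends on the protein's orientation; using the maximum interatomic distance instead would give a more conservative box that still keeps the padding if the protein rotates during the run.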
MDP files for protein-in-waterbox equilibration and production:
- em.mdp: steepest descent minimization
- nvt.mdp: NVT with -DPOSRES and Protein/Non-Protein tc-grps
- npt.mdp: NPT with -DPOSRES and Berendsen barostat
- md.mdp: production with Parrinello-Rahman, no position restraints

All use CHARMM36m-specific nonbonded settings (1.2 nm cutoffs, Force-switch VdW modifier, no dispersion correction).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
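Because all four MDP files share the same CHARMM36m nonbonded block, one option is to keep that block in one place and merge stage-specific options on top. A minimal sketch: the three settings named in the commit (1.2 nm cutoffs, `Force-switch`, no `DispCorr`) are from the text, while `coulombtype = PME`, `cutoff-scheme = Verlet` and `rvdw-switch = 1.0` are assumptions, as are the names `CHARMM36M_NONBONDED` and `render_mdp`.

```python
from typing import Mapping, Optional

# Shared nonbonded block. From the commit: 1.2 nm cutoffs, force-switch VdW
# modifier, no dispersion correction. PME, Verlet and rvdw-switch are assumptions.
CHARMM36M_NONBONDED = {
    "cutoff-scheme": "Verlet",
    "coulombtype": "PME",
    "rcoulomb": 1.2,
    "vdwtype": "Cut-off",
    "vdw-modifier": "Force-switch",
    "rvdw-switch": 1.0,
    "rvdw": 1.2,
    "DispCorr": "no",
}


def render_mdp(base: Mapping[str, object], overrides: Optional[Mapping[str, object]] = None) -> str:
    """Render 'key = value' lines; keys in overrides replace those in base."""
    merged = {**base, **(overrides or {})}
    return "".join(f"{key} = {value}\n" for key, value in merged.items())
```

An NVT file could then be rendered with `render_mdp(CHARMM36M_NONBONDED, {"integrator": "md", "define": "-DPOSRES", "tc-grps": "Protein Non-Protein"})`, so a change to the nonbonded settings cannot drift between stages.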
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- 13 unit tests covering ProteinSpecies, ProteinBoxComposition, Pdb2gmxConfig, BuildInput integration, and topology parsing
- Fix ProteinSpecies to default count=1/fraction=1.0 at field level (avoids parent Species validator ordering issue)
- Example YAML for lysozyme with CHARMM36m force field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses 4 P2 review comments:
- Add missing `from pathlib import Path` to build.py module scope
- Resolve all paths to absolute in run_pdb2gmx before the subprocess call (avoids cwd confusion with output_dir)
- Add species/total_count/charge properties to ProteinBoxComposition and a proteinbox case to BuildInput.metadata for analysis compatibility
- Implement _apply_protonation_states: renames residues in the PDB before pdb2gmx so protonation overrides (HIS->HIE, GLU->GLH) are applied

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
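Renaming residues before pdb2gmx comes down to rewriting fixed PDB columns. A minimal sketch, assuming standard ATOM/HETATM records (resName in columns 18-20, resSeq in columns 23-26) and a single chain, so a residue number alone identifies the residue; insertion codes are ignored. The function name and the int-keyed mapping are hypothetical, not the PR's `_apply_protonation_states` signature.

```python
def rename_residues(pdb_lines: list[str], new_names: dict[int, str]) -> list[str]:
    """Rewrite resName (cols 18-20) for residues whose resSeq (cols 23-26) is in new_names."""
    renamed = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 26:
            new_name = new_names.get(int(line[22:26]))
            if new_name is not None:
                if len(new_name) != 3:
                    # A longer name would shift every later column and corrupt the record.
                    raise ValueError(f"residue name {new_name!r} must be 3 characters")
                line = line[:17] + new_name + line[20:]
        renamed.append(line)
    return renamed
```

Slicing by column rather than splitting on whitespace matters here: PDB fields can run together (for example large coordinates), so whitespace splitting is not reliable.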
- Add check_forcefield_available() that verifies the force field directory exists in GROMACS share, GMXLIB, or cwd before invoking pdb2gmx. Lists available force fields on failure.
- Clean up mdout.mdp artifact in validate_with_grompp alongside check.tpr.
- Fix bare open() file handles in relax_with_protein_restraints (use with-blocks).
- Add CHARMM HIS alias translation (HIE→HSE, HID→HSD, HIP→HSP for charmm FFs).
- Use re.fullmatch for protonation state key parsing (reject trailing chars).
- Validate 3-character residue names for PDB column safety.
- Add disulfide bond prompt generation (_build_disulfide_prompt_input) for deterministic pdb2gmx -ss interaction.
- Resolve paths before working_directory context switch to avoid cwd breakage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
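The key parsing and CHARMM alias translation can be sketched together. A minimal version, assuming keys of the form three uppercase letters plus a residue number (as in `HIS15` from the example YAML) and assuming a force field counts as CHARMM when its name starts with `charmm`; both helper names are hypothetical.

```python
import re

# fullmatch rejects trailing characters such as "HIS15x".
_KEY_RE = re.compile(r"([A-Z]{3})(\d+)")
CHARMM_HIS_ALIASES = {"HIE": "HSE", "HID": "HSD", "HIP": "HSP"}


def parse_protonation_key(key: str) -> tuple[str, int]:
    """Split a key like 'HIS15' into ('HIS', 15)."""
    match = _KEY_RE.fullmatch(key)
    if match is None:
        raise ValueError(f"invalid protonation key {key!r}; expected e.g. 'HIS15'")
    return match.group(1), int(match.group(2))


def translate_state(state: str, forcefield: str) -> str:
    """Map AMBER-style histidine names to CHARMM names when a CHARMM force field is used."""
    if len(state) != 3:
        raise ValueError(f"residue name {state!r} must be 3 characters for PDB columns")
    if forcefield.lower().startswith("charmm"):  # assumption: detect CHARMM by name prefix
        return CHARMM_HIS_ALIASES.get(state, state)
    return state
```

With `re.match` instead of `re.fullmatch`, a key like `HIS15x` would silently parse as residue 15, which is the failure mode the commit guards against.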
- Add FORCEFIELD_REGISTRY mapping friendly names (charmm36m, charmm36m-ljpme) to MacKerell lab download URLs and extracted directory names.
- Auto-download missing force fields on first use from the registry. Stored in platformdirs user_data_dir (on macOS: ~/Library/Application Support/mdfactory/forcefields/).
- resolve_forcefield() translates friendly names to actual directory stems so users write "charmm36m" in YAML and pdb2gmx receives the correct directory name.
- Inject GMXLIB in subprocess env so gmx finds downloaded force fields.
- Fix extract_charge_from_topology: skip .ff/ library includes (ions.itp, tip3p.itp) that contain atom type templates, not system charges. This was causing a +159 charge for lysozyme with charmm36m instead of +8.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
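The charge fix can be illustrated by splitting the work into two steps: choose which `#include` files to read, then sum the charge column of their `[ atoms ]` sections. A minimal sketch, assuming the standard GROMACS `[ atoms ]` layout (charge is the 7th field) and that each protein chain .itp is used once in `[ molecules ]`; both helper names are hypothetical, not the PR's code.

```python
def includes_to_follow(top_text: str) -> list[str]:
    """Return #include targets, skipping force-field library files under a *.ff/ directory."""
    targets = []
    for raw in top_text.splitlines():
        parts = raw.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == "#include":
            target = parts[1].strip().strip('"')
            if ".ff/" not in target:  # ions.itp, tip3p.itp, etc. are templates, not system atoms
                targets.append(target)
    return targets


def atoms_section_charge(itp_text: str) -> float:
    """Sum the charge column (7th field) of every row in [ atoms ] sections."""
    total = 0.0
    in_atoms = False
    for raw in itp_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments such as '; qtot 1'
        if not line:
            continue
        if line.startswith("["):
            in_atoms = line.strip("[] ").lower() == "atoms"
            continue
        if in_atoms and not line.startswith("#"):
            fields = line.split()
            if len(fields) >= 7:
                total += float(fields[6])
    return total
```

Partial charges are floats, so the summed value should be rounded (for example `round(total)`) before it is used as the integer net charge for ionization.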
- Add [gromacs] section to settings.py with GMX_PATH and FORCEFIELD_DIR, following the same pattern as [cgenff] SILCSBIODIR.
- Settings.__init__ auto-prepends FORCEFIELD_DIR to GMXLIB env var on startup, matching how SILCSBIODIR is auto-set for CGenFF.
- check_gmx_available() checks configured GMX_PATH first, falls back to PATH lookup.
- All gmx subprocess calls (pdb2gmx, grompp) use the configured binary and inject GMXLIB for force field resolution.
- Add GROMACS setup to config wizard (sync_config.py): prompts for gmx path, forcefield dir, and offers to download CHARMM36m on setup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
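The GMXLIB prepend and the binary lookup order can be sketched as two small pure functions. A minimal version under stated assumptions: GMXLIB is treated as an `os.pathsep`-separated search path, an existing entry is moved to the front rather than duplicated, and a configured GMX_PATH is used only if it is an executable file. The function names are hypothetical; the PR's `Settings` internals are not shown.

```python
import os
import shutil
from typing import Mapping, Optional


def prepend_gmxlib(env: Mapping[str, str], forcefield_dir: str) -> dict[str, str]:
    """Return a copy of env with forcefield_dir first on GMXLIB, without duplicating it."""
    new_env = dict(env)
    existing = [p for p in new_env.get("GMXLIB", "").split(os.pathsep) if p]
    if forcefield_dir in existing:
        existing.remove(forcefield_dir)
    new_env["GMXLIB"] = os.pathsep.join([forcefield_dir, *existing])
    return new_env


def find_gmx(configured: Optional[str]) -> Optional[str]:
    """Prefer the configured GMX_PATH if it is an executable file, else fall back to PATH."""
    if configured and os.path.isfile(configured) and os.access(configured, os.X_OK):
        return configured
    return shutil.which("gmx")
```

Passing `env=prepend_gmxlib(os.environ, forcefield_dir)` to `subprocess.run` scopes the change to the gmx child process; mutating `os.environ` at startup, as the commit describes for `Settings.__init__`, affects the whole Python process instead.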
maxscheurer requested changes Jun 1, 2026
- [76, 94]
protonation_states:
HIS15: HIE
box_padding: 12.0
Collaborator
Consistent naming, as in other system types?
- [64, 80]
- [76, 94]
protonation_states:
HIS15: HIE
Collaborator
Should be a list of key-value pairs, right?
type: pdb2gmx
forcefield: charmm36m
water_model: tip3p
ignh: true
Collaborator
Bad key name, have no idea what that means.
Summary
Tests
Closes #13