
Fix backbone dihedral feature calculation #11

Open
Zhangke123jimu wants to merge 1 commit into peizhenbai:main from Zhangke123jimu:fix/backbone-dihedral-features

@Zhangke123jimu

Summary

Thank you for releasing the MapDiff implementation. While studying and reproducing the model, I noticed several issues in the backbone dihedral feature calculation in get_node_features() in dataloader/cath_dataset.py.

This PR proposes a small geometry-correctness fix for the node dihedral features. The main changes are:

1. Convert dihedral angles from degrees to radians before applying np.sin and np.cos.
2. Correct the definition of the phi angle for residue i to $C_{i-1}-N_i-CA_i-C_i$.
3. Minor: avoid computing peptide-bond-related dihedral features across likely chain breaks or non-connected adjacent residues by checking the $C_i-N_{i+1}$ distance.

Motivation

In the original implementation, the dihedral angles returned by dihedral() appear to be measured in degrees, but they are directly passed to np.sin and np.cos. Since these functions expect radians, this may result in a distorted geometric encoding.
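The unit mismatch can be illustrated with a minimal sketch. The helper name `encode_angle_deg` is hypothetical and not part of the repository; it assumes the input angle is in degrees, as the output of dihedral() appears to be.

```python
import numpy as np

def encode_angle_deg(angle_deg):
    """Return (sin, cos) of an angle given in degrees (hypothetical helper)."""
    angle_rad = np.deg2rad(angle_deg)  # np.sin / np.cos expect radians
    return np.sin(angle_rad), np.cos(angle_rad)

angle_deg = 90.0

# Passing degrees directly: 90 is interpreted as 90 radians.
uncorrected = (np.sin(angle_deg), np.cos(angle_deg))

# Converting first: a right angle maps to (sin, cos) close to (1, 0).
corrected = encode_angle_deg(angle_deg)
```

Without the conversion, the encoding is still bounded in [-1, 1], so the error does not raise an exception or produce obviously invalid values, which may be why it is easy to overlook.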

In addition, the original phi-angle calculation uses:

dihedral(c_coords[i], n_coords[i], c_alpha_coords[i], n_coords[i + 1])

whereas the standard backbone phi angle for residue i is defined by: $C_{i-1}-N_i-CA_i-C_i$.
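As a sketch of this definition, the snippet below computes phi per residue from N, CA, and C coordinate arrays. The functions `dihedral_rad` and `phi_angles` are hypothetical names written for illustration and are not the repository's dihedral(); the sketch returns radians and marks the undefined phi of the first residue as NaN.

```python
import numpy as np

def dihedral_rad(p0, p1, p2, p3):
    """Signed dihedral angle in radians for four points (hypothetical helper)."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1          # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)

def phi_angles(n_coords, ca_coords, c_coords):
    """phi_i = dihedral(C_{i-1}, N_i, CA_i, C_i); NaN where undefined."""
    num_res = len(n_coords)
    phi = np.full(num_res, np.nan)        # residue 0 has no preceding C atom
    for i in range(1, num_res):
        phi[i] = dihedral_rad(c_coords[i - 1], n_coords[i],
                              ca_coords[i], c_coords[i])
    return phi

# Toy coordinates chosen so that phi of residue 1 is a right angle.
n_coords = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
ca_coords = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
c_coords = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
phi = phi_angles(n_coords, ca_coords, c_coords)
```

Note that the first atom comes from residue i-1, so the loop starts at index 1 and the N-terminal residue keeps an undefined phi.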

Finally, dihedral features involving adjacent residues are meaningful only when the corresponding residues are connected by a peptide bond. To make the feature calculation more robust to chain breaks or missing residues, this PR checks peptide-bond connectivity using the $C_i-N_{i+1}$ distance. Undefined torsions at termini or chain breaks are encoded as (sin, cos) = (0, 0).
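The connectivity check and the (0, 0) encoding can be sketched as follows. The function names and the 1.8 Å cutoff are assumptions made for illustration (a typical peptide C-N bond is about 1.33 Å); the threshold actually used in this PR is not restated here.

```python
import numpy as np

MAX_PEPTIDE_BOND = 1.8  # Angstrom; hypothetical cutoff, not taken from the PR

def peptide_connected(c_coords, n_coords, cutoff=MAX_PEPTIDE_BOND):
    """connected[i] is True when C_i and N_{i+1} are within the cutoff."""
    dist = np.linalg.norm(n_coords[1:] - c_coords[:-1], axis=1)
    return dist <= cutoff

def encode_torsions(angles_rad, defined):
    """(sin, cos) per residue; undefined torsions are encoded as (0, 0)."""
    feats = np.zeros((len(angles_rad), 2))
    feats[defined, 0] = np.sin(angles_rad[defined])
    feats[defined, 1] = np.cos(angles_rad[defined])
    return feats

# Toy example: residues 0 and 1 are bonded, residues 1 and 2 are not.
c_coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
n_coords = np.array([[-2.0, 0.0, 0.0], [1.33, 0.0, 0.0], [13.6, 0.0, 0.0]])
connected = peptide_connected(c_coords, n_coords)

# A torsion is kept only where the relevant neighbor is connected.
angles = np.array([np.nan, np.pi / 2, np.nan])
defined = np.array([False, True, False])
features = encode_torsions(angles, defined)
```

Because sin² + cos² = 1 for every real angle, (0, 0) never coincides with a valid encoding, so the model can in principle distinguish undefined torsions from defined ones.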

Validation

I compared the original implementation and the geometry-fixed version under the same setting:

- Dataset: CATH 4.2
- Prior: marginal prior
- Hardware: 2 × A100 GPUs
- Per-GPU batch size: 4
- Other settings: unchanged from the original configuration

Results:

| Version | Recovery | Perplexity |
| --- | --- | --- |
| Original reproduction | 60.97% | 3.54 |
| Geometry-fixed version | 61.23% | 3.54 |
| Reported MapDiff result | 60.93% | 3.43 |

The improvement is small (+0.26 percentage points in recovery over my reproduction, with no change in perplexity), so I would not interpret this as a performance-oriented modification. The main motivation of this PR is to make the geometric node features consistent with standard backbone geometry and to avoid potentially misleading dihedral encodings.
