Fix HPO parsing for OBO 1.4 - round 2 #289

Merged
EddieLF merged 1 commit into staging from fix_hpo_parsing2
Jun 15, 2026


EddieLF commented Jun 15, 2026


Follow up to #288

The problem

The regex assumed that the `{modifier}` comes after the `! comment`, but the OBO 1.4 spec puts the modifier before the comment, and the real HPO file uses the standard ordering. With a line like `is_a: HP:0032162 {xref="PMID:31677808"} ! Phenotypic abnormality`, the regex (anchored to the end of the line) didn't match, and `split(' ! ')[0]` returned `HP:0032162 {xref="..."}`. That whole string was stored as the `parent_id`, leading to errors.
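The second half of that failure is easy to reproduce in isolation. The snippet below is a minimal sketch that only exercises the `split(' ! ')[0]` step on the example line; the original anchored regex is not reproduced here, and the variable names are illustrative rather than taken from the codebase.

```python
# Value portion of an is_a line in OBO 1.4 ordering: id, {modifier}, then ! comment.
value = 'HP:0032162 {xref="PMID:31677808"} ! Phenotypic abnormality'

# Stripping only the comment leaves the trailing modifier attached to the id.
parent_id = value.split(' ! ')[0]
print(parent_id)  # HP:0032162 {xref="PMID:31677808"}
```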

The fix

  • `re.search(r'HP:\d{7}', value)` extracts the parent id directly, regardless of where (or whether) modifiers or comments appear around it.
  • Raises an error that includes the offending line if no HPO id is found, so a malformed `is_a:` line surfaces a clear error instead of silently storing garbage as `parent_id` (which is what bit us in the last PR).
  • `update_hpo_tests.py`: added a new fixture term `HP:9999003` using the OBO 1.4 standard ordering (`{modifier} ! comment`) so this case is covered going forward; bumped record counts to 8.
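For reference, the approach in the first two bullets can be sketched roughly as follows. This is a standalone illustration, not the merged code: the function name `extract_parent_id`, the `ValueError` exception type, and the error message wording are assumptions.

```python
import re

# Same pattern as in the PR description: "HP:" followed by exactly seven digits.
HPO_ID = re.compile(r'HP:\d{7}')


def extract_parent_id(value: str) -> str:
    """Return the first HPO id in an is_a value, ignoring modifiers and comments.

    Hypothetical helper; raises ValueError (an assumed exception type) with the
    offending text when no HPO id is present.
    """
    match = HPO_ID.search(value)
    if match is None:
        raise ValueError(f'No HPO id found in is_a line: {value!r}')
    return match.group(0)


# Either ordering of modifier and comment, or a bare id, yields the same parent id.
examples = [
    'HP:0032162 {xref="PMID:31677808"} ! Phenotypic abnormality',
    'HP:0032162 ! Phenotypic abnormality {xref="PMID:31677808"}',
    'HP:0032162',
]
parent_ids = [extract_parent_id(v) for v in examples]
```

Searching for the id itself, rather than trying to describe everything that may follow it, keeps the parser independent of modifier and comment ordering.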

MattWellie left a comment


Oof, so it was just bad luck that the original failure we spotted was ordered a certain way?

EddieLF merged commit 3f68e73 into staging Jun 15, 2026
6 checks passed
EddieLF deleted the fix_hpo_parsing2 branch June 15, 2026 08:46