Discrepancy at position_in_codon within upstream/downstream regions #3

@XLIU-hub

How to determine position_in_codon in upstream and downstream regions.

To keep the phase of position_in_codon continuous across the upstream (u) / 5′ UTR (-) boundary and the 3′ UTR (*) / downstream (d) boundary, we proposed the following calculation method for position_in_codon: extend the CDS reading frame upstream of the start codon and assign a codon position (1, 2, or 3) to every base in region u or -, relative to the CDS frame. Under this convention, if the first base of region - falls on the second position of a codon, then the base immediately upstream of it is assigned codon position 1 of the same projected codon. However, this can cause a discrepancy in the u or d regions.
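As a minimal sketch of the frame extension described above (not crossmapper's actual code), the projected codon and codon position of a base can be derived from its distance to the start codon. The function name `project_upstream` and the `distance` parameter are hypothetical; `distance` is assumed to be counted along the transcript, with 1 meaning the base immediately 5′ of the first CDS base. The codon number returned here counts projected codons back from the start codon; how crossmapper numbers positions within each region is not modeled.

```python
def project_upstream(distance):
    """Project the CDS reading frame onto a base 5' of the start codon.

    distance: number of bases between this base and the first CDS base,
              counted along the transcript (1 = immediately upstream).
    Returns (projected_codon, position_in_codon).
    """
    if distance < 1:
        raise ValueError("distance must be >= 1")
    projected_codon = (distance - 1) // 3 + 1
    # Walking upstream, the phase cycles 3, 2, 1, 3, 2, 1, ...
    position_in_codon = 3 - (distance - 1) % 3
    return projected_codon, position_in_codon


for d in (1, 2, 3, 4, 11, 12):
    print(d, project_upstream(d))
```

With this arithmetic, a base 11 positions upstream of the start codon lands on the second position of the fourth projected codon, and the next base upstream lands on position 1 of that same codon.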

Example
Suppose we have a coding transcript with the following exon and CDS coordinates:

_exons = [(5, 8), (14, 20), (30, 35), (40, 44), (50, 52), (70, 72)]
_cds = (32, 43)

Following the calculation method above, crossmapper would convert coordinates 2 to 5 into the following protein positions:

{'position': 1, 'position_in_codon': 2, 'region': 'u', 'offset': 0} # coordinate 2
{'position': 1, 'position_in_codon': 3, 'region': 'u', 'offset': 0} # coordinate 3
{'position': 1, 'position_in_codon': 1, 'region': 'u', 'offset': 0} # coordinate 4
{'position': 4, 'position_in_codon': 2, 'region': '-', 'offset': 0} # coordinate 5

In HGVS-style notation, p.u1.1 (coordinate 4) appears adjacent to p.u1.2 (coordinate 2), but the two bases are not adjacent in coordinate space: coordinate 3 (p.u1.3) lies between them.
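The mismatch can be made explicit by sorting the reported positions by their label and looking at the resulting coordinate order. The sketch below only reuses the four dictionaries listed above as literal data; it does not call crossmapper, and the `p.u<position>.<position_in_codon>` label is built by hand for illustration.

```python
# Positions as reported in this issue, keyed by coordinate.
reported = [
    (2, {"position": 1, "position_in_codon": 2, "region": "u", "offset": 0}),
    (3, {"position": 1, "position_in_codon": 3, "region": "u", "offset": 0}),
    (4, {"position": 1, "position_in_codon": 1, "region": "u", "offset": 0}),
    (5, {"position": 4, "position_in_codon": 2, "region": "-", "offset": 0}),
]

# Keep only the upstream (u) entries and order them the way the labels read.
upstream = [(coord, pos) for coord, pos in reported if pos["region"] == "u"]
by_label = sorted(
    upstream,
    key=lambda item: (item[1]["position"], item[1]["position_in_codon"]),
)

labels = [f"p.u{pos['position']}.{pos['position_in_codon']}" for _, pos in by_label]
coords = [coord for coord, _ in by_label]

print(labels)  # ['p.u1.1', 'p.u1.2', 'p.u1.3']
print(coords)  # [4, 2, 3]
```

Label order 1, 2, 3 corresponds to coordinates 4, 2, 3, so the labels are not monotonic in the coordinate.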

Goals for the protein positions from crossmapper

  • Convert coordinates to positions and vice versa.
  • Deliver meaningful results.
  • No discrepancy between coordinate order and position_in_codon.

Possible solutions

  • Add a warning message if the region of a position is not in the CDS.
  • Extend from the - region and record the distance in offset:
{'position': 4, 'position_in_codon': 2, 'region': '-', 'offset': -1} # coordinate 4
  • Extend from the "" region and record the distance in offset:
{'position': 1, 'position_in_codon': 1, 'region': '', 'offset': -28} # coordinate 4
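The two offset-based proposals can be sketched as follows. This is an illustration under stated assumptions, not crossmapper's implementation: exons are read as 0-based half-open intervals on the forward strand (the reading that reproduces the values in this issue), only coordinates upstream of the first exon are handled, and the helper names `utr5_length`, `anchor_on_utr`, `anchor_on_cds`, and `to_coordinate` are hypothetical.

```python
EXONS = [(5, 8), (14, 20), (30, 35), (40, 44), (50, 52), (70, 72)]
CDS = (32, 43)


def utr5_length(exons, cds):
    """Exonic bases before the CDS start (half-open exons assumed)."""
    return sum(min(end, cds[0]) - start for start, end in exons if start < cds[0])


def anchor_on_utr(coordinate, exons, cds):
    """Proposal 2: reuse the first '-' base and store the distance in offset."""
    d = utr5_length(exons, cds)  # distance of the first exon base to the CDS
    return {
        "position": (d - 1) // 3 + 1,
        "position_in_codon": 3 - (d - 1) % 3,
        "region": "-",
        "offset": coordinate - exons[0][0],
    }


def anchor_on_cds(coordinate, cds):
    """Proposal 3: reuse the first CDS base and store the distance in offset."""
    return {
        "position": 1,
        "position_in_codon": 1,
        "region": "",
        "offset": coordinate - cds[0],
    }


def to_coordinate(pos, exons, cds):
    """Inverse mapping for the two anchored forms produced above only."""
    anchor = exons[0][0] if pos["region"] == "-" else cds[0]
    return anchor + pos["offset"]


print(anchor_on_utr(4, EXONS, CDS))
print(anchor_on_cds(4, CDS))
```

In both variants the offset is a plain coordinate difference, so consecutive upstream coordinates get consecutive offsets and `to_coordinate` recovers the input exactly.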
