Add metadata column to distinguish patent sequences (e.g. include GenBank division) #586

@corneliusroemer

Description

Right now, NCBI Datasets downloads for viruses often include sequences of non-natural origin, predominantly patent-related sequences.

It would be great if the metadata included a field that makes it easy to filter those sequences out. For example, you could include the GenBank division the sequence appears in. There is a dedicated patent division, PAT: https://www.ncbi.nlm.nih.gov/education/patent_and_ip_faqs/
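To illustrate how such a field could be used downstream, here is a minimal Python sketch. It assumes a tab-separated metadata file with a hypothetical `genbank_division` column holding the GenBank division code (e.g. `VRL` for viral, `PAT` for patent). No such column exists in NCBI Datasets output today; the column name and the example rows are made up for illustration.

```python
import csv
import io

# Hypothetical column name: NCBI Datasets does not currently emit this field.
DIVISION_COLUMN = "genbank_division"
PATENT_DIVISION = "PAT"


def drop_patent_rows(tsv_text: str) -> list[dict[str, str]]:
    """Return metadata rows whose GenBank division is not PAT.

    Rows with an empty or missing division value are kept, since their
    origin cannot be determined from the metadata alone.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if (row.get(DIVISION_COLUMN) or "").strip().upper() != PATENT_DIVISION
    ]


# Made-up example rows, not real accessions.
example = (
    "accession\tgenbank_division\n"
    "seq1\tVRL\n"
    "seq2\tPAT\n"
    "seq3\t\n"
)
kept = drop_patent_rows(example)
print([row["accession"] for row in kept])
```

Keeping rows with an unknown division is a deliberate choice in this sketch: it drops only sequences positively marked as patent-derived rather than everything lacking the field.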

Unfortunately, it seems this information is lost in the Datasets input pipeline. It would be great if it could be retained and surfaced in the metadata.

The feature would be immediately and immensely useful to Pathoplexus, as it would allow us to avoid ingesting patent sequences. Those are out of scope for Pathoplexus because they are not useful in pathogen genomic analyses. See loculus-project/loculus#6450

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)