Performance is very inefficient #3

Description

@ynicle

Running the sample code with 4096 context + 768 steps takes 8 min for one question on an H20 GPU and occupies about 50 GB of GPU memory.

  • 4096 context + 768 steps: 8 min, 50 GB GPU memory
  • 2048 context + 768 steps: 4 min, 31 GB GPU memory
  • 768 context + 768 steps: 2 min, 20 GB GPU memory
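
To make the scaling in these three measurements easier to read, here is a rough sketch that computes the marginal cost of each extra context token between adjacent runs. It uses only the numbers reported above. The helper name `marginal_costs` is hypothetical, and treating "G" as 1024 MB is an assumption.

```python
# Measurements copied from the report above:
# (context length, minutes per question, GPU memory in GB). Steps are fixed at 768.
measurements = [
    (768, 2, 20),
    (2048, 4, 31),
    (4096, 8, 50),
]


def marginal_costs(rows):
    """Hypothetical helper: cost of one extra context token between adjacent rows."""
    results = []
    for (c0, t0, m0), (c1, t1, m1) in zip(rows, rows[1:]):
        delta = c1 - c0
        results.append({
            "from": c0,
            "to": c1,
            # Extra wall-clock seconds per additional context token.
            "sec_per_token": (t1 - t0) * 60 / delta,
            # Extra memory per additional context token (assumes 1 GB = 1024 MB).
            "mb_per_token": (m1 - m0) * 1024 / delta,
        })
    return results


costs = marginal_costs(measurements)
for row in costs:
    print(
        f"{row['from']} -> {row['to']}: "
        f"{row['sec_per_token']:.3f} s/token, {row['mb_per_token']:.1f} MB/token"
    )
```

Both intervals come out at roughly 9 MB and roughly 0.1 s per additional context token, so over this range memory and time appear to grow about linearly with context length. This is only a reading of three data points, not a profile of the code.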
