feat: add utility function (and/or data) for URL datasets if necessary #20

@tianjianjiang

A Light Discussion about Dataset Choices for URL (at least)

Besides a small subset of (m)C4, I would prefer to pick datasets that are shared among the metadata (URL at least), promptsource, and evaluation WGs.

  • TyDi QA (primary task) is probably the only common dataset

Datasets used by only one of the other two WGs (that is, excluding us, the metadata WG):

  • From evaluation
    • GEM, specifically
      • MLSum
      • WikiLingua
  • From promptsource
    • app_reviews: not really a URL/URI, but basically a namespace and a date
    • CC-News: virtually a subset of C4
    • Probably some more
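
To make the comparison above concrete, here is a minimal sketch of how the overlap could be computed with plain Python sets. The names `wg_datasets`, `common_datasets`, and `exclusive_to` are hypothetical, and the dataset lists are copied from this issue only, so they are illustrative rather than complete. In particular, the metadata entry is an assumption based on the (m)C4 subset and TyDi QA mentioned above.

```python
# Candidate datasets per working group (WG), taken from this issue only.
# The "metadata" entry is an assumption, not an agreed-upon list.
wg_datasets = {
    "metadata": {"TyDi QA", "(m)C4 subset"},
    "evaluation": {"TyDi QA", "MLSum", "WikiLingua"},
    "promptsource": {"TyDi QA", "app_reviews", "CC-News"},
}


def common_datasets(groups):
    """Return the datasets that every WG has in its list."""
    return set.intersection(*groups.values())


def exclusive_to(groups, name):
    """Return the datasets that only the WG called `name` has."""
    others = set().union(*(v for k, v in groups.items() if k != name))
    return groups[name] - others


common = common_datasets(wg_datasets)
eval_only = exclusive_to(wg_datasets, "evaluation")
prompt_only = exclusive_to(wg_datasets, "promptsource")
```

With the lists above, `common` holds the single shared dataset, while `eval_only` and `prompt_only` hold the datasets that only one of the other two WGs uses.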

Labels

documentation, enhancement, wontfix
