
RFC: Curated collections (named ID sets) for search #6

@seanbonner

Summary

Add a small curated-collections layer to the SDK: named, sourced sets of
Punk ids (for example burned, museum) that resolve in search and through a
lookup API. A query like punks.search("burned punks") would return the
12 burned Punks; punks.collections.get("museum") would return the
institution-held set with its provenance metadata.

This is distinct from search-synonyms.json, which is a trait-phrase
rewriter and structurally cannot express id sets (see below). I have the
underlying data already curated and would contribute it plus the integration.

Why not search-synonyms.json

I traced the parser. Synonym values are tokenized into free-text trait
terms and expanded in text-parse.ts (expandSearchSynonymTerms) after
the id include/exclude pass. So a value like "685 2317 2761 …" becomes free
terms that resolve against trait names, never include-ids. Every existing
entry confirms the intended scope, e.g. "marilyn" expands to female "blonde bob"
"hot lipstick". Burned, museum, and lost are not traits, so the synonyms file is
the wrong home, and widening it to accept ids would overload a single-purpose
mechanism.

What already exists to build on

PunksSearchQuery already supports explicit ids and excludeIds
(sdk/src/types.ts), and the text parser already accepts #1001 / bare
1001 / -1001. The only missing piece is a named layer mapping a slug
to an id set with metadata. There is no tag/collection concept in the dataset
today; this proposal adds the smallest one that reuses the existing ids
query path.

Proposed data shape

A new bundled JSON file, e.g. sdk/src/search-collections.json:

{
  "burned": {
    "title": "Burned Punks",
    "description": "Punks provably sent to a burn address or otherwise removed from circulation.",
    "aliases": ["burned punks", "destroyed punks"],
    "source": "https://burnedpunks.com",
    "standard": "v2",
    "ids": [685, 2317, 2761, 2838, 3493, 3808, 5041, 5237, 5449, 7755, 8611, 9146]
  },
  "museum": {
    "title": "Museum Punks",
    "description": "Punks held in the permanent collections of art institutions.",
    "aliases": ["museum punks", "institution punks"],
    "source": "https://museumpunks.com",
    "standard": "v2",
    "ids": [74, 110, 305, 1286, 2554, 2786, 2838, 3407, 3831, 4018, 5160, 5449, 5616, 7178, 7899, 9833]
  }
}
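To make the proposed shape concrete, here is a minimal TypeScript sketch of a type for the file plus a sanity check that could run in a unit test. The names SearchCollection, SearchCollections and validateCollections are hypothetical, not existing SDK exports. The checks rest on three assumptions: ids must be integers in the CryptoPunks range 0 to 9999, an id must be unique within its collection, and an alias may belong to only one collection so that resolution stays unambiguous.

```typescript
// Hypothetical types mirroring the proposed search-collections.json shape.
type Standard = "v1" | "v2";

interface SearchCollection {
  title: string;
  description: string;
  aliases: string[];
  source: string;
  standard: Standard;
  ids: number[];
}

type SearchCollections = Record<string, SearchCollection>;

const KNOWN_STANDARDS: readonly string[] = ["v1", "v2"];

// Returns human-readable problems; an empty array means the data passed every check.
function validateCollections(data: SearchCollections): string[] {
  const problems: string[] = [];
  const aliasOwners = new Map<string, string>(); // normalized alias -> slug

  for (const [slug, c] of Object.entries(data)) {
    // JSON is untyped at runtime, so re-check the standard field.
    if (!KNOWN_STANDARDS.includes(c.standard)) {
      problems.push(`${slug}: unknown standard "${c.standard}"`);
    }

    const seen = new Set<number>();
    for (const id of c.ids) {
      // CryptoPunks ids run from 0 to 9999.
      if (!Number.isInteger(id) || id < 0 || id > 9999) {
        problems.push(`${slug}: id ${id} is not a valid Punk id`);
      } else if (seen.has(id)) {
        problems.push(`${slug}: duplicate id ${id}`);
      }
      seen.add(id);
    }

    // An alias may point at only one collection, or resolution would be ambiguous.
    for (const alias of c.aliases) {
      const key = alias.trim().toLowerCase();
      const owner = aliasOwners.get(key);
      if (owner !== undefined && owner !== slug) {
        problems.push(`alias "${alias}" is claimed by both ${owner} and ${slug}`);
      } else {
        aliasOwners.set(key, slug);
      }
    }
  }
  return problems;
}
```

If this runs against the bundled file in CI, a mistyped or duplicated id would fail the build before it could surface as a wrong search result.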

Optional per-id provenance (institution, acquisition type, announcement URL,
V1 status) can either live in this file or stay app-side. I would defer to
maintainer preference on how much metadata belongs in the SDK bundle versus a
thinner ids-only form.

Integration

  1. Search resolution. When a free phrase matches a collection alias,
    resolve it to that collection's ids via the existing
    PunksSearchQuery.ids path, rather than to trait terms. This runs as a
    separate resolver from synonyms so the two never collide.
  2. Lookup API. punks.collections.list() / punks.collections.get(slug)
    returning { title, description, source, standard, ids } for UI use.
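A rough sketch of both integration points follows. It assumes aliases are matched case-insensitively as whole words and that the longest alias wins. QueryIds is a local stand-in for the ids field of PunksSearchQuery (the real type lives in sdk/src/types.ts). resolveCollectionPhrase and makeCollectionsApi are hypothetical names, not current SDK functions. The resolver returns the leftover text so that the existing trait and synonym parsing can still handle it, which keeps the two mechanisms separate.

```typescript
// Hypothetical sketch; not the SDK's actual resolver or API.
type Standard = "v1" | "v2";

interface SearchCollection {
  title: string;
  description: string;
  aliases: string[];
  source: string;
  standard: Standard;
  ids: number[];
}

// Local stand-in for the ids part of PunksSearchQuery.
interface QueryIds {
  ids?: number[];
}

interface CollectionResolution {
  slug: string;
  query: QueryIds;
  remainingText: string; // left over for the normal trait/synonym parser
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Resolves the longest alias found in `text` to its collection's ids.
function resolveCollectionPhrase(
  text: string,
  collections: Record<string, SearchCollection>,
): CollectionResolution | undefined {
  const candidates: Array<{ alias: string; slug: string }> = [];
  for (const [slug, c] of Object.entries(collections)) {
    for (const alias of c.aliases) candidates.push({ alias: alias.toLowerCase(), slug });
  }
  // Prefer longer aliases so a multi-word alias beats a shorter one it contains.
  candidates.sort((a, b) => b.alias.length - a.alias.length);

  for (const { alias, slug } of candidates) {
    // Whole-word match: alias must be bounded by start/end or whitespace.
    const pattern = new RegExp(`(^|\\s)${escapeRegExp(alias)}(?=\\s|$)`, "i");
    const match = pattern.exec(text);
    if (match) {
      const before = text.slice(0, match.index);
      const after = text.slice(match.index + match[0].length);
      const remainingText = `${before} ${after}`.replace(/\s+/g, " ").trim();
      return { slug, query: { ids: [...collections[slug].ids] }, remainingText };
    }
  }
  return undefined;
}

// Lookup API shape: list() and get(slug) return metadata plus a copy of the ids.
function makeCollectionsApi(collections: Record<string, SearchCollection>) {
  const view = (c: SearchCollection) => ({
    title: c.title,
    description: c.description,
    source: c.source,
    standard: c.standard,
    ids: [...c.ids],
  });
  return {
    list: () => Object.entries(collections).map(([slug, c]) => ({ slug, ...view(c) })),
    get: (slug: string) =>
      Object.prototype.hasOwnProperty.call(collections, slug) ? view(collections[slug]) : undefined,
  };
}
```

Under these assumptions, "alien burned punks" would resolve to the burned ids with "alien" left for the trait parser. How the SDK then combines ids with trait filters is a maintainer decision that this sketch does not settle.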

Seed sets I can contribute

| Slug | Size | Basis | Notes |
|---|---|---|---|
| burned | 12 | On-chain, objective | Strongest first candidate. Each id is verifiable by its transfer to a burn destination. |
| museum | 16, across 6 institutions (MoMA, LACMA, Centre Pompidou, ZKM Karlsruhe, ICA Miami, Toledo Museum of Art) | Sourced provenance | Every entry has an acquisition type and a public announcement URL. |
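The claim that each burned id is verifiable could be checked mechanically in a test. The sketch below makes three assumptions. It takes an id-to-current-owner map from some indexer (not shown). The list of burn destinations is supplied by the caller. findUnverifiedBurns is a hypothetical helper. It also uses the current owner as a proxy for "was transferred to a burn destination", which is valid only if a Punk can never leave that destination.

```typescript
// Hypothetical helper: returns the ids that cannot be confirmed as burned.
// `owners` is an assumed id -> current-owner map (e.g. from an indexer);
// `burnDestinations` is supplied by the caller, not hard-coded here.
function findUnverifiedBurns(
  ids: readonly number[],
  owners: ReadonlyMap<number, string>,
  burnDestinations: readonly string[],
): number[] {
  // Hex addresses may be checksummed or lowercase, so compare case-insensitively.
  const destinations = new Set(burnDestinations.map((a) => a.toLowerCase()));
  return ids.filter((id) => {
    const owner = owners.get(id);
    // Unknown owner counts as unverified rather than silently passing.
    return owner === undefined || !destinations.has(owner.toLowerCase());
  });
}
```

An empty result for the burned ids would support the "objective" label in the table. A non-empty result points at entries that need a source check.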

I would start with burned only if you prefer to keep the first PR objective
and minimal, then follow with museum.

V1 / V2 correctness

Each collection carries a standard field. Burns and institutional holdings
must be attributed to the right contract: my source data already separates V1
and V2 holders, which lines up with the indexer's punks / v1_punks split.
A single Punk can also legitimately appear in more than one set with different
context. Two of the museum Punks are also burned: #2838 and #5449
(both ZKM acquisitions sent to the CryptoPunksMarket contract
0xb47e…BBB). A flat alias cannot represent that overlap; metadata-carrying
collections can.
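Overlap between sets can be computed directly from the data. This sketch uses findOverlaps, a hypothetical helper. It keys membership by standard and id, so a V1 id is never conflated with the same number under V2. Run against the burned and museum ids in the JSON above, it reports exactly v2:2838 and v2:5449, each belonging to both burned and museum.

```typescript
// Hypothetical sketch: find Punks that belong to more than one collection.
type Standard = "v1" | "v2";

interface CollectionIds {
  standard: Standard;
  ids: readonly number[];
}

function findOverlaps(collections: Record<string, CollectionIds>): Map<string, string[]> {
  // Key is "<standard>:<id>", e.g. "v2:2838", so V1 and V2 ids stay distinct.
  const membership = new Map<string, string[]>();
  for (const [slug, c] of Object.entries(collections)) {
    for (const id of c.ids) {
      const key = `${c.standard}:${id}`;
      const slugs = membership.get(key) ?? [];
      slugs.push(slug);
      membership.set(key, slugs);
    }
  }
  // Keep only Punks that appear in two or more collections.
  return new Map([...membership].filter(([, slugs]) => slugs.length > 1));
}
```

Keeping membership per collection, rather than merging everything into one alias list, is what lets #2838 and #5449 carry both their burn context and their ZKM provenance.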

Open questions for maintainers

  1. Is a curated-collections layer something you want in the SDK at all, or
    would you rather it live app-side (e.g. punksmarket.app) and keep the SDK
    purely mechanical?
  2. If in-SDK: how much per-id provenance metadata belongs in the bundle versus
    ids-only with metadata fetched elsewhere?
  3. Should collection matching be on by default in punks.search, or opt-in
    via a query flag, to avoid surprising free-text matches?
  4. Preferred home and naming: search-collections.json next to
    search-synonyms.json, or a collections/ data module?

If the direction is welcome, I will open a PR with the burned set, the data
file, the search resolver, and tests, scoped behind a changeset.
