RFC: Curated collections (named ID sets) for search
Summary
Add a small curated-collections layer to the SDK: named, sourced sets of
Punk ids (for example burned, museum) that resolve in search and through a
lookup API. A query like punks.search("burned punks") would return the
12 burned Punks; punks.collections.get("museum") would return the
institution-held set with its provenance metadata.
This is distinct from search-synonyms.json, which is a trait-phrase
rewriter and structurally cannot express id sets (see below). I have the
underlying data already curated and would contribute it plus the integration.
Why not search-synonyms.json
I traced the parser. Synonym values are tokenized into free-text trait
terms and expanded in text-parse.ts (expandSearchSynonymTerms) after
the id include/exclude pass. So a value like "685 2317 2761 …" becomes free
terms that resolve against trait names and never reach the include-ids path. Every existing
entry confirms the intended scope, e.g. "marilyn" → female "blonde bob" "hot lipstick". Burned/museum/lost are not traits, so the synonyms file is
the wrong home, and widening it to accept ids would overload a single-purpose
mechanism.
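To make the ordering argument concrete, here is a deliberately simplified TypeScript model. The names (parse, splitTerms, ID_TOKEN) and the two-entry synonym table, including the hypothetical "burned" entry, are assumptions for illustration. Only the order of the two passes comes from the trace above; the real text-parse.ts is more involved.

```typescript
// Hypothetical, simplified model of the parse order traced above. It is NOT
// sdk/src/text-parse.ts; it only mirrors two observed facts: id tokens are
// taken first, and synonym expansion runs afterwards.

interface ParsedQuery {
  ids: number[];
  excludeIds: number[];
  terms: string[];
}

// Illustrative synonym table. "marilyn" mirrors an existing entry; "burned"
// is the kind of id-valued entry this RFC argues against adding.
const synonyms = new Map<string, string>([
  ["marilyn", 'female "blonde bob" "hot lipstick"'],
  ["burned", "685 2317 2761"],
]);

// Matches #1001, bare 1001, and -1001 (exclude).
const ID_TOKEN = /^(-)?#?(\d+)$/;

// Splits a synonym value into terms, keeping quoted phrases together.
function splitTerms(value: string): string[] {
  return (value.match(/"[^"]+"|\S+/g) ?? []).map((t) => t.replace(/^"|"$/g, ""));
}

function parse(text: string): ParsedQuery {
  const out: ParsedQuery = { ids: [], excludeIds: [], terms: [] };
  const rest: string[] = [];

  // Pass 1: include/exclude ids.
  for (const token of text.trim().split(/\s+/)) {
    const m = ID_TOKEN.exec(token);
    if (m) (m[1] ? out.excludeIds : out.ids).push(Number(m[2]));
    else if (token) rest.push(token);
  }

  // Pass 2: synonym expansion. Its output is only ever appended to terms,
  // so numbers inside a synonym value never reach pass 1.
  for (const token of rest) {
    const expansion = synonyms.get(token.toLowerCase());
    out.terms.push(...(expansion === undefined ? [token] : splitTerms(expansion)));
  }
  return out;
}
```

In this model, parse("burned #1001") yields ids [1001] and terms ["685", "2317", "2761"]: the explicit #1001 is an id, while the synonym's numbers arrive as trait text.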
What already exists to build on
PunksSearchQuery already supports explicit ids and excludeIds
(sdk/src/types.ts), and the text parser already accepts #1001 / bare
1001 / -1001. The only missing piece is a named layer mapping a slug
to an id set with metadata. There is no tag/collection concept in the dataset
today; this proposal adds the smallest one that reuses the existing ids
query path.
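A minimal sketch of that named layer follows. It assumes the data shape proposed in this RFC and treats PunksSearchQuery as having only the ids/excludeIds fields. resolveCollectionPhrase, listCollections, getCollection and collectionsContaining are hypothetical names; the middle two stand in for punks.collections.list() / get(slug) and are not an existing SDK API.

```typescript
// Hypothetical sketch of the proposed named layer. PunksSearchQueryLike stands
// in for the real PunksSearchQuery in sdk/src/types.ts; only ids is assumed.
interface PunksSearchQueryLike {
  ids?: number[];
  excludeIds?: number[];
}

interface Collection {
  title: string;
  description: string;
  aliases: string[];
  source: string;
  standard: "v1" | "v2";
  ids: number[];
}

// Seed data taken from the search-collections.json proposed in this RFC.
const collections: Record<string, Collection> = {
  burned: {
    title: "Burned Punks",
    description: "Punks provably sent to a burn address or otherwise removed from circulation.",
    aliases: ["burned punks", "destroyed punks"],
    source: "https://burnedpunks.com",
    standard: "v2",
    ids: [685, 2317, 2761, 2838, 3493, 3808, 5041, 5237, 5449, 7755, 8611, 9146],
  },
  museum: {
    title: "Museum Punks",
    description: "Punks held in the permanent collections of art institutions.",
    aliases: ["museum punks", "institution punks"],
    source: "https://museumpunks.com",
    standard: "v2",
    ids: [74, 110, 305, 1286, 2554, 2786, 2838, 3407, 3831, 4018, 5160, 5449, 5616, 7178, 7899, 9833],
  },
};

const normalize = (s: string): string => s.trim().toLowerCase().replace(/\s+/g, " ");

// Alias -> slug index; each slug also counts as its own alias.
const aliasIndex = new Map<string, string>();
for (const [slug, c] of Object.entries(collections)) {
  for (const alias of [slug, ...c.aliases]) aliasIndex.set(normalize(alias), slug);
}

// Runs separately from synonym expansion: a matching phrase becomes an ids
// query, never trait terms. undefined means "not a collection, fall through".
function resolveCollectionPhrase(phrase: string): PunksSearchQueryLike | undefined {
  const slug = aliasIndex.get(normalize(phrase));
  const c = slug === undefined ? undefined : collections[slug];
  return c ? { ids: [...c.ids] } : undefined;
}

// Plain-function stand-ins for punks.collections.list() / .get(slug).
function listCollections(): string[] {
  return Object.keys(collections);
}

function getCollection(slug: string): Omit<Collection, "aliases"> | undefined {
  const c = collections[slug];
  if (!c) return undefined;
  const { title, description, source, standard, ids } = c;
  return { title, description, source, standard, ids: [...ids] };
}

// Every slug whose set contains the id; overlaps are expected (e.g. #2838).
function collectionsContaining(id: number): string[] {
  return Object.entries(collections)
    .filter(([, c]) => c.ids.includes(id))
    .map(([slug]) => slug);
}
```

Matching here is whole-phrase and exact after normalization. "Burned  Punks" resolves to the 12 burned ids, while "burned punks with hoodie" falls through to the normal parser. That is the conservative end of the on-by-default question, and partial matching could be layered on later. collectionsContaining(2838) returns both slugs, which is the overlap a flat synonym alias cannot express.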
Proposed data shape
A new bundled JSON, e.g. sdk/src/search-collections.json:

```json
{
  "burned": {
    "title": "Burned Punks",
    "description": "Punks provably sent to a burn address or otherwise removed from circulation.",
    "aliases": ["burned punks", "destroyed punks"],
    "source": "https://burnedpunks.com",
    "standard": "v2",
    "ids": [685, 2317, 2761, 2838, 3493, 3808, 5041, 5237, 5449, 7755, 8611, 9146]
  },
  "museum": {
    "title": "Museum Punks",
    "description": "Punks held in the permanent collections of art institutions.",
    "aliases": ["museum punks", "institution punks"],
    "source": "https://museumpunks.com",
    "standard": "v2",
    "ids": [74, 110, 305, 1286, 2554, 2786, 2838, 3407, 3831, 4018, 5160, 5449, 5616, 7178, 7899, 9833]
  }
}
```

Optional per-id provenance (institution, acquisition type, announcement URL,
V1 status) can either live in this file or stay app-side. I would defer to
maintainer preference on how much metadata belongs in the SDK bundle versus a
thinner ids-only form.
Integration
- Search resolution. When a free phrase matches a collection alias,
resolve it to that collection's ids via the existing
PunksSearchQuery.ids path, rather than to trait terms. This runs as a
separate resolver from synonyms so the two never collide.
- Lookup API.
punks.collections.list() / punks.collections.get(slug)
returning { title, description, source, standard, ids } for UI use.
Seed sets I can contribute

| Slug | Size | Basis | Notes |
|---|---|---|---|
| burned | 12 | On-chain, objective | Strongest first candidate. Each id is verifiable by its transfer to a burn destination. |
| museum | 16 across 6 institutions (MoMA, LACMA, Centre Pompidou, ZKM Karlsruhe, ICA Miami, Toledo Museum of Art) | Sourced provenance | Every entry has an acquisition type and a public announcement URL. |

If you prefer to keep the first PR objective and minimal, I would start with
burned alone and follow with museum.
V1 / V2 correctness
Each collection carries a standard field. Burns and institutional holdings
must be attributed to the right contract: my source data already separates V1
and V2 holders, which lines up with the indexer's punks / v1_punks split.
A single Punk can also legitimately appear in more than one set with different
context. Two of the museum Punks are also burned: #2838 and #5449
(both ZKM acquisitions sent to the CryptoPunksMarket contract
0xb47e…BBB). A flat alias cannot represent that overlap; metadata-carrying
collections can.
Open questions for maintainers
- Is a curated-collections layer something you want in the SDK at all, or
would you rather it live app-side (e.g. punksmarket.app) and keep the SDK
purely mechanical?
- If in-SDK: how much per-id provenance metadata belongs in the bundle versus
ids-only with metadata fetched elsewhere?
- Should collection matching be on by default in punks.search, or opt-in
via a query flag, to avoid surprising free-text matches?
- Preferred home and naming: search-collections.json next to
search-synonyms.json, or a collections/ data module?
If the direction is welcome, I will open a PR with the burned set, the data
file, the search resolver, and tests, scoped behind a changeset.