# RFC: Curated collections (named ID sets) for search

## Summary

Add a small **curated-collections** layer to the SDK: named, sourced sets of
Punk ids (for example `burned`, `museum`) that resolve in search and through a
lookup API. A query like `punks.search("burned punks")` would return the
12 burned Punks; `punks.collections.get("museum")` would return the
institution-held set with its provenance metadata.

This is distinct from `search-synonyms.json`, which is a trait-phrase
rewriter and structurally cannot express id sets (see below). I have the
underlying data already curated and would contribute it plus the integration.

## Why not `search-synonyms.json`

I traced the parser. Synonym values are tokenized into free-text *trait*
terms and expanded in `text-parse.ts` (`expandSearchSynonymTerms`) **after**
the id include/exclude pass. A value like `"685 2317 2761 …"` would therefore
be treated as free terms matched against trait names and would never reach the
include-ids list. Every existing entry confirms the intended scope, e.g.
`"marilyn"` → `female "blonde bob" "hot lipstick"`. Burned, museum, and lost
are not traits, so the synonyms file is the wrong home, and widening it to
accept ids would overload a single-purpose mechanism.

## What already exists to build on

`PunksSearchQuery` already supports explicit `ids` and `excludeIds`
(`sdk/src/types.ts`), and the text parser already accepts `#1001` / bare
`1001` / `-1001`. The only missing piece is a **named layer** mapping a slug
to an id set with metadata. There is no tag/collection concept in the dataset
today; this proposal adds the smallest one that reuses the existing `ids`
query path.

## Proposed data shape

A new bundled JSON, e.g. `sdk/src/search-collections.json`:

```jsonc
{
  "burned": {
    "title": "Burned Punks",
    "description": "Punks provably sent to a burn address or otherwise removed from circulation.",
    "aliases": ["burned punks", "destroyed punks"],
    "source": "https://burnedpunks.com",
    "standard": "v2",
    "ids": [685, 2317, 2761, 2838, 3493, 3808, 5041, 5237, 5449, 7755, 8611, 9146]
  },
  "museum": {
    "title": "Museum Punks",
    "description": "Punks held in the permanent collections of art institutions.",
    "aliases": ["museum punks", "institution punks"],
    "source": "https://museumpunks.com",
    "standard": "v2",
    "ids": [74, 110, 305, 1286, 2554, 2786, 2838, 3407, 3831, 4018, 5160, 5449, 5616, 7178, 7899, 9833]
  }
}
```

Optional per-id provenance (institution, acquisition type, announcement URL,
V1 status) can either live in this file or stay app-side. I would defer to
maintainer preference on how much metadata belongs in the SDK bundle versus a
thinner ids-only form.
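
To make the proposed shape concrete, here is a minimal TypeScript sketch of a type for one entry plus a small sanity check. The names `CuratedCollection`, `CollectionsFile`, and `validateCollection` are hypothetical and not part of the SDK. The `"v1" | "v2"` union assumes `standard` only ever names one of the two contracts, and the 0 to 9999 range assumes the standard 10,000-Punk id space.

```typescript
// Hypothetical type for one entry in the proposed search-collections.json.
interface CuratedCollection {
  title: string;
  description: string;
  aliases: string[];
  source: string;
  standard: "v1" | "v2"; // assumption: only these two values are needed
  ids: number[];
}

// The whole file: slug -> collection.
type CollectionsFile = Record<string, CuratedCollection>;

const MAX_PUNK_ID = 9999; // assumption: ids run 0..9999

// Returns human-readable problems; an empty list means the entry looks sane.
function validateCollection(slug: string, c: CuratedCollection): string[] {
  const problems: string[] = [];
  const seen: Record<number, boolean> = {};
  for (const id of c.ids) {
    if (Math.floor(id) !== id || id < 0 || id > MAX_PUNK_ID) {
      problems.push(`${slug}: id ${id} is outside 0..${MAX_PUNK_ID}`);
    }
    if (seen[id]) {
      problems.push(`${slug}: duplicate id ${id}`);
    }
    seen[id] = true;
  }
  if (c.aliases.length === 0) {
    problems.push(`${slug}: at least one alias is needed for search resolution`);
  }
  return problems;
}

const burned: CuratedCollection = {
  title: "Burned Punks",
  description:
    "Punks provably sent to a burn address or otherwise removed from circulation.",
  aliases: ["burned punks", "destroyed punks"],
  source: "https://burnedpunks.com",
  standard: "v2",
  ids: [685, 2317, 2761, 2838, 3493, 3808, 5041, 5237, 5449, 7755, 8611, 9146],
};

const file: CollectionsFile = { burned };
const burnedProblems = validateCollection("burned", file.burned); // empty for the seed data
```

A check like this could run in a unit test so that a bad id or an accidental duplicate in the data file fails CI rather than surfacing in search results.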

## Integration

1. **Search resolution.** When a free phrase matches a collection alias,
   resolve it to that collection's `ids` via the existing
   `PunksSearchQuery.ids` path, rather than to trait terms. This runs as a
   separate resolver from synonyms so the two never collide.
2. **Lookup API.** `punks.collections.list()` / `punks.collections.get(slug)`
   returning `{ title, description, source, standard, ids }` for UI use.
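
The two integration points above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `normalizePhrase`, `buildAliasIndex`, `resolveCollectionIds`, and `collectionsApi` are hypothetical names, the data is inlined rather than imported from the proposed JSON file, and exact whole-phrase matching after lowercasing is an assumption about how aliases would be compared.

```typescript
interface CuratedCollection {
  title: string;
  description: string;
  aliases: string[];
  source: string;
  standard: string;
  ids: number[];
}

// Inlined copy of the proposed seed data (would be imported from JSON in practice).
const data: Record<string, CuratedCollection> = {
  burned: {
    title: "Burned Punks",
    description:
      "Punks provably sent to a burn address or otherwise removed from circulation.",
    aliases: ["burned punks", "destroyed punks"],
    source: "https://burnedpunks.com",
    standard: "v2",
    ids: [685, 2317, 2761, 2838, 3493, 3808, 5041, 5237, 5449, 7755, 8611, 9146],
  },
  museum: {
    title: "Museum Punks",
    description: "Punks held in the permanent collections of art institutions.",
    aliases: ["museum punks", "institution punks"],
    source: "https://museumpunks.com",
    standard: "v2",
    ids: [74, 110, 305, 1286, 2554, 2786, 2838, 3407, 3831, 4018, 5160, 5449, 5616, 7178, 7899, 9833],
  },
};

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Lowercase, trim, and collapse whitespace so "  Burned   Punks " matches "burned punks".
function normalizePhrase(phrase: string): string {
  return phrase.trim().toLowerCase().replace(/\s+/g, " ");
}

// Map each normalized alias to its collection slug.
function buildAliasIndex(
  collections: Record<string, CuratedCollection>
): Record<string, string> {
  const index: Record<string, string> = {};
  for (const slug of Object.keys(collections)) {
    for (const alias of collections[slug].aliases) {
      index[normalizePhrase(alias)] = slug;
    }
  }
  return index;
}

const aliasIndex = buildAliasIndex(data);

// 1. Search resolution: a whole-phrase alias match yields ids for the existing
//    `ids` query path. Anything else returns undefined so the caller can fall
//    through to the normal trait/synonym parsing.
function resolveCollectionIds(phrase: string): number[] | undefined {
  const key = normalizePhrase(phrase);
  if (!hasOwn(aliasIndex, key)) return undefined;
  return data[aliasIndex[key]].ids.slice(); // copy so callers cannot mutate the bundle
}

// 2. Lookup API: list slugs, or fetch one collection's public fields.
const collectionsApi = {
  list(): string[] {
    return Object.keys(data);
  },
  get(slug: string) {
    if (!hasOwn(data, slug)) return undefined;
    const { title, description, source, standard, ids } = data[slug];
    return { title, description, source, standard, ids: ids.slice() };
  },
};
```

Because the resolver returns `undefined` on a miss, it can sit in front of the synonym expansion without changing behavior for any phrase that is not a collection alias.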

## Seed sets I can contribute

| Slug | Size | Basis | Notes |
| ---- | ---- | ----- | ----- |
| `burned` | 12 | On-chain, objective | Strongest first candidate. Each id verifiable by its transfer to a burn destination. |
| `museum` | 16 across 6 institutions (MoMA, LACMA, Centre Pompidou, ZKM Karlsruhe, ICA Miami, Toledo Museum of Art) | Sourced provenance | Every entry has an acquisition type and a public announcement URL. |

If you prefer to keep the first PR objective and minimal, I would start with
`burned` alone and follow with `museum` in a second PR.

## V1 / V2 correctness

Each collection carries a `standard` field. Burns and institutional holdings
must be attributed to the right contract: my source data already separates V1
and V2 holders, which lines up with the indexer's `punks` / `v1_punks` split.
A single Punk can also legitimately appear in more than one set with different
context. Two of the museum Punks are also burned: **#2838** and **#5449**
(both ZKM acquisitions sent to the `CryptoPunksMarket` contract
`0xb47e…BBB`). A flat alias cannot represent that overlap; metadata-carrying
collections can.
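
The overlap described above can be checked directly from the two seed id lists. The sketch below is illustrative only; `intersect` and `collectionsContaining` are hypothetical helpers, not SDK functions.

```typescript
// Seed id lists copied from the proposed data file.
const burnedIds: number[] = [685, 2317, 2761, 2838, 3493, 3808, 5041, 5237, 5449, 7755, 8611, 9146];
const museumIds: number[] = [74, 110, 305, 1286, 2554, 2786, 2838, 3407, 3831, 4018, 5160, 5449, 5616, 7178, 7899, 9833];

// Ids present in both lists, in the order they appear in `a`.
function intersect(a: number[], b: number[]): number[] {
  return a.filter((id) => b.indexOf(id) !== -1);
}

// Slugs of every collection that contains the given id.
function collectionsContaining(id: number, sets: Record<string, number[]>): string[] {
  return Object.keys(sets).filter((slug) => sets[slug].indexOf(id) !== -1);
}

const overlap = intersect(burnedIds, museumIds); // [2838, 5449]
const homesOf2838 = collectionsContaining(2838, {
  burned: burnedIds,
  museum: museumIds,
}); // ["burned", "museum"]
```

A reverse lookup of this kind is what lets a UI show both the "burned" and "museum" context on a single Punk's page.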

## Open questions for maintainers

1. Is a curated-collections layer something you want in the SDK at all, or
   would you rather it live app-side (e.g. punksmarket.app) and keep the SDK
   purely mechanical?
2. If in-SDK: how much per-id provenance metadata belongs in the bundle versus
   ids-only with metadata fetched elsewhere?
3. Should collection matching be on by default in `punks.search`, or opt-in
   via a query flag, to avoid surprising free-text matches?
4. Preferred home and naming: `search-collections.json` next to
   `search-synonyms.json`, or a `collections/` data module?

If the direction is welcome, I will open a PR with the `burned` set, the data
file, the search resolver, and tests, along with a changeset.

