`DataSet.iterate_items(processor=...)` possibly confusing and has two unrelated concerns

The `processor` parameter on `DataSet.iterate_items` / `DataSet._iterate_items` is used for two unrelated things. Some callers don't pass it and silently lose features; one caller passes a class instead of an instance which does nothing silently.

### What `processor=` actually does (because I was confused at first)

1. **Interrupt source** — checked via `hasattr(processor, "interrupted") and processor.interrupted` so iteration aborts cleanly when the calling processor is interrupted. (We often do this redundantly in `process` calls, but this is a nice catch.)
2. **Log-routing target** — used downstream by `warn_unmappable_item` (where I discovered this issue) to route per-item map-failure entries to `processor.dataset` (the *processing* dataset's log) rather than the source dataset. I think that's still nice to have (running a processor on a DataSet but it has to skip unmapped items should be in that new DataSet's log).

The name also collides with `own_processor`, which `iterate_items` uses internally for map_item resolution (what I originally thought it was for). Two unrelated "processors" in one method.

### Concrete failure modes

- **Caller passes nothing →** silent degradation: no interrupt support, per-item warnings land in the source dataset's log instead of the caller's output log. Quite a few examples: `tf_idf.py`, `vectorise.py`, `topic_modeling.py`, `generate_embeddings.py` and others. Some of these don't have the `self.interrupted` check in the loop either.
- **Caller passes a class →** silent no-op: `webtool/views/views_dataset.py` pass `dataset.get_own_processor()`, which returns a class, not an instance. Probably me or someone like me thought it controlled map_item resolution.
- **Most processors pass `self`** and work correctly (both attributes happen to satisfy both uses). Yay.

### What do we do?

**Cleanup** I'll do a mini audit and fix callers: pass `self` from every processor; omit from `views_dataset` callers (even though it does nothing, it is... wrong? or just confusing). And I could add a runtime check that warns if a class is passed. 

#### But what's better?
**1. Split into two parameters.** Maybe a `interrupt_source=` and `log_target=`. It's clearer, though we lose the easy `self.source_dataset.iterate_items(self)` convenience. 

**2. Create an iteration wrapper in BasicProcessor.** Add `BasicProcessor.iterate(self, dataset, **kwargs)`; processors call `self.iterate(self.source_dataset)`. `DataSet.iterate_items` drops the `processor` parameter entirely — pure generator for non-processor callers. Clean, but rewrites every processor call site. For running processors, passing `processor` should not be options (but it has to be since the DataSet iterate_items is used elsewhere directly).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`DataSet.iterate_items(processor=...)` possibly confusing and has two unrelated concerns #595

What `processor=` actually does (because I was confused at first)

Concrete failure modes

What do we do?

But what's better?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

DataSet.iterate_items(processor=...) possibly confusing and has two unrelated concerns #595

Description

What processor= actually does (because I was confused at first)

Concrete failure modes

What do we do?

But what's better?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`DataSet.iterate_items(processor=...)` possibly confusing and has two unrelated concerns #595

What `processor=` actually does (because I was confused at first)