64 changes: 64 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,70 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.5.0] - 2026-05-12

### Added

#### `bead.config.compose` — didactic-grounded config composer

- New subpackage `bead.config.compose` replaces the hand-rolled config
loader. Generic over any `dx.Model` schema; supports the full
OmegaConf interpolation grammar (`${section.field}`, `${.x}` /
`${..x}` relative, `${a.b[0]}` and `${a.b.0}` list indexing,
`${a.${b}}` nested, `\${literal}` escape, cycle detection).
- Built-in resolvers: `oc.env`, `oc.env:VAR,default`, `oc.select`,
`oc.decode` (base64), `oc.deprecated`, `oc.create`,
`oc.dict.keys`, `oc.dict.values`. Application-specific resolvers
register via `bead.config.compose.register_resolver`.
- Bead-specific resolvers in `bead.config.resolvers`:
`${bead.path:rel}` joins `rel` against the active root's
`paths.data_dir`; `${bead.anchor:name[,attr]}` is expanded
after validation.
- A `defaults: [...]` list at the top of any YAML/TOML config
composes the referenced files left-to-right before the primary body.
- Strict-merge rejects unknown keys with the dotted path to the
offending site, walking nested `dx.Embed[T]` models from
`__field_specs__`.
- TOML configs (`.toml`) supported alongside YAML out of the box.
- `bead.config.load_config` is now a thin wrapper around
`compose(schema=BeadConfig, ...)`. The previous
`load_yaml_file` / `merge_configs` helpers are removed.
- CLI: every `bead ...` invocation accepts repeatable
`--set KEY=VALUE` overrides threaded into the compose pipeline.
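
The interpolation behaviour listed above can be illustrated with a minimal, standard-library-only sketch. It is not the `bead.config.compose` implementation: `lookup` and `resolve` are hypothetical names, and only absolute `${a.b}` references, the `\${literal}` escape, and cycle detection are covered (no resolvers, relative references, or list indexing).

```python
import re

# Matches an unescaped ${dotted.path}; a preceding backslash marks a literal.
_REF = re.compile(r"(?<!\\)\$\{([A-Za-z0-9_.]+)\}")


def lookup(tree: dict, dotted: str):
    """Walk a nested dict along a dotted path such as ``paths.data_dir``."""
    node = tree
    for part in dotted.split("."):
        node = node[part]
    return node


def resolve(tree: dict, dotted: str, _active: tuple[str, ...] = ()):
    """Resolve one key, expanding absolute ``${a.b}`` references recursively."""
    if dotted in _active:
        # The key is already being resolved further up the call chain.
        chain = " -> ".join((*_active, dotted))
        raise ValueError(f"interpolation cycle: {chain}")
    value = lookup(tree, dotted)
    if not isinstance(value, str):
        return value
    expanded = _REF.sub(
        lambda m: str(resolve(tree, m.group(1), (*_active, dotted))), value
    )
    # Un-escape \${...} literals only after references have been expanded.
    return expanded.replace("\\${", "${")
```

Tracking the chain of keys currently being resolved is one simple way to report a cycle together with the path that produced it.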

#### `ScaleType.FORCED_CHOICE`

- New `ScaleType.FORCED_CHOICE` variant covers N-alternative
forced-choice tasks whose options vary from item to item: the
response space is a fixed positional label set (e.g.
`("first", "second")`), while each `Item` carries its own
alternatives. `family_to_item_template` and the
active-learning model registry route forced-choice anchors to
`ForcedChoiceModel`.
- `AnchorSpec.scale_type` is an optional explicit override, so config
files can declare the task type alongside the response space.
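
To make the "fixed positional label set, per-item alternatives" idea concrete, here is a small sketch in plain Python. `PairItem` and `chosen_text` are hypothetical stand-ins invented for illustration; they are not bead's `Item` class or part of its API.

```python
from dataclasses import dataclass

# Fixed positional response space shared by every item (from the entry above).
POSITIONAL_LABELS = ("first", "second")


@dataclass(frozen=True)
class PairItem:
    """Hypothetical stand-in for an item that carries its own alternatives."""

    item_id: str
    alternatives: tuple[str, ...]


def chosen_text(item: PairItem, response: str) -> str:
    """Map a positional response label back to the alternative it selected."""
    if len(item.alternatives) != len(POSITIONAL_LABELS):
        raise ValueError("item must have one alternative per positional label")
    return item.alternatives[POSITIONAL_LABELS.index(response)]
```

The response labels stay the same for every item, so a model only ever sees positions; the item supplies what each position means.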

#### Gallery: `gallery/eng/argument_structure/` v0.4.0 wiring

- New `protocol.py` module exposes `build_protocol()` /
`acceptability_family()` / `acceptability_anchor()`. The 2AFC
acceptability question is declared once in `config.yaml` under
`protocol:` and consumed by every script.
- `generate_deployment.py` and `simulate_pipeline.py` build their
`ItemTemplate` via `family_to_item_template` instead of literal
prompt strings.
- `create_2afc_pairs.py` threads the protocol anchor name
(`"acceptability"`) into every pair's `item_metadata` so the
JATOS-result → `AnnotationRecord` bridge can match responses
back to the canonical anchor.
- `make validate-protocol` builds the live `AnnotationProtocol`
from `config.yaml` and prints the family, prompt, and scale
type. Wired in as a prerequisite to `make data`.
- `tests/test_protocol.py` covers the config-to-protocol round
trip, the forced-choice scale type, the `family_to_item_template`
prompt agreement, and the active-learning model selection for
the resulting encoding.

## [0.4.0] - 2026-05-07

### Added
119 changes: 80 additions & 39 deletions README.md
@@ -1,9 +1,9 @@
 # bead

 [![CI](https://github.com/FACTSlab/bead/actions/workflows/ci.yml/badge.svg)](https://github.com/FACTSlab/bead/actions/workflows/ci.yml)
-[![Python 3.13](https://img.shields.io/badge/python-3.13-blue.svg)](https://www.python.org/downloads/)
+[![Python 3.14](https://img.shields.io/badge/python-3.14-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
-[![Documentation](https://img.shields.io/badge/docs-readthedocs-blue.svg)](https://bead.readthedocs.io)
+[![Documentation](https://img.shields.io/badge/docs-readthedocs-blue.svg)](https://factslab.io/bead/)

 A Python framework for constructing, deploying, and analyzing large-scale linguistic judgment experiments with active learning.

@@ -41,37 +41,69 @@ Always use `uv run` to execute commands.
 ## Quick Start

 ```python
-from bead.resources import LexicalItem, Template, Lexicon
-from bead.templates import TemplateFiller
-from bead.items import ItemConstructor
-from bead.lists import ListPartitioner
-
-# 1. Define resources
-verbs = Lexicon(items=[
-    LexicalItem(lemma="walk", pos="VERB", features={"transitive": False}),
-    LexicalItem(lemma="eat", pos="VERB", features={"transitive": True}),
-])
-
-template = Template(
-    text="The person {verb} the thing",
-    slots=["verb"],
-    language_code="en"
+from bead.items.forced_choice import create_forced_choice_item
+from bead.lists.partitioner import ListPartitioner
+from bead.protocol import (
+    AnnotationProtocol,
+    QuestionFamily,
+    ResponseSpace,
+    ScaleType,
+    SemanticAnchor,
 )
+from bead.protocol.items import family_to_item_template
+
+# 1. Declare the question being asked
+anchor = SemanticAnchor(
+    name="acceptability",
+    target_property="acceptability",
+    canonical_prompt="Which sentence sounds more natural?",
+    response_space=ResponseSpace(
+        options=("first", "second"),
+        is_ordered=False,
+        scale_type=ScaleType.FORCED_CHOICE,
+    ),
+    required_keywords=frozenset({"natural"}),
+)
+protocol = AnnotationProtocol(families=[QuestionFamily(anchor=anchor)])
+
+# 2. Build the deployable item template from the protocol
+template = family_to_item_template(
+    protocol.family_by_name("acceptability"),
+    judgment_type="acceptability",
+)

-# 2. Fill templates
-filler = TemplateFiller(strategy="exhaustive")
-filled = filler.fill(templates=[template], lexicons={"verbs": verbs})
+# 3. Build forced-choice items (one per minimal pair)
+items = [
+    create_forced_choice_item(
+        "The cat sat on the mat.",
+        "The cats sat on the mat.",
+        item_template_id=template.id,
+        metadata={"anchor": "acceptability", "contrast": "number"},
+    ),
+    # ... more pairs
+]
+
+# 4. Partition into experiment lists
+partitioner = ListPartitioner(random_seed=42)
+lists = partitioner.partition(
+    [item.id for item in items],
+    n_lists=4,
+    metadata={item.id: dict(item.item_metadata) for item in items},
+)
+```

-# 3. Construct items
-constructor = ItemConstructor(models=["gpt2"])
-items = constructor.construct_forced_choice_items(filled, n_alternatives=2)
+Or, drive the same pipeline from a single declarative config:

-# 4. Partition into lists
-partitioner = ListPartitioner()
-lists = partitioner.partition(items.get_uuids(), n_lists=4)
+```python
+from bead.config import load_config

-# 5. Deploy
-lists.save("lists/experiment.jsonl")
+# Composes profile defaults → defaults: [...] entries → primary YAML
+# → extras → CLI-style overrides → resolves ${...} interpolation
+config = load_config(
+    "config.yaml",
+    overrides=["paths.data_dir=/tmp/data"],
+)
+protocol = config.protocol.build()
 ```
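
The composition order named in the comment above (profile defaults, then `defaults: [...]` entries, then the primary file, then extras and overrides) can be pictured as a left-to-right fold over nested dicts. The sketch below is an illustration only: `deep_merge` and `compose_layers` are hypothetical helpers, not functions from `bead.config`, and it assumes that later layers take precedence, which the ordering suggests but the README does not state outright.

```python
from copy import deepcopy


def deep_merge(base: dict, layer: dict) -> dict:
    """Return a new dict in which ``layer`` overrides ``base`` key by key."""
    merged = deepcopy(base)
    for key, value in layer.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys are replaced wholesale.
            merged[key] = deepcopy(value)
    return merged


def compose_layers(*layers: dict) -> dict:
    """Fold layers left-to-right; later layers take precedence (assumed)."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result
```

Under these assumptions, a key set only in an early layer survives, while a key set in several layers takes its value from the last one.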

## Pipeline Stages
@@ -93,27 +125,36 @@ lists.save("lists/experiment.jsonl")
 - **Model integration**: HuggingFace, OpenAI, Anthropic with caching
 - **Active learning**: uncertainty sampling with convergence detection
 - **Annotation protocols**: type-theoretic stack of `SemanticAnchor` (the question type), `ProtocolContext` (the dependent index), `RealizationStrategy` (template / contextual / LM phrasings), and `DriftGuard` (the type-checker over realized prompts), composed into conditional `AnnotationProtocol`s
+- **Config composer** (`bead.config.compose`): the full OmegaConf interpolation grammar — `${section.field}`, `${.x}` / `${..y}` relative references, `${a.b[0]}` / `${a.b.0}` list indexing, `${a.${b}}` nesting, `\${literal}` escape, built-in resolvers (`oc.env`, `oc.select`, `oc.decode`, `oc.deprecated`, `oc.create`, `oc.dict.keys`, `oc.dict.values`); `defaults: [...]` composition; strict-merge against didactic schemas; YAML and TOML
 - **jsPsych 8.x**: Material Design UI with JATOS deployment

 ## CLI

 ```bash
-bead init my-experiment # Create project structure
-bead templates fill # Fill templates
-bead items construct # Construct items
-bead lists partition # Create experiment lists
-bead deploy # Generate jsPsych experiment
-bead training run # Train with active learning
+bead init my-experiment # Create project structure
+bead templates fill # Fill templates
+bead items construct # Construct items
+bead lists partition # Create experiment lists
+bead deploy # Generate jsPsych experiment
+bead training run # Train with active learning
+bead protocol validate # Validate the protocol section of a config
+bead protocol realize # Materialize realizations for contexts
+bead protocol items # Bridge a protocol to item templates
 ```

+Every command accepts repeatable `--set KEY=VALUE` overrides applied
+through the config composer, so any field of `BeadConfig` (including
+nested `paths.data_dir`, `protocol.drift.min_length`, etc.) can be
+overridden from the shell without editing the YAML.
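
As a rough picture of what a dotted `KEY=VALUE` override does to a nested config, consider the sketch below. It is not bead's override code: `apply_override` is a hypothetical helper, values are kept as strings (a real composer would presumably coerce them against the schema), and "unknown key" here simply means a key absent from the dict.

```python
def apply_override(config: dict, assignment: str) -> None:
    """Apply one ``KEY=VALUE`` override in place, rejecting unknown keys."""
    dotted, sep, raw = assignment.partition("=")
    if not sep:
        raise ValueError(f"expected KEY=VALUE, got {assignment!r}")
    *parents, leaf = dotted.split(".")
    node = config
    walked: list[str] = []
    for part in parents:
        walked.append(part)
        if not isinstance(node.get(part), dict):
            # Report the dotted path up to the section that does not exist.
            raise KeyError(f"unknown config section: {'.'.join(walked)}")
        node = node[part]
    if leaf not in node:
        raise KeyError(f"unknown config key: {dotted}")
    node[leaf] = raw  # kept as a string; type coercion is out of scope here
```

Because the split happens at the first `=`, values that themselves contain `=` pass through unchanged.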

 ## Documentation

-Full documentation: [bead.readthedocs.io](https://bead.readthedocs.io)
+Full documentation: [bead.readthedocs.io](https://factslab.io/bead/)

-- [Installation Guide](https://bead.readthedocs.io/installation/)
-- [User Guide](https://bead.readthedocs.io/user-guide/)
-- [API Reference](https://bead.readthedocs.io/api/)
-- [Gallery Examples](https://bead.readthedocs.io/examples/)
+- [Installation Guide](https://factslab.io/bead/installation/)
+- [User Guide](https://factslab.io/bead/user-guide/)
+- [API Reference](https://factslab.io/bead/api/)
+- [Gallery Examples](https://factslab.io/bead/examples/)

 ## Contributing

2 changes: 1 addition & 1 deletion bead/__init__.py
@@ -6,6 +6,6 @@

 from __future__ import annotations

-__version__ = "0.4.0"
+__version__ = "0.5.0"
 __author__ = "Aaron Steven White"
 __email__ = "aaron.white@rochester.edu"
11 changes: 9 additions & 2 deletions bead/active_learning/models/base.py
@@ -499,13 +499,20 @@ def _save_model_components(self, save_path: Path) -> None:
         pass

     @abstractmethod
-    def _load_model_components(self, load_path: Path) -> None:
+    def _load_model_components(
+        self, load_path: Path, config_dict: dict[str, object]
+    ) -> None:
         """Load model-specific components.

         Parameters
         ----------
         load_path : Path
             Directory to load from.
+        config_dict : dict[str, object]
+            Schema-only config dict (model-specific state fields have
+            already been popped by :meth:`_restore_training_state`).
+            Subclasses use this to reconstruct ``self.config`` without
+            re-reading ``config.json`` from disk.
         """
         pass
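
The load order this docstring describes (read `config.json` once, let the subclass pop its state fields, then hand the remaining schema-only dict to the component loader) can be sketched with a toy class hierarchy. `LoadableModel` and `ToyModel` are hypothetical names used only for illustration; the real base class does considerably more, such as loading random effects.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class LoadableModel(ABC):
    """Hypothetical base class mirroring the load order described above."""

    def load(self, path: str) -> None:
        load_path = Path(path)
        # Read config.json exactly once.
        config_dict = json.loads((load_path / "config.json").read_text())
        # The subclass pops its own state fields, leaving schema-only keys.
        self._restore_training_state(config_dict)
        # The remaining dict is handed over instead of being re-read from disk.
        self._load_model_components(load_path, config_dict)

    @abstractmethod
    def _restore_training_state(self, config_dict: dict[str, object]) -> None: ...

    @abstractmethod
    def _load_model_components(
        self, load_path: Path, config_dict: dict[str, object]
    ) -> None: ...


class ToyModel(LoadableModel):
    def _restore_training_state(self, config_dict):
        self.label_names = config_dict.pop("label_names")

    def _load_model_components(self, load_path, config_dict):
        self.config = dict(config_dict)  # no state fields left to filter out
```

Passing the already-filtered dict means a subclass never sees state fields such as `label_names` when it rebuilds its config, which a fresh read of `config.json` would reintroduce.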

@@ -814,7 +821,7 @@ def load(self, path: str) -> None:

         # Load model-specific components (which will reconstruct the config)
         # This must happen before initializing random effects so config is correct
-        self._load_model_components(load_path)
+        self._load_model_components(load_path, config_dict)

         # Initialize and load random effects
         n_classes = self._get_n_classes_for_random_effects()
12 changes: 5 additions & 7 deletions bead/active_learning/models/binary.py
@@ -852,20 +852,18 @@ def _restore_training_state(self, config_dict: dict[str, object]) -> None:
         self.label_names = config_dict.pop("label_names")
         self.positive_class = config_dict.pop("positive_class")

-    def _load_model_components(self, load_path: Path) -> None:
+    def _load_model_components(
+        self, load_path: Path, config_dict: dict[str, object]
+    ) -> None:
         """Load model-specific components.

         Parameters
         ----------
         load_path : Path
             Directory to load from.
+        config_dict : dict[str, object]
+            Schema-only config dict.
         """
-        # Load config.json to reconstruct config
-        with open(load_path / "config.json") as f:
-            import json  # noqa: PLC0415
-
-            config_dict = json.load(f)
-
         # Reconstruct MixedEffectsConfig if needed
         if "mixed_effects" in config_dict and isinstance(
             config_dict["mixed_effects"], dict
12 changes: 5 additions & 7 deletions bead/active_learning/models/categorical.py
@@ -885,20 +885,18 @@ def _restore_training_state(self, config_dict: dict[str, object]) -> None:
         self.num_classes = config_dict.pop("num_classes")
         self.category_names = config_dict.pop("category_names")

-    def _load_model_components(self, load_path: Path) -> None:
+    def _load_model_components(
+        self, load_path: Path, config_dict: dict[str, object]
+    ) -> None:
         """Load model-specific components.

         Parameters
         ----------
         load_path : Path
             Directory to load from.
+        config_dict : dict[str, object]
+            Schema-only config dict.
         """
-        # Load config.json to reconstruct config
-        with open(load_path / "config.json") as f:
-            import json  # noqa: PLC0415
-
-            config_dict = json.load(f)
-
         # Reconstruct MixedEffectsConfig if needed
         if "mixed_effects" in config_dict and isinstance(
             config_dict["mixed_effects"], dict
12 changes: 5 additions & 7 deletions bead/active_learning/models/cloze.py
@@ -769,20 +769,18 @@ def _get_save_state(self) -> dict[str, object]:
         """
         return {}

-    def _load_model_components(self, load_path: Path) -> None:
+    def _load_model_components(
+        self, load_path: Path, config_dict: dict[str, object]
+    ) -> None:
         """Load model-specific components.

         Parameters
         ----------
         load_path : Path
             Directory to load from.
+        config_dict : dict[str, object]
+            Schema-only config dict.
         """
-        # Load config.json to reconstruct config
-        with open(load_path / "config.json") as f:
-            import json  # noqa: PLC0415
-
-            config_dict = json.load(f)
-
         # Reconstruct MixedEffectsConfig if needed
         if "mixed_effects" in config_dict and isinstance(
             config_dict["mixed_effects"], dict
11 changes: 5 additions & 6 deletions bead/active_learning/models/forced_choice.py
@@ -2,7 +2,6 @@

 from __future__ import annotations

-import json
 import tempfile
 from pathlib import Path

@@ -900,18 +899,18 @@ def _restore_training_state(self, config_dict: dict[str, object]) -> None:
         self.num_classes = config_dict.pop("num_classes")
         self.option_names = config_dict.pop("option_names")

-    def _load_model_components(self, load_path: Path) -> None:
+    def _load_model_components(
+        self, load_path: Path, config_dict: dict[str, object]
+    ) -> None:
         """Load model-specific components.

         Parameters
         ----------
         load_path : Path
             Directory to load from.
+        config_dict : dict[str, object]
+            Schema-only config dict.
         """
-        # Load config.json to reconstruct config
-        with open(load_path / "config.json") as f:
-            config_dict = json.load(f)
-
         # Reconstruct MixedEffectsConfig if needed
         if "mixed_effects" in config_dict and isinstance(
             config_dict["mixed_effects"], dict