bug: page-priority selection in extract_key_sections is silently ignored

## Description

In `src/llm/llm_text.py` (lines 290–305), inside `extract_key_sections`, a page-priority selection algorithm classifies pages, sorts them by priority, computes a ranked `selected` list, and tracks a `budget`, then immediately discards all of it by switching to a completely different section-based algorithm on line 306. None of the computed results are ever used.

## Problematic Code

```python
pages = split_into_pages(text)          # ← dead
scored: List[Tuple[int, int, str]] = []
for page_num, page_text in pages:
    useful, priority = classify_page(page_text)  # ← dead
    if useful:
        scored.append((priority, page_num, page_text))

scored.sort(key=lambda t: t[0])         # ← dead

selected: List[Tuple[int, str]] = []
budget = max_chars
for _priority, page_num, page_text in scored:
    page_with_marker = f"[PAGE {page_num}]\n{page_text}"
    if len(page_with_marker) <= budget:
        selected.append((page_num, page_with_marker))
        budget -= len(page_with_marker)
lines = text.split("\n")   # ← pivots to full original text; everything above is discarded
```

`budget` is then re-initialized at line 345, overwriting whatever value the page loop left in it:
```python
budget = max_chars - len(preamble_text)
```

## What Goes Wrong

- `split_into_pages`, `classify_page`, `scored.sort`, `selected`, and the first `budget` are all computed but never referenced after line 305.
- The page-priority ranking has no effect on the final output.
- The LLM receives sections chosen by paragraph-level keyword scoring alone. That is the correct behavior for this project, but the dead block above still runs on every call for nothing.
- No error is raised: the pipeline runs normally, and the only cost is the computation wasted on the dead block.

## Fix

Remove lines 290–305 entirely. The section-based keyword paragraph approach (Phase 1 + Phase 2 below line 306) is the correct algorithm for this project: it scores individual paragraphs by extraction-relevant keywords, guaranteeing that data like stomach counts and sample sizes are included regardless of which page or section they appear in. The page-priority approach is coarser (whole-page granularity) and would waste the character budget on surrounding irrelevant content.
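
For reviewers who want a picture of why paragraph granularity uses the budget better, here is a minimal, self-contained sketch of keyword-scored paragraph selection under a character budget. It is not the code in `llm_text.py`: the `KEYWORDS` tuple and the `score_paragraph` and `select_paragraphs` helpers are hypothetical names invented for illustration, and the real Phase 1 + Phase 2 logic may differ.

```python
from typing import List, Tuple

# Hypothetical keyword list; the project's real keywords are not shown in this issue.
KEYWORDS = ("stomach", "sample size", "n =")


def score_paragraph(paragraph: str) -> int:
    """Count case-insensitive keyword occurrences in one paragraph."""
    lowered = paragraph.lower()
    return sum(lowered.count(kw) for kw in KEYWORDS)


def select_paragraphs(text: str, max_chars: int) -> str:
    """Greedily keep the highest-scoring paragraphs that fit in max_chars,
    then restore original document order."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    scored: List[Tuple[int, int, str]] = [
        (score_paragraph(p), idx, p) for idx, p in enumerate(paragraphs)
    ]
    # Highest score first; ties are broken by original position.
    scored.sort(key=lambda t: (-t[0], t[1]))

    kept: List[Tuple[int, str]] = []
    budget = max_chars
    for score, idx, para in scored:
        if score == 0:
            break  # everything after this point has no keyword hits
        cost = len(para) + 2  # conservative: charge a "\n\n" separator per paragraph
        if cost <= budget:
            kept.append((idx, para))
            budget -= cost

    kept.sort()  # back to document order
    return "\n\n".join(para for _idx, para in kept)
```

Because only paragraphs with keyword hits are charged against the budget, a single relevant sentence on an otherwise irrelevant page costs a few dozen characters rather than the whole page, which is the granularity argument made above.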
