Algorithms for abstention, calibration and domain adaptation to label shift.
Updated Nov 14, 2020 - Python
A decision-safety lab for loan approval: trains a baseline classifier, calibrates probabilities (ECE/Brier), sweeps confidence thresholds to build a coverage-quality frontier, and outputs a defensible abstention policy (auto-decide vs review). Includes a Streamlit dashboard for report cards, triage UI, and data quality checks.
A practical framework for turning data analysis into decision policies you can defend. Covers risk modeling, thresholding, exception handling, policy cards, monitoring, and update triggers, using real patterns like abstention rules, reorder points, and fairness-aware benchmarking. Built for “ship it” data science.
Longform article reframing abstention (reject option / selective prediction) as product design, not model weakness. Covers coverage as a KPI, calibration as a prerequisite, threshold selection under review capacity and risk, queue/UX design for human-in-the-loop workflows, and anti-patterns that break safety in production.
Decision-safe evaluation + Streamlit dashboard for AI vs Human vs Post-Edited AI text detection. Generates a reliability report card (Accuracy, Macro F1, ECE, Brier), calibration plots, confidence histograms, and a coverage-vs-performance abstention curve. Recommends an operating threshold for human-review routing.
Behavioral Trust Clustering: a thermodynamic governance layer that reduces LLM hallucination by 52% on HumanEval. Drop-in wrapper for any decoder. MIT.
We show that a model owner can artificially introduce uncertainty into their model and provide a corresponding detection mechanism.
Reliable medical QA with Mistral-7B, QLoRA, selective prediction, and learned abstention via warm-start SFT + DPO.
Transform enrichment outputs into verifiable pathway claims via stability distillation, evidence modules, and mechanical PASS/ABSTAIN/FAIL audits.
Data visualization site on abstention from legislative elections in France
[MIDL 2026] Official PyTorch implementation of 'Generalizing Abstention for Noise-Robust Learning in Medical Image Segmentation'
Discovery-engine catalog for the LLM Dark Patterns Hooks suite. Maps impossible-task classes to dishonest defaults to existing/candidate Stop hooks.
Code and data release for the paper 'Cherry-pick Override: Unsafe Directional Commitment in LLM Judges under Mixed Evidence'
Missingness-aware abstention for selective classification under MCAR/MAR/MNAR label missingness.
Reference implementation — constraintive governance substrate for interpretive governance (agentic-closed)
Conformal-calibrated, URDF physics-gated validity & abstention auditing for LeRobot robot-learning datasets (GPU-free, Apache-2.0)
Safety-facing carve-out: PBS-stratified abstention/calibration on 5 biology domains (CT/ADMET/SC-Perturbation/ClinVar/GWAS) × 2 providers × 2 prompt conditions
A counterfactual benchmark for testing whether language models know when to answer, ask, verify, or abstain.
Code for our paper analyzing the looseness of the upper bound on selective classification performance.
Derrida-inspired engineering invariants for retrieval-augmented LMs: iterability, supplement (source-stripped retrieval), trace, and aporia gates (conformal abstention). Mechanically verified, not metaphorical.
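Several of the descriptions above mention the same recipe: check calibration (ECE), then sweep a confidence threshold to trade coverage against quality and route the rest to review. A minimal sketch of that idea follows, assuming only NumPy; the function names `expected_calibration_error` and `coverage_accuracy_curve` are hypothetical and are not taken from any of the listed repositories.

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so a confidence of exactly 1.0 lands in the last bin.
    bin_ids = np.clip(np.digitize(confidence, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)

def coverage_accuracy_curve(confidence, correct, thresholds):
    """For each threshold, return (threshold, coverage, accuracy on answered items)."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    rows = []
    for t in thresholds:
        kept = confidence >= t  # items below the threshold are abstained on
        coverage = float(kept.mean())
        accuracy = float(correct[kept].mean()) if kept.any() else float("nan")
        rows.append((float(t), coverage, accuracy))
    return rows

# Toy, made-up inputs purely for illustration.
conf = [0.95, 0.90, 0.80, 0.60, 0.55]
hit = [1, 1, 1, 0, 1]
curve = coverage_accuracy_curve(conf, hit, thresholds=[0.5, 0.75])
ece = expected_calibration_error(conf, hit)
```

On this toy input, raising the threshold from 0.5 to 0.75 lowers coverage while raising accuracy on the answered items, which is the trade-off a reviewer-capacity constraint would be set against.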