feat: activate Focal Loss, add Dice Loss + span-width weighting, OpenVINO INT8 pipeline #361

Open
ALI-AL-MARJANI wants to merge 2 commits into urchade:main from ALI-AL-MARJANI:feat/focal-dice-loss-openvino

@ALI-AL-MARJANI

Summary

1. Loss functions (gliner/modeling/loss_functions.py, base.py, trainer.py)

focal_loss_with_logits already exists in the codebase but is disabled by default
(alpha=-1, gamma=0). This PR:

  • Exposes loss_type, focal_loss_alpha, focal_loss_gamma in TrainingArguments so
    users can activate it
  • Adds span_dice_loss() — span-level Dice Loss adapted from Li et al. (ACL 2020),
    applied element-wise over the (B, L×K, T) logit tensor with ignore_index masking
  • Adds use_span_width_weight flag: positive spans of width k receive weight
    w(k) = 1 + log(k+1). The weight enters only the training loss, so inference
    cost is unchanged
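
To make the two additions above concrete, here is a minimal pure-Python sketch. It is not the code in this PR: the natural logarithm in w(k), the ignore_index value of -100, the smoothing constant, and the names span_width_weight and soft_dice_loss are all assumptions for illustration. The real span_dice_loss() operates on the (B, L×K, T) tensor and may use a different Dice variant.

```python
import math

IGNORE_INDEX = -100  # assumed sentinel for masked span slots (not confirmed by the PR)


def span_width_weight(width: int) -> float:
    """w(k) = 1 + log(k + 1), assuming the natural logarithm."""
    return 1.0 + math.log(width + 1)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def soft_dice_loss(logits, labels, smooth=1.0):
    """Smoothed soft Dice loss over flat lists, skipping ignored entries.

    Illustrative generic form: 1 - (2*sum(p*y) + s) / (sum(p) + sum(y) + s).
    """
    intersection = 0.0
    prob_sum = 0.0
    label_sum = 0.0
    for logit, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked positions contribute nothing
        p = sigmoid(logit)
        intersection += p * label
        prob_sum += p
        label_sum += label
    dice = (2.0 * intersection + smooth) / (prob_sum + label_sum + smooth)
    return 1.0 - dice
```

Because w(k) grows logarithmically, wider positive spans get more weight without letting very long spans dominate the loss.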

Motivation: WNUT-17 has 187× more negative spans than positive entities (0.53%
positive ratio), so with BCE the gradient is dominated by trivial negatives. Focal Loss
(α=0.25, γ=2) improves WNUT-17 F1 by 0.99 pp; Dice Loss improves it by 0.70 pp.
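
The effect on easy negatives can be illustrated with a scalar sketch of the commonly used sigmoid focal loss. This is an assumption about the general form, not a copy of focal_loss_with_logits in the repository; the only detail taken from this PR is that alpha=-1, gamma=0 reduces to plain BCE.

```python
import math

# 187 negatives per positive gives the positive ratio quoted above.
positive_ratio = 1 / (1 + 187)


def bce_with_logits(logit: float, target: float) -> float:
    # Numerically stable binary cross-entropy for a single logit.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def focal_with_logits(logit, target, alpha=-1.0, gamma=0.0):
    """Scalar sigmoid focal loss; alpha < 0 disables class weighting."""
    p = 1.0 / (1.0 + math.exp(-logit))
    p_t = p * target + (1.0 - p) * (1.0 - target)  # probability of the true class
    loss = bce_with_logits(logit, target) * (1.0 - p_t) ** gamma
    if alpha >= 0:
        alpha_t = alpha * target + (1.0 - alpha) * (1.0 - target)
        loss *= alpha_t
    return loss


# A confidently rejected negative span versus a missed positive span.
easy_negative = focal_with_logits(-4.0, 0.0, alpha=0.25, gamma=2.0)
hard_positive = focal_with_logits(-2.0, 1.0, alpha=0.25, gamma=2.0)
```

With γ=2 the modulating factor (1 - p_t)^γ shrinks the loss of the well-classified negative far more than that of the missed positive, which is the rebalancing the motivation describes.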

2. Bug fixes (gliner/utils.py, gliner/modeling/encoder.py, gliner/model.py, gliner/onnx/model.py)

  • is_module_available() now uses importlib.util.find_spec() instead of __import__().
    This prevents optional packages (peft, tensorflow) from being eagerly imported, which
    caused OpenMP deadlocks on macOS ARM
  • encoder.py: kwargs.pop("token_lengths", None) prevents crash on bi-encoder models
    that pass this GLiNER-internal kwarg to HuggingFace forward methods
  • Lazy import of Trainer/TrainingArguments in model.py — avoids importing
    torch.distributed at module load time
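
A minimal sketch of the first two fixes, assuming only the behaviour described above. The function bodies and the helper name strip_internal_kwargs are illustrative and may differ from the code in gliner/utils.py and encoder.py.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if the module can be located, without importing it.

    For a top-level name, find_spec() only consults the import machinery's
    finders; for a dotted name it does import the parent package.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Missing parent package, or an already-imported module with no __spec__.
        return False


def strip_internal_kwargs(kwargs: dict) -> dict:
    """Hypothetical helper: drop the GLiNER-internal key before forwarding."""
    forwarded = dict(kwargs)
    forwarded.pop("token_lengths", None)  # no error if the key is absent
    return forwarded
```

Using pop() with a default keeps the call safe for models that never pass token_lengths.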

3. OpenVINO INT8 pipeline (scripts/convert_to_openvino.py)

New script: ONNX → OpenVINO IR → INT8 weight compression via
nncf.compress_weights(INT8_ASYM).

  • 2.35× CPU speedup, 4× model size reduction, no accuracy degradation
  • Uses weight-only compression (not activation quantization) because GLiNER's ONNX graph
    contains If nodes with dynamic rank that the CPU plugin rejects during calibration-based
    quantization
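
The conversion path above can be sketched roughly as follows. This is not the contents of scripts/convert_to_openvino.py: the function names and the output naming scheme are hypothetical, and only the ONNX → IR → nncf.compress_weights(INT8_ASYM) sequence comes from this PR. openvino and nncf are imported inside the function so the module loads without them installed.

```python
from pathlib import Path


def default_ir_path(onnx_path: str) -> Path:
    """Hypothetical naming helper: model.onnx -> model_int8.xml."""
    src = Path(onnx_path)
    return src.with_name(src.stem + "_int8.xml")


def convert_onnx_to_int8_ir(onnx_path: str) -> Path:
    # Lazy imports: both packages are optional dependencies.
    import nncf
    import openvino as ov

    # ONNX file -> in-memory OpenVINO model.
    model = ov.convert_model(onnx_path)

    # Weight-only compression: no calibration dataset and no activation
    # quantization, so the graph is not executed during this step.
    compressed = nncf.compress_weights(
        model, mode=nncf.CompressWeightsMode.INT8_ASYM
    )

    out_path = default_ir_path(onnx_path)
    ov.save_model(compressed, str(out_path))  # writes the .xml and a matching .bin
    return out_path
```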

Benchmark

| Config | WNUT-17 F1 | Δ |
|---|---|---|
| BCE baseline | 50.09% | |
| Focal α=0.25, γ=2 | 51.08% | +0.99 pp |
| Dice Loss | 50.79% | +0.70 pp |

| Backend | Latency | Speedup | Size |
|---|---|---|---|
| PyTorch FP32 | 59 ms | | 721 MB |
| OpenVINO INT8 | 25 ms | 2.35× | 181 MB |

Model: knowledgator/gliner-bi-small-v1.0, fine-tuned for 200 steps on CoNLL-2003 and
evaluated on WNUT-17.
