Error in loading quantized models without quantize_config.json #144

@srikumar003

Issue:
Loading a quantized model (e.g. https://huggingface.co/RedHatAI/granite-3.1-2b-instruct-quantized.w4a16) through fms-hf-tuning fails with the following error:

(launch_finetune pid=20238, ip=10.48.25.68) Traceback (most recent call last):
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/wrapper_fms_hf_tuning/scripts/wrapper_sfttrainer.py", line 467, in main
(launch_finetune pid=20238, ip=10.48.25.68)     module.parse_arguments_and_execute_wrapper(
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/wrapper_fms_hf_tuning/tuning_versions/at_least_2_5_0.py", line 51, in parse_arguments_and_execute_wrapper
(launch_finetune pid=20238, ip=10.48.25.68)     return tuning.sft_trainer.train(
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/tuning/sft_trainer.py", line 278, in train
(launch_finetune pid=20238, ip=10.48.25.68)     model = model_loader(
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration/framework.py", line 183, in model_loader
(launch_finetune pid=20238, ip=10.48.25.68)     return plugin.model_loader(model_name, **kwargs)
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration_peft/framework_plugin_autogptq.py", line 121, in model_loader
(launch_finetune pid=20238, ip=10.48.25.68)     quantize_config = QuantizeConfig.from_pretrained(model_name)
(launch_finetune pid=20238, ip=10.48.25.68)   File "/tmp/ray/session_2025-05-09_01-23-52_411684_1/runtime_resources/pip/168bc13c04f83d68d2f5fa2953228fbf20584a8c/virtualenv/lib/python3.10/site-packages/fms_acceleration_peft/gptqmodel/quantization/config.py", line 297, in from_pretrained
(launch_finetune pid=20238, ip=10.48.25.68)     with open(resolved_config_file, "r", encoding="utf-8") as f:
(launch_finetune pid=20238, ip=10.48.25.68) FileNotFoundError: [Errno 2] No such file or directory: '/hf-models-pvc/granite-3.1-2b-instruct-int4-gptq/quantize_config.json'

Expected behaviour:
The model loads successfully.

Additional information

To quote from an internal Slack message, these models are produced through llm-compressor and carry a quantization_config section in config.json instead of a separate quantize_config.json file.
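
To make the layout difference concrete, here is a minimal sketch of reading the embedded section from config.json. The helper name load_embedded_quantization_config is hypothetical and is not part of fms-acceleration or llm-compressor; the only assumption taken from the description above is that the settings live under a top-level quantization_config key.

```python
import json
import os
from typing import Optional


def load_embedded_quantization_config(model_dir: str) -> Optional[dict]:
    """Return the quantization_config section of config.json, if present.

    Hypothetical helper for illustration only. Returns None when
    config.json is missing or has no quantization_config key.
    """
    config_path = os.path.join(model_dir, "config.json")
    if not os.path.isfile(config_path):
        return None
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # The key name follows the issue text; its contents are not inspected here.
    return config.get("quantization_config")
```

A loader could call this when quantize_config.json is absent, rather than failing on the missing file.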

While the new configuration is supported in fms-acceleration, that code path is never reached by the from_pretrained method in fms_acceleration_peft/gptqmodel/quantization/config.py. The cause is the for loop that is supposed to iterate over all supported config files: it exits as soon as the first file name (quantize_config.json) is converted to a path, so the remaining candidates are never tried.

A simple check that quantize_config.json exists before exiting the loop would resolve this.
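
The proposed fix could look roughly like the sketch below. This is not the actual code from config.py: the function name, the candidate list, and its order are assumptions made for illustration, and the sketch only covers a local model directory.

```python
import os
from typing import Optional, Sequence

# Hypothetical candidate list; the real list in config.py may differ.
CANDIDATE_CONFIG_FILES = ("quantize_config.json", "config.json")


def resolve_config_file(
    model_dir: str,
    candidates: Sequence[str] = CANDIDATE_CONFIG_FILES,
) -> Optional[str]:
    """Return the first candidate config file that exists in model_dir."""
    for name in candidates:
        path = os.path.join(model_dir, name)
        # Leave the loop only when the file is really there, so that
        # later candidates (e.g. config.json) still get a chance.
        if os.path.isfile(path):
            return path
    return None
```

With this check, a directory that contains only config.json resolves to that file instead of a non-existent quantize_config.json path.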
