Commit 72cc5e3

FEAT: support deepseek-r1-0528-qwen3 (#3552)

Authored by Jun-Howie and qinxuye
Co-authored-by: qinxuye <qinxuye@gmail.com>

1 parent 1368c10, commit 72cc5e3

11 files changed: 180 additions & 52 deletions

doc/source/getting_started/installation.rst

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ Currently, supported models include:
 - ``codestral-v0.1``
 - ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k``
 - ``code-llama``, ``code-llama-python``, ``code-llama-instruct``
-- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-prover-v2``, ``deepseek-r1-distill-llama``
+- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-prover-v2``, ``deepseek-r1-0528-qwen3``, ``deepseek-r1-distill-llama``
 - ``yi-coder``, ``yi-coder-chat``
 - ``codeqwen1.5``, ``codeqwen1.5-chat``
 - ``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-instruct``, ``qwen2.5-coder-instruct``, ``qwen2.5-instruct-1m``
doc/source/models/builtin/llm/deepseek-r1-0528-qwen3.rst

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+.. _models_llm_deepseek-r1-0528-qwen3:
+
+========================================
+deepseek-r1-0528-qwen3
+========================================
+
+- **Context Length:** 131072
+- **Model Name:** deepseek-r1-0528-qwen3
+- **Languages:** en, zh
+- **Abilities:** chat, reasoning
+- **Description:** The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro.
+
+Specifications
+^^^^^^^^^^^^^^
+
+
+Model Spec 1 (pytorch, 8 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 8
+- **Quantizations:** none
+- **Engines**: vLLM, Transformers, SGLang
+- **Model ID:** deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
+- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B>`__
+
+Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the
+engines listed above and ``${quantization}`` with your chosen quantization method::
+
+   xinference launch --model-engine ${engine} --model-name deepseek-r1-0528-qwen3 --size-in-billions 8 --model-format pytorch --quantization ${quantization}
+
+
+Model Spec 2 (gptq, 8 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** gptq
+- **Model Size (in billions):** 8
+- **Quantizations:** Int4-W4A16, Int8-W8A16
+- **Engines**: vLLM, Transformers, SGLang
+- **Model ID:** QuantTrio/DeepSeek-R1-0528-Qwen3-8B-{quantization}
+- **Model Hubs**: `Hugging Face <https://huggingface.co/QuantTrio/DeepSeek-R1-0528-Qwen3-8B-{quantization}>`__, `ModelScope <https://modelscope.cn/models/tclf90/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix>`__
+
+Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the
+engines listed above and ``${quantization}`` with your chosen quantization method::
+
+   xinference launch --model-engine ${engine} --model-name deepseek-r1-0528-qwen3 --size-in-billions 8 --model-format gptq --quantization ${quantization}
+
+
+Model Spec 3 (gptq, 8 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** gptq
+- **Model Size (in billions):** 8
+- **Quantizations:** Int4-Int8Mix
+- **Engines**: vLLM, Transformers, SGLang
+- **Model ID:** QuantTrio/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix
+- **Model Hubs**: `Hugging Face <https://huggingface.co/QuantTrio/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix>`__, `ModelScope <https://modelscope.cn/models/tclf90/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix>`__
+
+Execute the following command to launch the model. Remember to replace ``${engine}`` with one of the
+engines listed above and ``${quantization}`` with your chosen quantization method::
+
+   xinference launch --model-engine ${engine} --model-name deepseek-r1-0528-qwen3 --size-in-billions 8 --model-format gptq --quantization ${quantization}
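Note that the GPTQ spec above stores its Model ID as a template containing a ``{quantization}`` placeholder. A minimal sketch of how the concrete repository ID could be derived (the helper name is hypothetical; the substitution itself is plain ``str.format`` on the ``{quantization}`` slot, which Xinference performs internally):

```python
# Sketch: derive a concrete repo ID from the spec's templated
# "model_id" and the quantization chosen at launch time.
# resolve_model_id is a hypothetical helper for illustration.
def resolve_model_id(model_id_template: str, quantization: str) -> str:
    # "QuantTrio/DeepSeek-R1-0528-Qwen3-8B-{quantization}" + "Int4-W4A16"
    # -> "QuantTrio/DeepSeek-R1-0528-Qwen3-8B-Int4-W4A16"
    return model_id_template.format(quantization=quantization)

print(resolve_model_id(
    "QuantTrio/DeepSeek-R1-0528-Qwen3-8B-{quantization}", "Int4-W4A16"
))  # QuantTrio/DeepSeek-R1-0528-Qwen3-8B-Int4-W4A16
```

This is why Model Spec 3, whose repository name does not follow the template pattern, spells out its full Model ID instead.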

doc/source/models/builtin/llm/deepseek-r1-0528.rst

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Model Spec 1 (pytorch, 671 Billion)
 - **Model Format:** pytorch
 - **Model Size (in billions):** 671
 - **Quantizations:** none
-- **Engines**: vLLM, Transformers
+- **Engines**: vLLM, Transformers, SGLang
 - **Model ID:** deepseek-ai/DeepSeek-R1-0528
 - **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-R1-0528>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-0528>`__

doc/source/models/builtin/llm/index.rst

Lines changed: 7 additions & 0 deletions
@@ -111,6 +111,11 @@ The following is a list of built-in LLM in Xinference:
      - 163840
      - DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
 
+   * - :ref:`deepseek-r1-0528-qwen3 <models_llm_deepseek-r1-0528-qwen3>`
+     - chat, reasoning
+     - 131072
+     - The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro
+
    * - :ref:`deepseek-r1-distill-llama <models_llm_deepseek-r1-distill-llama>`
      - chat, reasoning
      - 131072
@@ -634,6 +639,8 @@ The following is a list of built-in LLM in Xinference:
 
    deepseek-r1-0528
 
+   deepseek-r1-0528-qwen3
+
    deepseek-r1-distill-llama
 
    deepseek-r1-distill-qwen

doc/source/models/builtin/llm/qwen-vl-chat.rst

Lines changed: 0 additions & 47 deletions
This file was deleted.

doc/source/user_guide/backends.rst

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ Currently, supported model includes:
 - ``codestral-v0.1``
 - ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k``
 - ``code-llama``, ``code-llama-python``, ``code-llama-instruct``
-- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-prover-v2``, ``deepseek-r1-distill-llama``
+- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``, ``deepseek-v3``, ``deepseek-v3-0324``, ``deepseek-r1``, ``deepseek-r1-0528``, ``deepseek-prover-v2``, ``deepseek-r1-0528-qwen3``, ``deepseek-r1-distill-llama``
 - ``yi-coder``, ``yi-coder-chat``
 - ``codeqwen1.5``, ``codeqwen1.5-chat``
 - ``qwen2.5``, ``qwen2.5-coder``, ``qwen2.5-instruct``, ``qwen2.5-coder-instruct``, ``qwen2.5-instruct-1m``

xinference/model/llm/llm_family.json

Lines changed: 50 additions & 0 deletions
@@ -6749,6 +6749,56 @@
         "reasoning_start_tag": "<think>",
         "reasoning_end_tag": "</think>"
     },
+    {
+        "version": 1,
+        "context_length": 131072,
+        "model_name": "deepseek-r1-0528-qwen3",
+        "model_lang": [
+            "en",
+            "zh"
+        ],
+        "model_ability": [
+            "chat",
+            "reasoning"
+        ],
+        "model_description": "The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro",
+        "model_specs": [
+            {
+                "model_format": "pytorch",
+                "model_size_in_billions": 8,
+                "quantizations": [
+                    "none"
+                ],
+                "model_id": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
+            },
+            {
+                "model_format": "gptq",
+                "model_size_in_billions": 8,
+                "quantizations": [
+                    "Int4-W4A16",
+                    "Int8-W8A16"
+                ],
+                "model_id": "QuantTrio/DeepSeek-R1-0528-Qwen3-8B-{quantization}"
+            },
+            {
+                "model_format": "gptq",
+                "model_size_in_billions": 8,
+                "quantizations": [
+                    "Int4-Int8Mix"
+                ],
+                "model_id": "QuantTrio/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix"
+            }
+        ],
+        "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{% set content = message['content'] %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + content + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{% endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if content is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{content + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + content + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + content + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + content + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}",
+        "stop_token_ids": [
+            151645
+        ],
+        "stop": [
+            "<|end▁of▁sentence|>"
+        ],
+        "reasoning_start_tag": "<think>",
+        "reasoning_end_tag": "</think>"
+    },
     {
         "version": 1,
         "context_length": 163840,
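The new entry registers ``<think>``/``</think>`` as the reasoning delimiters (``reasoning_start_tag``/``reasoning_end_tag``). As an illustration only (not Xinference's internal parser), a client could use those tags to separate the reasoning trace from the final answer:

```python
# Split a raw completion into (reasoning, answer) using the tags
# declared in the model spec. Purely illustrative helper.
def split_reasoning(text: str,
                    start_tag: str = "<think>",
                    end_tag: str = "</think>"):
    if start_tag in text and end_tag in text:
        before, _, rest = text.partition(start_tag)
        reasoning, _, answer = rest.partition(end_tag)
        return reasoning.strip(), (before + answer).strip()
    # No complete reasoning block: everything is answer text.
    return "", text.strip()

print(split_reasoning("<think>4 * 3 = 12</think>The answer is 12."))
```

Responses without a complete reasoning block fall through unchanged, which keeps the helper safe for models that omit the tags.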

xinference/model/llm/llm_family_modelscope.json

Lines changed: 54 additions & 1 deletion
@@ -4873,7 +4873,7 @@
             "chat",
             "reasoning"
         ],
-        "model_description": "DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.",
+        "model_description": "The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro",
         "model_specs": [
             {
                 "model_format": "pytorch",
@@ -4895,6 +4895,59 @@
         "reasoning_start_tag": "<think>",
         "reasoning_end_tag": "</think>"
     },
+    {
+        "version": 1,
+        "context_length": 131072,
+        "model_name": "deepseek-r1-0528-qwen3",
+        "model_lang": [
+            "en",
+            "zh"
+        ],
+        "model_ability": [
+            "chat",
+            "reasoning"
+        ],
+        "model_description": "The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro",
+        "model_specs": [
+            {
+                "model_format": "pytorch",
+                "model_size_in_billions": 8,
+                "quantizations": [
+                    "none"
+                ],
+                "model_id": "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
+                "model_hub": "modelscope"
+            },
+            {
+                "model_format": "gptq",
+                "model_size_in_billions": 8,
+                "quantizations": [
+                    "Int4-W4A16",
+                    "Int8-W8A16"
+                ],
+                "model_id": "okwinds/DeepSeek-R1-0528-Qwen3-8B-{quantization}",
+                "model_hub": "modelscope"
+            },
+            {
+                "model_format": "gptq",
+                "model_size_in_billions": 8,
+                "quantizations": [
+                    "Int4-Int8Mix"
+                ],
+                "model_id": "tclf90/DeepSeek-R1-0528-Qwen3-8B-GPTQ-Int4-Int8Mix",
+                "model_hub": "modelscope"
+            }
+        ],
+        "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{% set content = message['content'] %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + content + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{% endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if content is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{content + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + content + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + content + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + content + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}",
+        "stop_token_ids": [
+            151645
+        ],
+        "stop": [
+            "<|end▁of▁sentence|>"
+        ],
+        "reasoning_start_tag": "<think>",
+        "reasoning_end_tag": "</think>"
+    },
     {
         "version": 1,
         "context_length": 163840,
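Both registry entries share a chat template that cleans up assistant history before re-serialization: the Jinja step ``{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}`` drops any earlier reasoning up to and including the last ``</think>``, so prior turns re-enter the prompt without their thinking traces. A plain-Python equivalent of just that step, for illustration:

```python
# Mirror of the template's history-cleanup rule: keep only the text
# after the last "</think>" in a prior assistant turn.
def drop_reasoning_from_history(content: str) -> str:
    if "</think>" in content:
        return content.split("</think>")[-1]
    return content

print(drop_reasoning_from_history("<think>scratch work</think>Final reply"))
# Final reply
```

Keeping reasoning out of the history shortens prompts on multi-turn conversations and matches how DeepSeek's R1-series templates treat prior turns.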
