<summary><b>List of References </b> (Click to expand)</summary>
- [How much throughput per PTU you get for each model](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
- [Understanding costs associated with provisioned throughput units (PTU)](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding#azure-reservations-for-azure-ai-foundry-provisioned-throughput)
- [Deployment types for Azure AI Foundry Models](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/deployment-types#global-provisioned)
- [Region availability for provisioned throughput capability](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/provisioned-throughput?tabs=global-ptum#region-availability-for-provisioned-throughput-capability)
- [Model summary table and region availability](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/models?tabs=global-ptum%2Cstandard-chat-completions#model-summary-table-and-region-availability)
- [Fine-tuning models](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/models?tabs=global-ptum%2Cstandard-chat-completions#fine-tuning-models) - input/output max
- [Azure OpenAI in Azure AI Foundry Models quotas and limits](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quotas-limits?context=%2Fazure%2Fai-foundry%2Fcontext%2Fcontext&tabs=REST)
From [Microsoft official documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding#model-independent-quota)
Last updated: 2025-07-17

> Explanation:
- **Tokens in Response**: The number of tokens in the model's response for each call.
- **Tokens per Minute (TPM)**: The total number of tokens processed per minute, calculated as:

1. **Tokens per Minute**: Calculate the total tokens per minute:

$$
\text{TPM} = \text{Calls per Minute} \times (\text{Tokens in Prompt} + \text{Tokens in Response})
$$
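The TPM formula can be sketched as a small helper. The function name and example traffic numbers are illustrative only, not part of any Azure SDK:

```python
def tokens_per_minute(calls_per_minute: int, prompt_tokens: int, response_tokens: int) -> int:
    """TPM = Calls per Minute x (Tokens in Prompt + Tokens in Response)."""
    return calls_per_minute * (prompt_tokens + response_tokens)


# e.g. 500 calls/min, 100-token prompts, 50-token responses
print(tokens_per_minute(500, 100, 50))  # 75000 tokens per minute
```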
2. **Provisioned Throughput Units**:

$$
\text{PTUs} = \frac{\text{TPM}}{\text{Tokens per PTU per Minute}}
$$
Where:

- **TPM** = Total tokens you want to process per minute
- **Tokens per PTU per Minute** = Depends on the model (e.g., 3,000 tokens/min for GPT-4.1 or GPT-4.1 Mini)
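Putting the two definitions together, a minimal PTU estimate might look like the sketch below. It assumes the 3,000 tokens/min-per-PTU figure quoted above for GPT-4.1 and simply rounds up to a whole PTU; real deployments may also be subject to model-specific minimum and increment sizes that this sketch does not model:

```python
import math


def estimate_ptus(tpm: int, tokens_per_ptu_per_minute: int = 3_000) -> int:
    """PTUs = TPM / Tokens per PTU per Minute, rounded up to a whole unit.

    The 3,000 tokens/min default is the GPT-4.1 rate cited above; consult
    the per-model throughput table for other models.
    """
    return math.ceil(tpm / tokens_per_ptu_per_minute)


print(estimate_ptus(150_000))  # 50 PTUs for 150,000 TPM on GPT-4.1
```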
> E.g., if you want to process **150,000 tokens per minute** using GPT-4.1:
>
> $$
> \text{PTUs} = \frac{150{,}000}{3{,}000} = 50
> $$
>
> This means you need 50 PTUs to process 150,000 tokens per minute.
## Provisioned Capacity Calculator
> Improve the accuracy of your estimate by adding multiple workloads to your PTU calculation. Each workload is calculated and displayed individually, along with the aggregate total required if all workloads run against your deployment at the same time.
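The multi-workload aggregation described above can be approximated by summing each workload's TPM before converting the total to PTUs. This is a sketch under the same assumptions as the formulas earlier (3,000 tokens/min per PTU for GPT-4.1); the helper name and workload figures are hypothetical, not the calculator's actual logic:

```python
import math


def aggregate_ptus(workloads: list[dict], tokens_per_ptu_per_minute: int = 3_000) -> int:
    """Sum TPM across concurrent workloads, then convert the total to PTUs."""
    total_tpm = sum(
        w["calls_per_minute"] * (w["prompt_tokens"] + w["response_tokens"])
        for w in workloads
    )
    return math.ceil(total_tpm / tokens_per_ptu_per_minute)


workloads = [
    {"calls_per_minute": 300, "prompt_tokens": 200, "response_tokens": 100},  # chat traffic
    {"calls_per_minute": 100, "prompt_tokens": 500, "response_tokens": 100},  # RAG traffic
]
# 300*300 + 100*600 = 150,000 TPM total
print(aggregate_ptus(workloads))  # 50 PTUs if both run concurrently
```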