Commit 3c28435

added
1 parent f665f73 commit 3c28435

1 file changed

Lines changed: 31 additions & 27 deletions

File tree

0_Azure/3_AzureAI/9_AzureOpenAI/demos/4_PTUs_TPM.md

```diff
@@ -9,17 +9,27 @@ Last updated: 2025-07-17
 
 ----------
 
-> Provisioned Throughput Units (PTUs) <br/>
-> Tokens Per Minute (TPM)
-
-| **PTUs** | **Calls per Minute** | **Tokens in Prompt** | **Tokens in Response** | **Tokens per Minute (TPM)** |
-|----------|----------------------|----------------------|------------------------|-----------------------------|
-| 1        | 10                   | 50                   | 100                    | 1,500                       |
-| 2        | 20                   | 50                   | 100                    | 3,000                       |
-| 5        | 50                   | 50                   | 100                    | 7,500                       |
-| 10       | 100                  | 50                   | 100                    | 15,000                      |
-| 20       | 200                  | 50                   | 100                    | 30,000                      |
-| 50       | 500                  | 50                   | 100                    | 75,000                      |
+`Provisioned Throughput Units (PTUs)` <br/>
+`Tokens Per Minute (TPM)`
+
+<details>
+<summary><b>List of References</b> (Click to expand)</summary>
+
+- [How much throughput per PTU you get for each model](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
+- [Understanding costs associated with provisioned throughput units (PTU)](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding#azure-reservations-for-azure-ai-foundry-provisioned-throughput)
+- [Deployment types for Azure AI Foundry Models](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/deployment-types#global-provisioned)
+- [Region availability for provisioned throughput capability](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/provisioned-throughput?tabs=global-ptum#region-availability-for-provisioned-throughput-capability)
+- [Model summary table and region availability](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/models?tabs=global-ptum%2Cstandard-chat-completions#model-summary-table-and-region-availability)
+- [Fine-tuning models](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/models?tabs=global-ptum%2Cstandard-chat-completions#fine-tuning-models) - input/output max
+- [Azure OpenAI in Azure AI Foundry Models quotas and limits](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quotas-limits?context=%2Fazure%2Fai-foundry%2Fcontext%2Fcontext&tabs=REST)
+
+</details>
+
+<div align="center">
+<img width="700" alt="image" src="https://github.com/user-attachments/assets/0741d4b2-d70e-4b5e-a6cf-9c399483e598" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
+</div>
+
+From [Microsoft official documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding#model-independent-quota)
 
 > Explanation:
```
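The TPM column of the removed table follows the document's first formula, TPM = Calls per Minute × (Tokens in Prompt + Tokens in Response). A quick sanity check of those rows in Python (illustrative only, using the table's own numbers):

```python
# Rows from the removed table: (PTUs, calls/min, prompt tokens, response tokens, TPM)
rows = [
    (1, 10, 50, 100, 1_500),
    (2, 20, 50, 100, 3_000),
    (5, 50, 50, 100, 7_500),
    (10, 100, 50, 100, 15_000),
    (20, 200, 50, 100, 30_000),
    (50, 500, 50, 100, 75_000),
]

# Every TPM value equals calls/min x (prompt + response) tokens per call.
for ptus, calls, prompt, response, tpm in rows:
    assert calls * (prompt + response) == tpm
```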
```diff
@@ -29,35 +39,29 @@ Last updated: 2025-07-17
 - **Tokens in Response**: The number of tokens in the model's response for each call.
 - **Tokens per Minute (TPM)**: The total number of tokens processed per minute, calculated as:
 
+1. **Tokens per Minute**: Calculate the total tokens per minute:
+
 $$
 \text{TPM} = \text{Calls per Minute} \times (\text{Tokens in Prompt} + \text{Tokens in Response})
 $$
 
-> Example Calculation:
-For 50 PTUs:
-
-1. **Calls per Minute**: Calculate the number of calls per minute:
-
-$$
-\text{Calls per Minute} = \text{PTUs} \times \text{Calls per PTU per Minute}
-$$
+2. **Provisioned Throughput Units**:
 
 $$
-\text{Calls per Minute} = 50 \times 10 = 500
+\text{PTUs} = \frac{\text{TPM}}{\text{Tokens per PTU per Minute}}
 $$
 
-2. **Tokens per Minute**: Calculate the total tokens per minute:
+Where:
+- **TPM** = Total tokens you want to process per minute
+- **Tokens per PTU per Minute** = Depends on the model (e.g., 3,000 tokens/min for GPT-4.1 or GPT-4.1 Mini)
 
-$$
-\text{TPM} = \text{Calls per Minute} \times (\text{Tokens in Prompt} + \text{Tokens in Response})
-$$
+> E.g.,
+> If you want to process **150,000 tokens per minute** using GPT-4.1:
 
 $$
-\text{TPM} = 500 \times (50 + 100) = 500 \times 150 = 75,000
+\text{PTUs} = \frac{150{,}000}{3{,}000} = 50
 $$
 
-This means with 50 PTUs, you can process 75,000 tokens per minute.
-
 ## Provisioned Capacity Calculator
 
 > Improve accuracy of your estimate by adding multiple workloads to your PTU calculation. Each workload is calculated and displayed individually, along with the aggregate total when both run against your deployment at the same time.
```
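The two steps this commit introduces (compute TPM from call volume, then divide by per-PTU throughput) can be sketched in Python. The helper names are made up for illustration, and the 3,000 tokens/min figure is the GPT-4.1 rate cited in the doc; confirm the current per-model rate and minimum deployment increments in the Microsoft documentation before sizing a real workload:

```python
import math

# Assumed model throughput: the GPT-4.1 figure from the doc's worked example.
TOKENS_PER_PTU_PER_MINUTE = 3_000

def tokens_per_minute(calls_per_minute: int, prompt_tokens: int, response_tokens: int) -> int:
    """Step 1: TPM = Calls per Minute x (Tokens in Prompt + Tokens in Response)."""
    return calls_per_minute * (prompt_tokens + response_tokens)

def ptus_needed(tpm: int, tokens_per_ptu_per_minute: int = TOKENS_PER_PTU_PER_MINUTE) -> int:
    """Step 2: PTUs = TPM / Tokens per PTU per Minute.

    Rounded up, since fractional PTUs cannot be deployed (per-model
    minimums and purchase increments also apply -- see the docs).
    """
    return math.ceil(tpm / tokens_per_ptu_per_minute)

# Reproduce the doc's worked example: 150,000 TPM on GPT-4.1 -> 50 PTUs.
tpm = tokens_per_minute(calls_per_minute=1_000, prompt_tokens=50, response_tokens=100)
print(tpm)               # total tokens per minute
print(ptus_needed(tpm))  # PTUs required at 3,000 tokens/PTU/min
```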
