
Commit a3819d6

recipes_source/quantization.rst translation (#589)
1 parent 5edf398 commit a3819d6

1 file changed

Lines changed: 44 additions & 44 deletions

Quantization Recipe
===================

This recipe demonstrates how to quantize a PyTorch model so it can run with reduced size and faster inference speed with about the same accuracy as the original model. Quantization can be applied to both server and mobile model deployment, but it can be especially important or even critical on mobile, because a non-quantized model's size may exceed the limit that an iOS or Android app allows for, cause the deployment or OTA update to take too much time, and make the inference too slow for a good user experience.

Introduction
------------

Quantization is a technique that converts the 32-bit floating-point numbers in the model parameters to 8-bit integers. With quantization, the model size and memory footprint can be reduced to 1/4 of the original size, and the inference can be made about 2-4 times faster, while the accuracy stays about the same.

There are overall three approaches or workflows to quantize a model: post training dynamic quantization, post training static quantization, and quantization aware training. But if the model you want to use already has a quantized version, you can use it directly without going through any of the three workflows above. For example, the `torchvision` library already includes quantized versions of models such as MobileNet v2, ResNet 18, ResNet 50, Inception v3, and GoogleNet. So we will treat using a pretrained quantized model as a fourth workflow, albeit a simple one.

.. note::
    The quantization support is available for a limited set of operators. See `this <https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#device-and-operator-support>`_ for more information.

Pre-requisites
--------------

PyTorch 1.6.0 or 1.7.0

torchvision 0.6.0 or 0.7.0

Workflows
---------

Use one of the four workflows below to quantize a model.

1. Use Pretrained Quantized MobileNet v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To get the MobileNet v2 quantized model, simply do:

::

    import torchvision
    model_quantized = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
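
As a quick sanity check that the quantized model loaded correctly, you can run a forward pass on a dummy input (a minimal sketch; the 224x224 input shape assumes the standard ImageNet preprocessing):

::

    import torch

    model_quantized.eval()
    with torch.no_grad():
        out = model_quantized(torch.rand(1, 3, 224, 224))  # one dummy image
    print(out.shape)  # torch.Size([1, 1000])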


To compare the size difference of a non-quantized MobileNet v2 model with its quantized version:

::

    # ... (unchanged context elided by the diff)
    print_model_size(model_quantized)


The outputs will be:

::

    14.27 MB
    3.63 MB
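
The `print_model_size` helper called above is defined in context the diff does not show; a minimal sketch of such a helper (only its name and the MB output format come from the text, the body is an assumption) could be:

::

    import os
    import torch

    def print_model_size(mdl):
        # serialize the state dict to disk and report the file size in MB
        torch.save(mdl.state_dict(), "tmp.pt")
        print("%.2f MB" % (os.path.getsize("tmp.pt") / 1e6))
        os.remove("tmp.pt")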

2. Post Training Dynamic Quantization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dynamic Quantization converts all the weights in a model from 32-bit floating-point numbers to 8-bit integers, but does not convert the activations to int8 until just before the computation on the activations is performed. To apply it, simply call `torch.quantization.quantize_dynamic`:

::

    model_dynamic_quantized = torch.quantization.quantize_dynamic(
        model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8
    )

where `qconfig_spec` specifies the set of submodule types (or names) in `model` to apply quantization to.
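
To sanity-check the result, you can compare the dynamically quantized model against the original on the same input (a sketch; the input shape is a placeholder for whatever `model` expects):

::

    import torch

    x = torch.rand(1, 3, 224, 224)  # hypothetical input
    with torch.no_grad():
        out_fp32 = model(x)
        out_int8 = model_dynamic_quantized(x)
    print((out_fp32 - out_int8).abs().max())  # expected to be small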

.. warning:: Dynamic Quantization is the easiest workflow if you do not have a pre-trained quantized model ready for use, but an important limitation is that it currently only supports `nn.Linear` and `nn.LSTM` in `qconfig_spec`. To quantize other modules such as `nn.Conv2d`, you will have to use Static Quantization or Quantization Aware Training, discussed later.

The full documentation of the `quantize_dynamic` API call is `here <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_. Three other examples of using post training dynamic quantization are `the Bert example <https://pytorch.org/tutorials/intermediate/dynamic_quantization_bert_tutorial.html>`_, `an LSTM model example <https://pytorch.org/tutorials/advanced/dynamic_quantization_tutorial.html#test-dynamic-quantization>`_, and another `demo LSTM example <https://pytorch.org/tutorials/recipes/recipes/dynamic_quantization.html#do-the-quantization>`_.

3. Post Training Static Quantization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This method converts both the weights and the activations to 8-bit integers beforehand, so there won't be the on-the-fly conversion of activations during inference that dynamic quantization performs, hence improving the performance significantly.

To apply static quantization on a model, run the following code:

::

    # ... (unchanged context elided by the diff)
    model_static_quantized = torch.quantization.prepare(model, inplace=False)
    model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
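
The elided context above typically configures the quantization backend before `prepare`; in full, the eager-mode flow looks roughly like this (a sketch, assuming the standard `backend`/`qconfig` setup, not the exact elided code):

::

    backend = "qnnpack"  # or "fbgemm" on x86; see the note below
    model.qconfig = torch.quantization.get_default_qconfig(backend)
    torch.backends.quantized.engine = backend
    model_static_quantized = torch.quantization.prepare(model, inplace=False)
    model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)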

After this, running `print_model_size(model_static_quantized)` shows the static quantized model is `3.98MB`.

A complete model definition and static quantization example is `here <https://pytorch.org/docs/stable/quantization.html#quantization-api-summary>`_. A dedicated static quantization tutorial is `here <https://tutorials.pytorch.kr/advanced/static_quantization_tutorial.html>`_.

.. note::
    To make the model run on mobile devices, which normally have ARM architecture, you need to use `qnnpack` for `backend`; to run the model on a computer with x86 architecture, use `fbgemm`.

4. Quantization Aware Training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Quantization aware training inserts fake quantization into all the weights and activations during the model training process and results in higher inference accuracy than the post-training quantization methods. It is typically used in CNN models.

To enable a model for quantization aware training, define in the `__init__` method of the model definition a `QuantStub` and a `DeQuantStub` to convert tensors from floating point to quantized type and vice versa:

::

    self.quant = torch.quantization.QuantStub()
    self.dequant = torch.quantization.DeQuantStub()

Then in the beginning and the end of the `forward` method of the model definition, call `x = self.quant(x)` and `x = self.dequant(x)`.
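
Put together, a minimal model definition with the stubs in place might look like this (a sketch; the single conv layer is a hypothetical stand-in for a real network):

::

    import torch

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.quant = torch.quantization.QuantStub()      # float -> quantized
            self.conv = torch.nn.Conv2d(1, 1, 1)
            self.dequant = torch.quantization.DeQuantStub()  # quantized -> float

        def forward(self, x):
            x = self.quant(x)
            x = self.conv(x)
            x = self.dequant(x)
            return x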

To do quantization aware training, use the following code snippet:

::

    model.qconfig = torch.quantization.get_default_qat_qconfig(backend)
    model_qat = torch.quantization.prepare_qat(model, inplace=False)
    # quantization aware training goes here
    model_qat = torch.quantization.convert(model_qat.eval(), inplace=False)

For more detailed examples of quantization aware training, see `here <https://pytorch.org/docs/master/quantization.html#quantization-aware-training>`_ and `here <https://tutorials.pytorch.kr/advanced/static_quantization_tutorial.html#quantization-aware-training>`_.

A pre-trained quantized model can also be used for quantization aware transfer learning, using the same `quant` and `dequant` calls shown above. See `here <https://tutorials.pytorch.kr/intermediate/quantized_transfer_learning_tutorial.html#part-1-training-a-custom-classifier-based-on-a-quantized-feature-extractor>`_ for a complete example.

After a quantized model is generated using one of the steps above, before it can be used to run on mobile devices, it needs to be further converted to the `TorchScript` format and then optimized for mobile apps. See the `Script and Optimize for Mobile recipe <script_optimized.html>`_ for details.
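
That conversion might look roughly like this (a sketch, assuming the TorchScript and mobile optimizer APIs of PyTorch 1.6/1.7; the output file name is a placeholder):

::

    from torch.utils.mobile_optimizer import optimize_for_mobile

    scripted_model = torch.jit.script(model_quantized)
    optimized_model = optimize_for_mobile(scripted_model)
    optimized_model.save("model_quantized_mobile.pt")  # hypothetical path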

Learn More
----------

For more info on the different workflows of quantization, see `here <https://pytorch.org/docs/stable/quantization.html#quantization-workflows>`_ and `here <https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#post-training-static-quantization>`_.
