# -*- coding: utf-8 -*-
"""
Hyperparameter tuning with Ray Tune
===================================
**Translated by**: `95hj <http://github.com/95hj>`_

Hyperparameter tuning can make the difference between an average model and a highly
accurate one. Often simple things like choosing a different learning rate or changing
a network layer size can have a dramatic impact on your model performance.

Fortunately, there are tools that help with finding the best combination of parameters.
`Ray Tune <https://docs.ray.io/en/latest/tune.html>`_ is an industry standard tool for
distributed hyperparameter tuning. Ray Tune includes the latest hyperparameter search
algorithms, integrates with TensorBoard and other analysis libraries, and natively
supports distributed training through `Ray's distributed machine learning engine
<https://ray.io/>`_.

In this tutorial, we will show you how to integrate Ray Tune into your PyTorch
training workflow. We will extend `this tutorial from the PyTorch documentation
<https://tutorials.pytorch.kr/beginner/blitz/cifar10_tutorial.html>`_ for training
a CIFAR10 image classifier.

As you will see, we only need to make some slight modifications. In particular, we
need to

1. wrap data loading and training in functions,
2. make some network parameters configurable,
3. add checkpointing (optional),
4. and define the search space for the model tuning.
|

To run this tutorial, please make sure the following packages are
installed:

- ``ray[tune]``: Distributed hyperparameter tuning library
- ``torchvision``: For the data transformers

Setup / Imports
---------------
Let's start with the imports:
"""
from functools import partial
import numpy as np
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler

######################################################################
# Most of the imports are needed for building the PyTorch model. Only the last three
# imports are for Ray Tune.
#
# Data loaders
# ------------
# We wrap the data loaders in their own function and pass a global data directory.
# This way we can share a data directory between different trials.


def load_data(data_dir="./data"):
    # Download CIFAR10 (train and test) and apply a simple normalization.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform)

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform)

    return trainset, testset

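######################################################################
# As a quick usage sketch (note that the first call downloads CIFAR10 into
# ``data_dir``):
#
# .. code-block:: python
#
#     trainset, testset = load_data(data_dir="./data")
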
######################################################################
# Configurable neural network
# ---------------------------
# We can only tune those parameters that are configurable. In this example, we can specify
# the layer sizes of the fully connected layers:


class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        # Two conv layers, then three fully connected layers whose sizes
        # (``l1`` and ``l2``) are the configurable hyperparameters.
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

######################################################################
# The train function
# ------------------
# Now it gets interesting, because we introduce some changes to the example `from the PyTorch
# documentation <https://tutorials.pytorch.kr/beginner/blitz/cifar10_tutorial.html>`_.
#
# We wrap the training script in a function ``train_cifar(config, checkpoint_dir=None, data_dir=None)``.
# As you can guess, the ``config`` parameter will receive the hyperparameters we would like to
# train with. The ``checkpoint_dir`` parameter is used to restore checkpoints. The ``data_dir`` specifies
# the directory where we load and store the data, so multiple runs can share the same data source.
#
# .. code-block:: python
#
#     net = Net(config["l1"], config["l2"])
#
#     if checkpoint_dir:
#         model_state, optimizer_state = torch.load(
#             os.path.join(checkpoint_dir, "checkpoint"))
#         net.load_state_dict(model_state)
#         optimizer.load_state_dict(optimizer_state)
#
# The learning rate of the optimizer is made configurable, too:
#
# .. code-block:: python
#
#     optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
#
# We also split the training data into a training and validation subset. We thus train on
# 80% of the data and calculate the validation loss on the remaining 20%. The batch sizes
# with which we iterate through the training and test sets are configurable as well;
# a sketch of this split follows below.
#
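# As a rough sketch (assuming the ``trainset`` returned by ``load_data`` and a
# ``config`` dict as defined in the search space section below), the split
# could look like this:
#
# .. code-block:: python
#
#     test_abs = int(len(trainset) * 0.8)
#     train_subset, val_subset = random_split(
#         trainset, [test_abs, len(trainset) - test_abs])
#
#     trainloader = torch.utils.data.DataLoader(
#         train_subset, batch_size=int(config["batch_size"]),
#         shuffle=True, num_workers=8)
#     valloader = torch.utils.data.DataLoader(
#         val_subset, batch_size=int(config["batch_size"]),
#         shuffle=True, num_workers=8)
#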
# Adding (multi) GPU support with DataParallel
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Image classification benefits largely from GPUs. Luckily, we can continue to use
# PyTorch's abstractions in Ray Tune. Thus, we can wrap our model in ``nn.DataParallel``
# to support data parallel training on multiple GPUs:
#
# .. code-block:: python
#
#     device = "cpu"
#     if torch.cuda.is_available():
#         device = "cuda:0"
#         if torch.cuda.device_count() > 1:
#             net = nn.DataParallel(net)
#     net.to(device)
#
# By using a ``device`` variable we make sure that training also works when we have
# no GPUs available. PyTorch requires us to send our data to the GPU memory explicitly,
# like this:
#
# .. code-block:: python
#
#     for i, data in enumerate(trainloader, 0):
#         inputs, labels = data
#         inputs, labels = inputs.to(device), labels.to(device)
#
# The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray
# also supports `fractional GPUs <https://docs.ray.io/en/master/using-ray-with-gpus.html#fractional-gpus>`_,
# so we can share GPUs among trials as long as the model still fits in GPU memory. We'll come back
# to that later.
#
# Communicating with Ray Tune
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# The most interesting part is the communication with Ray Tune:
#
# .. code-block:: python
#
#     with tune.checkpoint_dir(epoch) as checkpoint_dir:
#         path = os.path.join(checkpoint_dir, "checkpoint")
#         torch.save((net.state_dict(), optimizer.state_dict()), path)
#
#     tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
#
# Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically,
# we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics
# to decide which hyperparameter configuration leads to the best results. These metrics
# can also be used to stop badly performing trials early in order to avoid wasting
# resources on those trials.
#
# The checkpoint saving is optional. However, it is necessary if we wanted to use advanced
# schedulers like
# `Population Based Training <https://docs.ray.io/en/master/tune/tutorials/tune-advanced-tutorial.html>`_
# (sketched below). Also, by saving the checkpoint we can later load the trained models and
# validate them on a test set.
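#
# As a rough sketch only (this tutorial does not use it, and the mutation
# range chosen here is illustrative), such a scheduler could be set up like this:
#
# .. code-block:: python
#
#     from ray.tune.schedulers import PopulationBasedTraining
#
#     pbt = PopulationBasedTraining(
#         time_attr="training_iteration",
#         metric="loss",
#         mode="min",
#         hyperparam_mutations={"lr": tune.loguniform(1e-4, 1e-1)})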
#
# Full training function
# ~~~~~~~~~~~~~~~~~~~~~~
#
# The full code example looks like this:


def train_cifar(config, checkpoint_dir=None, data_dir=None):
    net = Net(config["l1"], config["l2"])

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    # Restore model and optimizer state if a checkpoint is provided.
    if checkpoint_dir:
        model_state, optimizer_state = torch.load(
            os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)

    trainset, testset = load_data(data_dir)

    # 80/20 train/validation split.
    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs])

    trainloader = torch.utils.data.DataLoader(
        train_subset, batch_size=int(config["batch_size"]),
        shuffle=True, num_workers=8)
    valloader = torch.utils.data.DataLoader(
        val_subset, batch_size=int(config["batch_size"]),
        shuffle=True, num_workers=8)

    for epoch in range(10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1,
                                                running_loss / epoch_steps))
                running_loss = 0.0

        # Validation loss and accuracy on the held-out 20%.
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        # Save a checkpoint and report metrics back to Ray Tune.
        with tune.checkpoint_dir(epoch) as checkpoint_dir:
            path = os.path.join(checkpoint_dir, "checkpoint")
            torch.save((net.state_dict(), optimizer.state_dict()), path)

        tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
    print("Finished Training")

######################################################################
# As you can see, most of the code is adapted directly from the original example.
#
# Test set accuracy
# -----------------
# Commonly the performance of a machine learning model is tested on a hold-out test
# set with data that has not been used for training the model. We also wrap this in a
# function:


def test_accuracy(net, device="cpu"):
    trainset, testset = load_data()

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2)

    correct = 0
    total = 0
    # Disable gradient tracking; only forward passes are needed for evaluation.
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total

######################################################################
# The function also expects a ``device`` parameter, so we can do the
# test set validation on a GPU.
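#
# For example, a usage sketch (assuming a trained ``net`` from above):
#
# .. code-block:: python
#
#     device = "cuda:0" if torch.cuda.is_available() else "cpu"
#     print(test_accuracy(net.to(device), device))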
#
# Configuring the search space
# ----------------------------
# Lastly, we need to define Ray Tune's search space. Here is an example:
#
# .. code-block:: python
#
#     config = {
#         "l1": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
#         "l2": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
#         "lr": tune.loguniform(1e-4, 1e-1),
#         "batch_size": tune.choice([2, 4, 8, 16])
#     }
#
# The ``tune.sample_from()`` function makes it possible to define your own sample
# methods to obtain hyperparameters. In this example, the ``l1`` and ``l2`` parameters
# should be powers of 2 between 4 and 256, so either 4, 8, 16, 32, 64, 128, or 256.
# The ``lr`` (learning rate) should be log-uniformly sampled between 0.0001 and 0.1. Lastly,
# the batch size is a choice between 2, 4, 8, and 16.
#
# At each trial, Ray Tune will now randomly sample a combination of parameters from these
# search spaces. It will then train a number of models in parallel and find the best
# performing one among these. We also use the ``ASHAScheduler``, which will terminate badly
# performing trials early; a configuration sketch follows below.
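#
# As a sketch (the same values appear in the ``main`` function below, where
# ``max_num_epochs`` is a parameter of ``main``):
#
# .. code-block:: python
#
#     scheduler = ASHAScheduler(
#         metric="loss",
#         mode="min",
#         max_t=max_num_epochs,
#         grace_period=1,
#         reduction_factor=2)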
#
# We wrap the ``train_cifar`` function with ``functools.partial`` to set the constant
# ``data_dir`` parameter. We can also tell Ray Tune what resources should be
# available for each trial:
#
# .. code-block:: python
#
#     result = tune.run(
#         partial(train_cifar, data_dir=data_dir),
#         resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
#         config=config,
#         num_samples=num_samples,
#         scheduler=scheduler,
#         progress_reporter=reporter,
#         checkpoint_at_end=True)
#
# You can specify the number of CPUs, which are then available, e.g.,
# to increase the ``num_workers`` of the PyTorch ``DataLoader`` instances. The selected
# number of GPUs is made visible to PyTorch in each trial. Trials do not have access to
# GPUs that haven't been requested for them, so you don't have to care about two trials
# using the same set of resources.
#
# Here we can also specify fractional GPUs, so something like ``gpus_per_trial=0.5`` is
# completely valid; the trials will then share GPUs among each other.
# You just have to make sure that the models still fit in the GPU memory.
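#
# For instance, a sketch of sharing each GPU between two trials would simply
# change the resource request:
#
# .. code-block:: python
#
#     resources = {"cpu": 2, "gpu": 0.5}  # two trials share one GPU
#     result = tune.run(
#         partial(train_cifar, data_dir=data_dir),
#         resources_per_trial=resources,
#         config=config,
#         num_samples=num_samples,
#         scheduler=scheduler,
#         progress_reporter=reporter,
#         checkpoint_at_end=True)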
#
# After training the models, we will find the best performing one and load the trained
# network from the checkpoint file. We then obtain the test set accuracy and report
# everything by printing.
#
# The full main function looks like this:


def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
    data_dir = os.path.abspath("./data")
    load_data(data_dir)

    config = {
        "l1": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "l2": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16])
    }
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2)
    reporter = CLIReporter(
        metric_columns=["loss", "accuracy", "training_iteration"])
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter,
        checkpoint_at_end=True)

    best_trial = result.get_best_trial("loss", "min", "last")
    print("Best trial config: {}".format(best_trial.config))
    print("Best trial final validation loss: {}".format(
        best_trial.last_result["loss"]))
    print("Best trial final validation accuracy: {}".format(
        best_trial.last_result["accuracy"]))

    # Rebuild the best model and restore its checkpointed weights.
    best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if gpus_per_trial > 1:
            best_trained_model = nn.DataParallel(best_trained_model)
    best_trained_model.to(device)

    best_checkpoint_dir = best_trial.checkpoint.value
    model_state, optimizer_state = torch.load(os.path.join(
        best_checkpoint_dir, "checkpoint"))
    best_trained_model.load_state_dict(model_state)

    test_acc = test_accuracy(best_trained_model, device)
    print("Best trial test set accuracy: {}".format(test_acc))


if __name__ == "__main__":
    # You can change the number of GPUs per trial here:
    main(num_samples=10, max_num_epochs=10, gpus_per_trial=0)

######################################################################
# If you run the code, an example output could look like this:
#
# ::
#
#     Best trial final validation accuracy: 0.5836
#     Best trial test set accuracy: 0.5806
#
# Most trials have been stopped early in order to avoid wasting resources.
# The best performing trial achieved a validation accuracy of about 58%, which could
# be confirmed on the test set.
#
# So that's it! You can now tune the parameters of your PyTorch models.