
Commit 47c52c4

translate hyperparameter_tuning_tutorial.py (#595)
* translate hyperparameter_tuning_tutorial.py
1 parent a3819d6 commit 47c52c4

1 file changed

Lines changed: 76 additions & 115 deletions

File tree

beginner_source/hyperparameter_tuning_tutorial.py

@@ -1,43 +1,29 @@
 # -*- coding: utf-8 -*-
 """
-Hyperparameter tuning with Ray Tune
+Hyperparameter tuning with Ray Tune
 ===================================
-
-Hyperparameter tuning can make the difference between an average model and a highly
-accurate one. Often simple things like choosing a different learning rate or changing
-a network layer size can have a dramatic impact on your model performance.
-
-Fortunately, there are tools that help with finding the best combination of parameters.
-`Ray Tune <https://docs.ray.io/en/latest/tune.html>`_ is an industry standard tool for
-distributed hyperparameter tuning. Ray Tune includes the latest hyperparameter search
-algorithms, integrates with TensorBoard and other analysis libraries, and natively
-supports distributed training through `Ray's distributed machine learning engine
-<https://ray.io/>`_.
-
-In this tutorial, we will show you how to integrate Ray Tune into your PyTorch
-training workflow. We will extend `this tutorial from the PyTorch documentation
-<https://tutorials.pytorch.kr/beginner/blitz/cifar10_tutorial.html>`_ for training
-a CIFAR10 image classifier.
-
-As you will see, we only need to add some slight modifications. In particular, we
-need to
-
-1. wrap data loading and training in functions,
-2. make some network parameters configurable,
-3. add checkpointing (optional),
-4. and define the search space for the model tuning
-
+**Translation**: `심형준 <http://github.com/95hj>`_
+Hyperparameter tuning can make the difference between an average model and a highly accurate one.
+Often simple things like choosing a different learning rate or changing a layer size can have a dramatic impact on model performance.
+Fortunately, there are tools that help with finding the best combination of parameters.
+`Ray Tune <https://docs.ray.io/en/latest/tune.html>`_ is an industry standard tool for distributed hyperparameter tuning.
+Ray Tune includes the latest hyperparameter search algorithms, integrates with TensorBoard and other analysis libraries, and natively
+supports training through `Ray's distributed machine learning engine
+<https://ray.io/>`_.
+This tutorial shows you how to integrate Ray Tune into your PyTorch training workflow.
+We will extend `this tutorial from the PyTorch documentation <https://tutorials.pytorch.kr/beginner/blitz/cifar10_tutorial.html>`_ for training a CIFAR10 image classifier.
+As you will see, we only need to add some slight modifications:
+1. wrap data loading and training in functions,
+2. make some network parameters configurable,
+3. add checkpointing (optional),
+4. and define the search space for the model tuning.
 |
-
-To run this tutorial, please make sure the following packages are
-installed:
-
-- ``ray[tune]``: Distributed hyperparameter tuning library
-- ``torchvision``: For the data transformers
-
-Setup / Imports
+To run this tutorial, please make sure the following packages are installed:
+- ``ray[tune]``: the distributed hyperparameter tuning library
+- ``torchvision``: for the data transformers
+Setup / Imports
 ---------------
-Let's start with the imports:
+Let's start with the imports:
 """
 from functools import partial
 import numpy as np
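
Only the first two import lines are visible in this hunk. For reference, the full import block this tutorial relies on is roughly the following (a sketch, not the exact file contents; it assumes the classic ``ray.tune`` API used throughout this file, and the two required packages can be installed with ``pip install "ray[tune]" torchvision``):

from functools import partial
import numpy as np
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
# The last three imports are the only ones needed for Ray Tune itself.
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler
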
@@ -54,13 +40,13 @@
 from ray.tune.schedulers import ASHAScheduler
 
 ######################################################################
-# Most of the imports are needed for building the PyTorch model. Only the last three
-# imports are for Ray Tune.
+# Most of the imports are needed for building the PyTorch model.
+# Only the last three imports are for Ray Tune.
 #
 # Data loaders
 # ------------
-# We wrap the data loaders in their own function and pass a global data directory.
-# This way we can share a data directory between different trials.
+# We wrap the data loaders in their own function and pass a global data directory.
+# This way we can share a data directory between different trials.
 
 
 def load_data(data_dir="./data"):
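
The body of ``load_data`` is collapsed in this diff; it presumably looks roughly like the sketch below, which assumes torchvision's CIFAR10 dataset and the normalization constants from the original CIFAR10 tutorial:

def load_data(data_dir="./data"):
    # Normalize CIFAR10 images with per-channel mean/std values.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    # Both splits are downloaded into the shared data_dir, so all trials can reuse the same files.
    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform)
    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform)

    return trainset, testset
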
@@ -78,10 +64,10 @@ def load_data(data_dir="./data"):
     return trainset, testset
 
 ######################################################################
-# Configurable neural network
+# Configurable neural network
 # ---------------------------
-# We can only tune those parameters that are configurable. In this example, we can specify
-# the layer sizes of the fully connected layers:
+# We can only tune those parameters that are configurable.
+# In this example, we can specify the layer sizes of the fully connected layers:
 
 
 class Net(nn.Module):
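
The class body is also collapsed here. It is the small convolutional network from the CIFAR10 tutorial, with the two fully connected layer sizes exposed as constructor arguments so they can be tuned (a sketch, assuming ``torch.nn.functional`` is imported as ``F``):

class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # l1 and l2 are the configurable (tunable) fully connected layer sizes.
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
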
@@ -104,15 +90,15 @@ def forward(self, x):
         return x
 
 ######################################################################
-# The train function
+# The train function
 # ------------------
-# Now it gets interesting, because we introduce some changes to the example `from the PyTorch
-# documentation <https://tutorials.pytorch.kr/beginner/blitz/cifar10_tutorial.html>`_.
+# Now it gets interesting, because we introduce some changes to the example from the
+# `PyTorch documentation <https://tutorials.pytorch.kr/beginner/blitz/cifar10_tutorial.html>`_.
 #
-# We wrap the training script in a function ``train_cifar(config, checkpoint_dir=None, data_dir=None)``.
-# As you can guess, the ``config`` parameter will receive the hyperparameters we would like to
-# train with. The ``checkpoint_dir`` parameter is used to restore checkpoints. The ``data_dir`` specifies
-# the directory where we load and store the data, so multiple runs can share the same data source.
+# We wrap the training script in a function ``train_cifar(config, checkpoint_dir=None, data_dir=None)``.
+# As you can guess, the ``config`` parameter will receive the hyperparameters we would like to train with.
+# The ``checkpoint_dir`` parameter is used to restore checkpoints. The ``data_dir`` specifies the directory
+# where we load and store the data, so multiple runs can share the same data source.
 #
 # .. code-block:: python
 #
@@ -124,21 +110,19 @@ def forward(self, x):
 #     net.load_state_dict(model_state)
 #     optimizer.load_state_dict(optimizer_state)
 #
-# The learning rate of the optimizer is made configurable, too:
+# The learning rate of the optimizer is made configurable, too:
 #
 # .. code-block:: python
 #
 #     optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
 #
-# We also split the training data into a training and validation subset. We thus train on
-# 80% of the data and calculate the validation loss on the remaining 20%. The batch sizes
-# with which we iterate through the training and test sets are configurable as well.
+# We also split the training data into a training and validation subset. We thus train on 80% of the data
+# and calculate the validation loss on the remaining 20%. The batch sizes with which we iterate through the training and test sets are configurable as well.
 #
-# Adding (multi) GPU support with DataParallel
+# Adding (multi) GPU support with DataParallel
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# Image classification benefits largely from GPUs. Luckily, we can continue to use
-# PyTorch's abstractions in Ray Tune. Thus, we can wrap our model in ``nn.DataParallel``
-# to support data parallel training on multiple GPUs:
+# Image classification benefits largely from GPUs. Luckily, we can continue to use PyTorch's abstractions in Ray Tune.
+# Thus, we can wrap our model in ``nn.DataParallel`` to support data parallel training on multiple GPUs:
 #
 # .. code-block:: python
 #
@@ -149,25 +133,23 @@ def forward(self, x):
 #     net = nn.DataParallel(net)
 #     net.to(device)
 #
-# By using a ``device`` variable we make sure that training also works when we have
-# no GPUs available. PyTorch requires us to send our data to the GPU memory explicitly,
-# like this:
+# By using a ``device`` variable we make sure that training also works when we have no GPUs available.
+# PyTorch requires us to send our data to the GPU memory explicitly, like this:
 #
 # .. code-block:: python
 #
 #     for i, data in enumerate(trainloader, 0):
 #         inputs, labels = data
 #         inputs, labels = inputs.to(device), labels.to(device)
 #
-# The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray
-# also supports `fractional GPUs <https://docs.ray.io/en/master/using-ray-with-gpus.html#fractional-gpus>`_
-# so we can share GPUs among trials, as long as the model still fits on the GPU memory. We'll come back
-# to that later.
+# The code now supports training on CPUs, on a single GPU, and on multiple GPUs.
+# Notably, Ray also supports `fractional GPUs <https://docs.ray.io/en/master/using-ray-with-gpus.html#fractional-gpus>`_,
+# so we can share GPUs among trials as long as the model still fits in GPU memory. We'll come back to that later.
 #
-# Communicating with Ray Tune
+# Communicating with Ray Tune
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# The most interesting part is the communication with Ray Tune:
+# The most interesting part is the communication with Ray Tune:
 #
 # .. code-block:: python
 #
@@ -177,22 +159,17 @@ def forward(self, x):
 #
 #     tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
 #
-# Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically,
-# we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics
-# to decide which hyperparameter configuration lead to the best results. These metrics
-# can also be used to stop bad performing trials early in order to avoid wasting
-# resources on those trials.
+# Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically, we send
+# the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics to decide which
+# hyperparameter configuration leads to the best results. These metrics can also be used to stop badly performing trials early in order to avoid wasting resources on those trials.
 #
-# The checkpoint saving is optional, however, it is necessary if we wanted to use advanced
-# schedulers like
-# `Population Based Training <https://docs.ray.io/en/master/tune/tutorials/tune-advanced-tutorial.html>`_.
-# Also, by saving the checkpoint we can later load the trained models and validate them
-# on a test set.
+# The checkpoint saving is optional, but it is necessary if we want to use advanced schedulers like
+# `Population Based Training <https://docs.ray.io/en/master/tune/tutorials/tune-advanced-tutorial.html>`_. Also, by saving the checkpoint we can later load the trained models and validate them on a test set.
 #
 # Full training function
 # ~~~~~~~~~~~~~~~~~~~~~~
 #
-# The full code example looks like this:
+# The full code example looks like this:
 
 
 def train_cifar(config, checkpoint_dir=None, data_dir=None):
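
The full body of ``train_cifar`` is collapsed in the diff. A condensed sketch of its structure, assuming the classic ``tune.checkpoint_dir`` / ``tune.report`` API that the surrounding comments already use (the real function also prints running loss statistics), looks like this:

def train_cifar(config, checkpoint_dir=None, data_dir=None):
    net = Net(config["l1"], config["l2"])

    # Run on GPU(s) when available, otherwise fall back to the CPU.
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    # Restore model and optimizer state if Ray Tune hands us a checkpoint directory.
    if checkpoint_dir:
        model_state, optimizer_state = torch.load(
            os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)

    trainset, _ = load_data(data_dir)

    # 80/20 split into training and validation subsets.
    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs])
    trainloader = torch.utils.data.DataLoader(
        train_subset, batch_size=int(config["batch_size"]),
        shuffle=True, num_workers=8)
    valloader = torch.utils.data.DataLoader(
        val_subset, batch_size=int(config["batch_size"]),
        shuffle=True, num_workers=8)

    for epoch in range(10):
        # Training pass over the 80% split.
        net.train()
        for inputs, labels in trainloader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(net(inputs), labels)
            loss.backward()
            optimizer.step()

        # Validation pass: accumulate loss and accuracy on the held-out 20%.
        net.eval()
        val_loss, val_steps, correct, total = 0.0, 0, 0, 0
        with torch.no_grad():
            for inputs, labels in valloader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = net(inputs)
                val_loss += criterion(outputs, labels).item()
                val_steps += 1
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        # Save a checkpoint, then report the validation metrics back to Ray Tune.
        with tune.checkpoint_dir(epoch) as ckpt_dir:
            path = os.path.join(ckpt_dir, "checkpoint")
            torch.save((net.state_dict(), optimizer.state_dict()), path)
        tune.report(loss=(val_loss / val_steps), accuracy=correct / total)

    print("Finished Training")
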
@@ -283,13 +260,12 @@ def train_cifar(config, checkpoint_dir=None, data_dir=None):
     print("Finished Training")
 
 ######################################################################
-# As you can see, most of the code is adapted directly from the original example.
+# As you can see, most of the code is adapted directly from the original example.
 #
-# Test set accuracy
+# Test set accuracy
 # -----------------
-# Commonly the performance of a machine learning model is tested on a hold-out test
-# set with data that has not been used for training the model. We also wrap this in a
-# function:
+# Commonly the performance of a machine learning model is tested on a hold-out test set,
+# with data that has not been used for training the model. We also wrap this in a function:
 
 
 def test_accuracy(net, device="cpu"):
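
Here too the body is collapsed; roughly, ``test_accuracy`` iterates over the CIFAR10 test split and returns the fraction of correctly classified images (a sketch):

def test_accuracy(net, device="cpu"):
    # Reuse the shared data loading helper; only the test split is needed here.
    _, testset = load_data()
    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2)

    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total
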
@@ -312,12 +288,11 @@ def test_accuracy(net, device="cpu"):
     return correct / total
 
 ######################################################################
-# The function also expects a ``device`` parameter, so we can do the
-# test set validation on a GPU.
+# The function also expects a ``device`` parameter, so we can do the test set validation on a GPU.
 #
-# Configuring the search space
+# Configuring the search space
 # ----------------------------
-# Lastly, we need to define Ray Tune's search space. Here is an example:
+# Lastly, we need to define Ray Tune's search space. Here is an example:
 #
 # .. code-block:: python
 #
@@ -328,20 +303,14 @@ def test_accuracy(net, device="cpu"):
 #         "batch_size": tune.choice([2, 4, 8, 16])
 #     }
 #
-# The ``tune.sample_from()`` function makes it possible to define your own sample
-# methods to obtain hyperparameters. In this example, the ``l1`` and ``l2`` parameters
-# should be powers of 2 between 4 and 256, so either 4, 8, 16, 32, 64, 128, or 256.
-# The ``lr`` (learning rate) should be uniformly sampled between 0.0001 and 0.1. Lastly,
-# the batch size is a choice between 2, 4, 8, and 16.
+# The ``tune.sample_from()`` function makes it possible to define your own sample methods to obtain hyperparameters.
+# In this example, the ``l1`` and ``l2`` parameters should be powers of 2 between 4 and 256, so either 4, 8, 16, 32, 64, 128, or 256.
+# The ``lr`` (learning rate) should be uniformly sampled between 0.0001 and 0.1. Lastly, the batch size is a choice between 2, 4, 8, and 16.
 #
-# At each trial, Ray Tune will now randomly sample a combination of parameters from these
-# search spaces. It will then train a number of models in parallel and find the best
-# performing one among these. We also use the ``ASHAScheduler`` which will terminate bad
-# performing trials early.
+# At each trial, Ray Tune will now randomly sample a combination of parameters from these search spaces.
+# It will then train a number of models in parallel and find the best performing one among these. We also use the ``ASHAScheduler``, which will terminate badly performing trials early.
 #
-# We wrap the ``train_cifar`` function with ``functools.partial`` to set the constant
-# ``data_dir`` parameter. We can also tell Ray Tune what resources should be
-# available for each trial:
+# We wrap the ``train_cifar`` function with ``functools.partial`` to set the constant ``data_dir`` parameter. We can also tell Ray Tune what resources should be available for each trial:
 #
 # .. code-block:: python
 #
@@ -356,21 +325,14 @@ def test_accuracy(net, device="cpu"):
 #         progress_reporter=reporter,
 #         checkpoint_at_end=True)
 #
-# You can specify the number of CPUs, which are then available e.g.
-# to increase the ``num_workers`` of the PyTorch ``DataLoader`` instances. The selected
-# number of GPUs are made visible to PyTorch in each trial. Trials do not have access to
-# GPUs that haven't been requested for them - so you don't have to care about two trials
-# using the same set of resources.
+# You can specify the number of CPUs, which are then available, e.g., to increase the ``num_workers`` of the PyTorch ``DataLoader`` instances.
+# The selected number of GPUs is made visible to PyTorch in each trial. Trials cannot access GPUs that haven't been requested for them, so you don't have to care about two trials using the same set of resources.
 #
-# Here we can also specify fractional GPUs, so something like ``gpus_per_trial=0.5`` is
-# completely valid. The trials will then share GPUs among each other.
-# You just have to make sure that the models still fit in the GPU memory.
+# Here we can also specify fractional GPUs, so something like ``gpus_per_trial=0.5`` is completely valid. The trials will then share GPUs among each other. You just have to make sure that the models still fit in the GPU memory.
 #
-# After training the models, we will find the best performing one and load the trained
-# network from the checkpoint file. We then obtain the test set accuracy and report
-# everything by printing.
+# After training the models, we will find the best performing one and load the trained network from the checkpoint file. We then obtain the test set accuracy and report everything by printing.
 #
-# The full main function looks like this:
+# The full main function looks like this:
 
 
 def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
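
The collapsed body of ``main`` ties everything together: it defines the search space, sets up the ``ASHAScheduler`` and a ``CLIReporter``, launches ``tune.run`` with per-trial resources, and finally evaluates the best trial on the test set. A sketch, again assuming the classic Tune API visible in the surrounding comments (e.g. ``checkpoint_at_end`` and ``best_trial.checkpoint.value``):

def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
    data_dir = os.path.abspath("./data")
    load_data(data_dir)  # download once up front so all trials can share the files

    # Search space: layer sizes as powers of 2, log-uniform lr, categorical batch size.
    config = {
        "l1": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "l2": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16])
    }
    # ASHA terminates badly performing trials early.
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2)
    reporter = CLIReporter(
        metric_columns=["loss", "accuracy", "training_iteration"])

    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter,
        checkpoint_at_end=True)

    # Pick the trial with the lowest final validation loss.
    best_trial = result.get_best_trial("loss", "min", "last")
    print("Best trial config: {}".format(best_trial.config))
    print("Best trial final validation loss: {}".format(
        best_trial.last_result["loss"]))
    print("Best trial final validation accuracy: {}".format(
        best_trial.last_result["accuracy"]))

    # Rebuild the best model and load its weights from that trial's checkpoint.
    best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if gpus_per_trial > 1:
            best_trained_model = nn.DataParallel(best_trained_model)
    best_trained_model.to(device)

    best_checkpoint_dir = best_trial.checkpoint.value
    model_state, optimizer_state = torch.load(
        os.path.join(best_checkpoint_dir, "checkpoint"))
    best_trained_model.load_state_dict(model_state)

    test_acc = test_accuracy(best_trained_model, device)
    print("Best trial test set accuracy: {}".format(test_acc))
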
@@ -429,7 +391,7 @@ def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
 
 
 ######################################################################
-# If you run the code, an example output could look like this:
+# If you run the code, the output looks like this:
 #
 # ::
 #
@@ -455,8 +417,7 @@ def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
 #     Best trial final validation accuracy: 0.5836
 #     Best trial test set accuracy: 0.5806
 #
-# Most trials have been stopped early in order to avoid wasting resources.
-# The best performing trial achieved a validation accuracy of about 58%, which could
-# be confirmed on the test set.
+# Most trials have been stopped early in order to avoid wasting resources. The best performing trial achieved an accuracy of about 58%, which could be confirmed on the test set.
+#
+# So that's it! You can now tune the parameters of your PyTorch models.
 #
-# So that's it! You can now tune the parameters of your PyTorch models.
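
The script's entry point is not shown in this diff; a minimal way to launch the tuning run described above could look like this (a hypothetical invocation; pick ``gpus_per_trial`` to match your hardware, e.g. ``0.5`` to let two trials share one GPU, or ``0`` on a CPU-only machine):

if __name__ == "__main__":
    main(num_samples=10, max_num_epochs=10, gpus_per_trial=0.5)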
