Commit 4658d55

beginner_source/nlp/sequence_models_tutorial.py translation (#780)

* beginner_source/nlp/sequence_models_tutorial.py translation
1 parent 0970896 commit 4658d55

1 file changed

Lines changed: 122 additions & 121 deletions

# -*- coding: utf-8 -*-
r"""
Sequence Models and Long Short-Term Memory Networks
===================================================
**Translation**: `박수민 <https://github.com/convin305>`_

At this point, we have seen various feed-forward networks. That is,
there is no state maintained by the network at all. This might not be
the behavior we want. Sequence models are central to NLP: they are
models where there is some sort of dependence through time between your
inputs. The classical example of a sequence model is the Hidden Markov
Model for part-of-speech tagging. Another example is the conditional
random field.

A recurrent neural network is a network that maintains some kind of
state. For example, its output could be used as part of the next input,
so that information can propagate along as the network passes over the
sequence. In the case of an LSTM, for each element in the sequence,
there is a corresponding *hidden state* :math:`h_t`, which in principle
can contain information from arbitrary points earlier in the sequence.
We can use the hidden state to predict words in a language model,
part-of-speech tags, and a myriad of other things.


LSTMs in Pytorch
~~~~~~~~~~~~~~~~~

Before getting to the example, note a few things. Pytorch's LSTM expects
all of its inputs to be 3D tensors. The semantics of the axes of these
tensors is important. The first axis is the sequence itself, the second
indexes instances in the mini-batch, and the third indexes elements of
the input. We haven't discussed mini-batching, so let's just ignore that
and assume we will always have just 1 dimension on the second axis. If
we want to run the sequence model over the sentence "The cow jumped",
our input should look like

.. math::

   \begin{bmatrix}
   q_\text{The} \\
   q_\text{cow} \\
   q_\text{jumped}
   \end{bmatrix}

Except remember there is an additional 2nd dimension with size 1.

In addition, you could go through the sequence one at a time, in which
case the 1st axis will have size 1 also.

Let's see a quick example.
"""

# Author: Robert Guthrie

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


######################################################################
lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
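
# A quick sanity check (an addition, not part of the original tutorial): the last
# slice of "out" should match the hidden state h_n returned in "hidden", as noted
# in the comments above.
print(torch.allclose(out[-1], hidden[0].squeeze(0)))  # expected: True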


######################################################################
# Example: An LSTM for Part-of-Speech Tagging
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In this section, we will use an LSTM to get part of speech tags. We will
# not use Viterbi or Forward-Backward or anything like that, but as a
# (challenging) exercise to the reader, think about how Viterbi could be
# used after you have seen what is going on. In this example, we also refer
# to embeddings. If you are unfamiliar with embeddings, you can read up
# about them `here <https://tutorials.pytorch.kr/beginner/nlp/word_embeddings_tutorial.html>`__.
#
# The model is as follows: let our input sentence be
# :math:`w_1, \dots, w_M`, where :math:`w_i \in V`, our vocab. Also, let
# :math:`T` be our tag set, and :math:`y_i` the tag of word :math:`w_i`.
# Denote our prediction of the tag of word :math:`w_i` by
# :math:`\hat{y}_i`.
#
# This is a structure prediction model, where our output is a sequence
# :math:`\hat{y}_1, \dots, \hat{y}_M`, where :math:`\hat{y}_i \in T`.
#
# To do the prediction, pass an LSTM over the sentence. Denote the hidden
# state at timestep :math:`i` as :math:`h_i`. Also, assign each tag a
# unique index (like how we had word\_to\_ix in the word embeddings
# section). Then our prediction rule for :math:`\hat{y}_i` is
#
# .. math:: \hat{y}_i = \text{argmax}_j \ (\log \text{Softmax}(Ah_i + b))_j
#
# That is, take the log softmax of the affine map of the hidden state,
# and the predicted tag is the tag that has the maximum value in this
# vector. Note this implies immediately that the dimensionality of the
# target space of :math:`A` is :math:`|T|`.
#
#
# Prepare data:

def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)


training_data = [
    # Tags are: DET - determiner; NN - noun; V - verb
    # For example, the word "The" is a determiner
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
# For each words-list (sentence) and tags-list in each tuple of training_data
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:  # word has not been assigned an index yet
            word_to_ix[word] = len(word_to_ix)  # Assign each word with a unique index
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}  # Assign each tag with a unique index

# These will usually be more like 32 or 64 dimensional.
# We will keep them small, so we can see how the weights change as we train.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6

######################################################################
# Create the model:


class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

######################################################################
# Train the model:


model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# See what the scores are before training
# Note that element i,j of the output is the score for tag j for word i.
# Here we don't need to train, so the code is wrapped in torch.no_grad()
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)
    print(tag_scores)

for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()

# See what the scores are after training
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)

    # The sentence is "the dog ate the apple".  i,j corresponds to score for tag j
    # for word i. The predicted tag is the maximum scoring tag.
    # Here, we can see the predicted sequence below is 0 1 2 0 1
    # since 0 is index of the maximum value of row 1,
    # 1 is the index of maximum value of row 2, etc.
    # Which is DET NOUN VERB DET NOUN, the correct sequence!
    print(tag_scores)
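
    # Decoding the scores (an addition, not part of the original tutorial): invert
    # tag_to_ix into a hypothetical ix_to_tag mapping and take each row's argmax,
    # which should recover the DET NN V DET NN sequence described above.
    ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}
    print([ix_to_tag[i.item()] for i in tag_scores.argmax(dim=1)])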


######################################################################
# Exercise: Augmenting the LSTM part-of-speech tagger with character-level features
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In the example above, each word had an embedding, which served as the
# inputs to our sequence model. Let's augment the word embeddings with a
# representation derived from the characters of the word. We expect that
# this should help significantly, since character-level information like
# affixes has a large bearing on part-of-speech. For example, words with
# the affix *-ly* are almost always tagged as adverbs in English.
#
# To do this, let :math:`c_w` be the character-level representation of
# word :math:`w`. Let :math:`x_w` be the word embedding as before. Then
# the input to our sequence model is the concatenation of :math:`x_w` and
# :math:`c_w`. So if :math:`x_w` has dimension 5, and :math:`c_w`
# dimension 3, then our LSTM should accept an input of dimension 8.
#
# To get the character level representation, do an LSTM over the
# characters of a word, and let :math:`c_w` be the final hidden state of
# this LSTM. Hints:
#
# * There are going to be two LSTM's in your new model.
#   The original one that outputs POS tag scores, and the new one that
#   outputs a character-level representation of each word.
# * To do a sequence model over characters, you will have to embed characters.
#   The character embeddings will be the input to the character LSTM.
#   A minimal skeleton along these lines is sketched below.
#