# -*- coding: utf-8 -*-
r"""
Sequence Models and Long Short-Term Memory Networks
====================================================
**Translator**: `convin305 <https://github.com/convin305>`_

At this point, we have seen various feed-forward networks. That is,
there is no state maintained by the network at all. This might not be
the behavior we want. Sequence models are central to NLP: they are
models where there is some sort of dependence through time between your
inputs. The classical example of a sequence model is the Hidden Markov
Model for part-of-speech tagging. Another example is the conditional
random field.

A recurrent neural network is a network that maintains some kind of
state. For example, its output could be used as part of the next input,
so that information can propagate along as the network passes over the
sequence. In the case of an LSTM, for each element in the sequence,
there is a corresponding *hidden state* :math:`h_t`, which in principle
can contain information from arbitrary points earlier in the sequence.
We can use the hidden state to predict words in a language model,
part-of-speech tags, and a myriad of other things.


LSTMs in Pytorch
~~~~~~~~~~~~~~~~~

Before getting to the example, note a few things. Pytorch's LSTM expects
all of its inputs to be 3D tensors. The semantics of the axes of these
tensors is important. The first axis is the sequence itself, the second
indexes instances in the mini-batch, and the third indexes elements of
the input. We haven't discussed mini-batching, so let's just ignore that
and assume we will always have just 1 dimension on the second axis. If
we want to run the sequence model over the sentence "The cow jumped",
our input should look like

.. math::


   \begin{bmatrix}
   q_\text{The} \\
   q_\text{cow} \\
   q_\text{jumped}
   \end{bmatrix}

Except remember there is an additional 2nd dimension with size 1.

In addition, you could go through the sequence one at a time, in which
case the 1st axis will have size 1 also.

Let's see a quick example.
"""

# Author: Robert Guthrie

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

######################################################################

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
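
# Not part of the original tutorial: a quick check (assuming, as above, a
# single-layer unidirectional LSTM) that the last slice of "out" really
# matches the final hidden state returned in "hidden".
print(torch.allclose(out[-1], hidden[0].squeeze(0)))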


######################################################################
# Example: An LSTM for Part-of-Speech Tagging
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In this section, we will use an LSTM to get part of speech tags. We will
# not use Viterbi or Forward-Backward or anything like that, but as a
# (challenging) exercise to the reader, think about how Viterbi could be
# used after you have seen what is going on. In this example, we also refer
# to embeddings. If you are unfamiliar with embeddings, you can read up
# about them `here <https://tutorials.pytorch.kr/beginner/nlp/word_embeddings_tutorial.html>`__.
#
# The model is as follows: let our input sentence be
# :math:`w_1, \dots, w_M`, where :math:`w_i \in V`, our vocab. Also, let
# :math:`T` be our tag set, and :math:`y_i` the tag of word :math:`w_i`.
# Denote our prediction of the tag of word :math:`w_i` by
# :math:`\hat{y}_i`.
#
# This is a structure prediction model, where our output is a sequence
# :math:`\hat{y}_1, \dots, \hat{y}_M`, where :math:`\hat{y}_i \in T`.
#
# To do the prediction, pass an LSTM over the sentence. Denote the hidden
# state at timestep :math:`i` as :math:`h_i`. Also, assign each tag a
# unique index (like how we had word\_to\_ix in the word embeddings
# section). Then our prediction rule for :math:`\hat{y}_i` is
#
# .. math:: \hat{y}_i = \text{argmax}_j \ (\log \text{Softmax}(Ah_i + b))_j
#
# That is, take the log softmax of the affine map of the hidden state,
# and the predicted tag is the tag that has the maximum value in this
# vector. Note this implies immediately that the dimensionality of the
# target space of :math:`A` is :math:`|T|`.
#
#
# Prepare data:

def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)


training_data = [
    # Tags are: DET - determiner; NN - noun; V - verb
    # For example, the word "The" is a determiner
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
# For each words-list (sentence) and tags-list in each tuple of training_data
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:  # word has not been assigned an index yet
            word_to_ix[word] = len(word_to_ix)  # Assign each word with a unique index
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}  # Assign each tag with a unique index

# These will usually be more like 32 or 64 dimensional.
# We will keep them small, so we can see how the weights change as we train.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6

######################################################################
# Create the model:


class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

######################################################################
# Train the model:


model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# See what the scores are before training
# Note that element i,j of the output is the score for tag j for word i.
# Here we don't need to train, so the code is wrapped in torch.no_grad()
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)
    print(tag_scores)

for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()

# See what the scores are after training
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)

    # The sentence is "the dog ate the apple".  i,j corresponds to score for tag j
    # for word i. The predicted tag is the maximum scoring tag.
    # Here, we can see the predicted sequence below is 0 1 2 0 1
    # since 0 is index of the maximum value of row 1,
    # 1 is the index of maximum value of row 2, etc.
    # Which is DET NOUN VERB DET NOUN, the correct sequence!
    print(tag_scores)
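
    # Not part of the original tutorial: the same reading of the scores can be
    # done programmatically by taking the argmax over the tag dimension and
    # mapping the indices back to tag names.
    ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}
    print([ix_to_tag[i.item()] for i in tag_scores.argmax(dim=1)])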


######################################################################
# Exercise: Augmenting the LSTM part-of-speech tagger with character-level features
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In the example above, each word had an embedding, which served as the
# inputs to our sequence model. Let's augment the word embeddings with a
# representation derived from the characters of the word. We expect that
# this should help significantly, since character-level information like
# affixes have a large bearing on part-of-speech. For example, words with
# the affix *-ly* are almost always tagged as adverbs in English.
#
# To do this, let :math:`c_w` be the character-level representation of
# word :math:`w`. Let :math:`x_w` be the word embedding as before. Then
# the input to our sequence model is the concatenation of :math:`x_w` and
# :math:`c_w`. So if :math:`x_w` has dimension 5, and :math:`c_w`
# dimension 3, then our LSTM should accept an input of dimension 8.
#
# To get the character level representation, do an LSTM over the
# characters of a word, and let :math:`c_w` be the final hidden state of
# this LSTM. Hints:
#
# * There are going to be two LSTM's in your new model.
#   The original one that outputs POS tag scores, and the new one that
#   outputs a character-level representation of each word.
# * To do a sequence model over characters, you will have to embed characters.
#   The character embeddings will be the input to the character LSTM.
#
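# As one possible starting point, here is a rough sketch of such a tagger,
# not part of the original tutorial. The names ``CharLSTMTagger``,
# ``char_to_ix``, ``CHAR_EMBEDDING_DIM`` and ``CHAR_HIDDEN_DIM`` are made up
# for illustration, and training code is left out.

CHAR_EMBEDDING_DIM = 3  # hypothetical size of each character embedding
CHAR_HIDDEN_DIM = 3     # hypothetical size of the character-level representation c_w

char_to_ix = {}  # index every character that appears in the training data
for sent, _ in training_data:
    for word in sent:
        for ch in word:
            if ch not in char_to_ix:
                char_to_ix[ch] = len(char_to_ix)


class CharLSTMTagger(nn.Module):

    def __init__(self, embedding_dim, char_embedding_dim, hidden_dim,
                 char_hidden_dim, vocab_size, charset_size, tagset_size):
        super(CharLSTMTagger, self).__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.char_embeddings = nn.Embedding(charset_size, char_embedding_dim)

        # First LSTM: runs over the characters of one word and produces c_w.
        self.char_lstm = nn.LSTM(char_embedding_dim, char_hidden_dim)

        # Second LSTM: runs over the sentence; its input is the concatenation
        # of x_w and c_w, so its input size is the sum of the two dimensions.
        self.lstm = nn.LSTM(embedding_dim + char_hidden_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence, words):
        # sentence: tensor of word indices; words: the words as strings,
        # used to look up character indices for each word.
        char_reps = []
        for word in words:
            char_idxs = torch.tensor([char_to_ix[c] for c in word], dtype=torch.long)
            char_embeds = self.char_embeddings(char_idxs).view(len(word), 1, -1)
            _, (char_hidden, _) = self.char_lstm(char_embeds)
            char_reps.append(char_hidden.view(1, -1))  # this is c_w
        char_reps = torch.cat(char_reps)

        word_embeds = self.word_embeddings(sentence)
        combined = torch.cat([word_embeds, char_reps], dim=1)
        lstm_out, _ = self.lstm(combined.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        return F.log_softmax(tag_space, dim=1)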