Commit 400fda9

authored
Korean translation for text_to_speech_with_torchaudio.py (#601)
* Korean translation for text_to_speech_with_torchaudio.py
1 parent 601a656 commit 400fda9

1 file changed

Lines changed: 74 additions & 97 deletions

File tree

intermediate_source/text_to_speech_with_torchaudio.py

@@ -1,58 +1,53 @@
"""
Text-to-speech with torchaudio
==============================

**Author**: `Yao-Yuan Yang <https://github.com/yangarbiter>`__, `Moto Hira <moto@fb.com>`__

**Translator**: `이가람 <https://github.com/garam24>`__

"""

# %matplotlib inline


######################################################################
# Overview
# --------
#
# This tutorial shows how to build a text-to-speech pipeline, using the
# pretrained Tacotron2 in torchaudio.
#
# The text-to-speech pipeline goes as follows: 1. Text preprocessing
#
# First, the input text is encoded into a list of symbols. In this
# tutorial, we will use English characters and phonemes as the symbols.
#
# 2. Spectrogram generation
#
# From the encoded text, a spectrogram is generated. We use the
# ``Tacotron2`` model for this.
#
# 3. Time-domain conversion
#
# The last step is converting the spectrogram into the waveform. The
# process of generating speech from a spectrogram is also called a
# vocoder. In this tutorial, three different vocoders are used:
# `WaveRNN <https://pytorch.org/audio/stable/models/wavernn.html>`__,
# `Griffin-Lim <https://pytorch.org/audio/stable/transforms.html#griffinlim>`__,
# and
# `Nvidia's WaveGlow <https://pytorch.org/hub/nvidia_deeplearningexamples_tacotron2/>`__.
#
# The following figure illustrates the whole process.
#
# .. image:: https://download.pytorch.org/torchaudio/tutorial-assets/tacotron2_tts_pipeline.png
#


######################################################################
# Preparation
# -----------
#
# First, we install the necessary dependencies. In addition to
# ``torchaudio``, ``DeepPhonemizer`` is required to perform phoneme-based
# encoding.
#
# When running this example in a notebook, install DeepPhonemizer first:
# !pip3 install deep_phonemizer

import torch
@@ -70,29 +65,24 @@


######################################################################
# Text Processing
# ---------------
#


######################################################################
# Character-based encoding
# ~~~~~~~~~~~~~~~~~~~~~~~~
#
# In this section, we will go through how character-based encoding works.
#
# Since the pretrained Tacotron2 model expects a specific set of symbol
# tables, the same functionalities are available in ``torchaudio``. This
# section is more of an explanation of the basis of encoding.
#
# Firstly, we define the set of symbols. For example, we can use
# ``'_-!\'(),.:;? abcdefghijklmnopqrstuvwxyz'``. Then, we will map each
# character of the input text into the index of the corresponding symbol
# in the table.
#
# The following is an example of such processing. In the example, symbols
# that are not in the table are ignored.
#

symbols = '_-!\'(),.:;? abcdefghijklmnopqrstuvwxyz'
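The diff elides the body of `text_to_sequence` referenced in the hunk headers. A minimal sketch consistent with the description above (each character mapped to its index in the symbol table, out-of-table characters ignored) might look like:

```python
# A sketch, not necessarily the tutorial's exact definition: map each
# character to its index in the symbol table, skipping characters that
# are not in the table.
symbols = '_-!\'(),.:;? abcdefghijklmnopqrstuvwxyz'
look_up = {s: i for i, s in enumerate(symbols)}

def text_to_sequence(text):
    # Lowercase the input and drop anything outside the table.
    return [look_up[s] for s in text.lower() if s in look_up]

print(text_to_sequence("Hello, world!"))
# → [19, 16, 23, 23, 26, 6, 11, 34, 26, 29, 23, 15, 2]
```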
@@ -108,10 +98,9 @@ def text_to_sequence(text):


######################################################################
# As mentioned above, the symbol table and indices must match what the
# pretrained Tacotron2 model expects. ``torchaudio`` provides the
# transform along with the pretrained model. For example, you can
# instantiate and use such a transform as follows.
#

processor = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH.get_text_processor()
@@ -124,36 +113,32 @@ def text_to_sequence(text):


######################################################################
# The ``processor`` object takes either a text or a list of texts as
# input. When a list of texts is provided, the returned ``lengths``
# variable represents the valid length of each processed token sequence
# in the output batch.
#
# The intermediate representation can be retrieved as follows.
#

print([processor.tokens[i] for i in processed[0, :lengths[0]]])
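The batching behavior can be illustrated without torchaudio. Below is a hypothetical helper (`pad_batch` is not a torchaudio API) showing how variable-length token sequences can be padded to a common width while each sequence's valid length is kept, mirroring the processor's batch/lengths pair:

```python
# Hypothetical illustration only: pad variable-length token sequences to a
# rectangular batch and record each sequence's valid (unpadded) length.
def pad_batch(sequences, pad=0):
    lengths = [len(s) for s in sequences]
    width = max(lengths)
    batch = [s + [pad] * (width - len(s)) for s in sequences]
    return batch, lengths

batch, lengths = pad_batch([[19, 16, 23, 23, 26], [19, 20]])
print(batch)    # [[19, 16, 23, 23, 26], [19, 20, 0, 0, 0]]
print(lengths)  # [5, 2]
```

Only the first `lengths[i]` entries of each row are meaningful, which is why the tutorial slices with `processed[0, :lengths[0]]`.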


######################################################################
# Phoneme-based encoding
# ~~~~~~~~~~~~~~~~~~~~~~
#
# Phoneme-based encoding is similar to character-based encoding, but it
# uses a symbol table based on phonemes and a G2P (Grapheme-to-Phoneme)
# model.
#
# The details of the G2P model are out of the scope of this tutorial; we
# will just look at what the conversion looks like.
#
# Similar to the case of character-based encoding, the encoding process
# is expected to match what a pretrained Tacotron2 model was trained on.
# ``torchaudio`` has an interface to create the process.
#
# The following code illustrates how to make and use the process. Behind
# the scenes, a G2P model is created using the ``DeepPhonemizer`` package,
# and the pretrained weights published by the author of
# ``DeepPhonemizer`` are fetched.
#

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
@@ -169,32 +154,28 @@ def text_to_sequence(text):


######################################################################
# Notice that the encoded values are different from the example of
# character-based encoding.
#
# The intermediate representation looks like the following.
#

print([processor.tokens[i] for i in processed[0, :lengths[0]]])


######################################################################
# Spectrogram Generation
# ----------------------
#
# ``Tacotron2`` is the model we use to generate a spectrogram from the
# encoded text. For the details of the model, please refer to `the
# paper <https://arxiv.org/abs/1712.05884>`__.
#
# It is easy to instantiate a Tacotron2 model with pretrained weights;
# however, note that the input to Tacotron2 models needs to be processed
# by the matching text processor.
#
# ``torchaudio`` bundles the matching models and processors together so
# that it is easy to create the pipeline.
#
# (For the available bundles and their usage, please refer to `the
# documentation <https://pytorch.org/audio/stable/pipelines.html#tacotron2-text-to-speech>`__.)
#

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
@@ -214,8 +195,8 @@ def text_to_sequence(text):


######################################################################
# Note that the ``Tacotron2.infer`` method performs multinomial sampling;
# therefore, the process of generating the spectrogram incurs randomness.
#

for _ in range(3):
@@ -226,23 +207,20 @@ def text_to_sequence(text):


######################################################################
# Waveform Generation
# -------------------
#
# Once the spectrogram is generated, the last process is to recover the
# waveform from the spectrogram.
#
# ``torchaudio`` provides vocoders based on ``GriffinLim`` and
# ``WaveRNN``.
#


######################################################################
# WaveRNN
# ~~~~~~~
#
# Continuing from the previous section, we can instantiate the matching
# WaveRNN model from the same bundle.
#

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
@@ -265,11 +243,11 @@ def text_to_sequence(text):


######################################################################
# Griffin-Lim
# ~~~~~~~~~~~
#
# Using the Griffin-Lim vocoder is the same as WaveRNN. You can
# instantiate the vocoder object with the ``get_vocoder`` method and pass
# the spectrogram.
#

bundle = torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH
@@ -290,12 +268,11 @@ def text_to_sequence(text):


######################################################################
# WaveGlow
# ~~~~~~~~
#
# WaveGlow is a vocoder published by Nvidia. The pretrained weights are
# published on Torch Hub. One can instantiate the model using the
# ``torch.hub`` module.
#

waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')
