"""
Text-to-speech with torchaudio
==============================

**Author**: `Yao-Yuan Yang <https://github.com/yangarbiter>`__, `Moto Hira <moto@fb.com>`__

**Translator**: `garam24 <https://github.com/garam24>`__

"""

# %matplotlib inline


######################################################################
# Overview
# --------
#
# This tutorial shows how to build a text-to-speech pipeline, using the
# pretrained Tacotron2 in torchaudio.
#
# The text-to-speech pipeline goes as follows:
#
# 1. Text preprocessing
#
# First, the input text is encoded into a list of symbols. In this
# tutorial, we will use English characters and phonemes as the symbols.
#
# 2. Spectrogram generation
#
# From the encoded text, a spectrogram is generated. We use the
# ``Tacotron2`` model for this.
#
# 3. Time-domain conversion
#
# The last step is converting the spectrogram into the waveform. The
# process of generating speech from a spectrogram is also called a
# vocoder. In this tutorial, three different vocoders are used:
# `WaveRNN <https://pytorch.org/audio/stable/models/wavernn.html>`__,
# `Griffin-Lim <https://pytorch.org/audio/stable/transforms.html#griffinlim>`__,
# and
# `Nvidia's WaveGlow <https://pytorch.org/hub/nvidia_deeplearningexamples_tacotron2/>`__.
#
# The following figure illustrates the whole process.
#
# .. image:: https://download.pytorch.org/torchaudio/tutorial-assets/tacotron2_tts_pipeline.png
#


######################################################################
# Preparation
# -----------
#
# First, we install the necessary dependencies. In addition to
# ``torchaudio``, ``DeepPhonemizer`` is required to perform phoneme-based
# encoding.
#

# When running this example in a notebook, install DeepPhonemizer:
# !pip3 install deep_phonemizer

import torch
import torchaudio


######################################################################
# Text Processing
# ---------------
#


######################################################################
# Character-based encoding
# ~~~~~~~~~~~~~~~~~~~~~~~~
#
# In this section, we will go through how character-based encoding works.
#
# Since the pretrained Tacotron2 model expects a specific set of symbol
# tables, the same functionalities are available in ``torchaudio``. This
# section is more for the explanation of the basis of encoding.
#
# Firstly, we define the set of symbols. For example, we can use
# ``'_-!\'(),.:;? abcdefghijklmnopqrstuvwxyz'``. Then, we will map each
# character of the input text into the index of the corresponding symbol
# in the table.
#
# The following is an example of such processing. In the example, symbols
# that are not in the table are ignored.
#

symbols = '_-!\'(),.:;? abcdefghijklmnopqrstuvwxyz'
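# The mapping just described can be sketched as follows. This is a minimal
# illustration, not the exact helper used by the pretrained pipelines; the
# helper name, the lower-casing, and the example sentence are assumptions.

```python
# Symbol table: a symbol's encoded value is its index in this string.
symbols = '_-!\'(),.:;? abcdefghijklmnopqrstuvwxyz'
look_up = {s: i for i, s in enumerate(symbols)}

def text_to_sequence(text):
    # Lower-case the input and drop any character not in the table.
    text = text.lower()
    return [look_up[s] for s in text if s in look_up]

print(text_to_sequence("Hello world!"))
```

# The pretrained pipelines below ship their own matching processor, so a
# hand-rolled mapping like this is only for understanding the idea.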


######################################################################
# As mentioned in the above, the symbol table and indices must match
# what the pretrained Tacotron2 model expects. ``torchaudio`` provides the
# transform along with the pretrained model. For example, you can
# instantiate and use such transform as follows.
#

processor = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH.get_text_processor()

text = "Hello world! Text to speech!"  # example input; any English sentence works
processed, lengths = processor(text)

print(processed)
print(lengths)

######################################################################
# The ``processor`` object takes either a text or a list of texts as
# inputs. When a list of texts is provided, the returned ``lengths``
# variable represents the valid length of each processed token sequence
# in the output batch.
#
# The intermediate representation can be retrieved as follows.
#

print([processor.tokens[i] for i in processed[0, :lengths[0]]])


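# To make the role of ``lengths`` in the batched case concrete, here is a
# dependency-free sketch. The ``pad_batch`` helper is hypothetical, not part
# of ``torchaudio``; it only mimics how variable-length token sequences are
# padded into one batch while ``lengths`` records the valid portion of each
# row.

```python
def pad_batch(sequences, pad_value=0):
    # Hypothetical helper: right-pad every sequence to the longest one,
    # returning the padded batch and the original (valid) lengths.
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    batch = [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return batch, lengths

batch, lengths = pad_batch([[19, 16, 23], [34, 26, 29, 23, 15]])
print(batch)    # [[19, 16, 23, 0, 0], [34, 26, 29, 23, 15]]
print(lengths)  # [3, 5]

# Slicing with the recorded length recovers the valid tokens of each row,
# which is exactly how the prints above use ``lengths``.
print(batch[0][:lengths[0]])  # [19, 16, 23]
```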
######################################################################
# Phoneme-based encoding
# ~~~~~~~~~~~~~~~~~~~~~~
#
# Phoneme-based encoding is similar to character-based encoding, but it
# uses a symbol table based on phonemes and a G2P (Grapheme-to-Phoneme)
# model.
#
# The details of the G2P model are out of the scope of this tutorial; we
# will just look at what the conversion looks like.
#
# Similar to the case of character-based encoding, the encoding process is
# expected to match what the pretrained Tacotron2 model was trained on.
# ``torchaudio`` has an interface to create the process.
#
# The following code illustrates how to make and use the process. Behind
# the scenes, a G2P model is created using the ``DeepPhonemizer`` package,
# and the pretrained weights published by the author of
# ``DeepPhonemizer`` are fetched.
#

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

processor = bundle.get_text_processor()

text = "Hello world! Text to speech!"  # example input; any English sentence works
with torch.inference_mode():
    processed, lengths = processor(text)

print(processed)
print(lengths)

######################################################################
# Notice that the encoded values are different from the example of
# character-based encoding.
#
# The intermediate representation looks like the following.
#

print([processor.tokens[i] for i in processed[0, :lengths[0]]])


######################################################################
# Spectrogram Generation
# ----------------------
#
# ``Tacotron2`` is the model we use to generate a spectrogram from the
# encoded text. For the details of the model, please refer to `the
# paper <https://arxiv.org/abs/1712.05884>`__.
#
# It is easy to instantiate a Tacotron2 model with pretrained weights;
# however, note that the input to Tacotron2 models must be processed by
# the matching text processor.
#
# ``torchaudio`` bundles the matching models and processors together so
# that it is easy to create the pipeline.
#
# (For the available bundles, and their usage, please refer to `the
# documentation <https://pytorch.org/audio/stable/pipelines.html#tacotron2-text-to-speech>`__.)
#

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

processor = bundle.get_text_processor()
tacotron2 = bundle.get_tacotron2()

text = "Hello world! Text to speech!"  # example input; any English sentence works

with torch.inference_mode():
    processed, lengths = processor(text)
    spec, _, _ = tacotron2.infer(processed, lengths)

print(spec.shape)

######################################################################
# Note that the ``Tacotron2.infer`` method performs multinomial sampling;
# therefore, the process of generating the spectrogram incurs randomness.
#

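# Where this randomness comes from can be illustrated in isolation with
# plain Python: multinomial sampling draws an index according to a
# probability distribution, so repeated draws form different sequences
# unless the seed is fixed. The toy distribution below is an assumption
# for illustration only and is unrelated to the model's internals.

```python
import random

# A toy 4-way categorical distribution.
probs = [0.1, 0.2, 0.3, 0.4]

random.seed(0)  # fixing the seed makes the draws reproducible
draws = [random.choices(range(4), weights=probs)[0] for _ in range(10)]
print(draws)
```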
for _ in range(3):
    with torch.inference_mode():
        spec, spec_lengths, _ = tacotron2.infer(processed, lengths)
    # The length of the generated spectrogram varies from run to run.
    print(spec[0].shape)

######################################################################
# Waveform Generation
# -------------------
#
# Once the spectrogram is generated, the last process is to recover the
# waveform from the spectrogram.
#
# ``torchaudio`` provides vocoders based on ``GriffinLim`` and
# ``WaveRNN``.
#


######################################################################
# WaveRNN
# ~~~~~~~
#
# Continuing from the previous section, we can instantiate the matching
# WaveRNN model from the same bundle.
#

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

processor = bundle.get_text_processor()
tacotron2 = bundle.get_tacotron2()
vocoder = bundle.get_vocoder()

text = "Hello world! Text to speech!"  # example input; any English sentence works

with torch.inference_mode():
    processed, lengths = processor(text)
    spec, spec_lengths, _ = tacotron2.infer(processed, lengths)
    waveforms, lengths = vocoder(spec, spec_lengths)

######################################################################
# Griffin-Lim
# ~~~~~~~~~~~
#
# Using the Griffin-Lim vocoder is the same as WaveRNN. You can
# instantiate the vocoder object with the ``get_vocoder`` method and pass
# the spectrogram.
#

bundle = torchaudio.pipelines.TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH

processor = bundle.get_text_processor()
tacotron2 = bundle.get_tacotron2()
vocoder = bundle.get_vocoder()

text = "Hello world! Text to speech!"  # example input; any English sentence works

with torch.inference_mode():
    processed, lengths = processor(text)
    spec, spec_lengths, _ = tacotron2.infer(processed, lengths)
    waveforms, lengths = vocoder(spec, spec_lengths)

######################################################################
# Waveglow
# ~~~~~~~~
#
# Waveglow is a vocoder published by Nvidia. The pretrained weights are
# published on Torch Hub. One can instantiate the model using the
# ``torch.hub`` module.
#

waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')