Commit 5edf398

beginner/audio_data_augmentation_tutorial translation (#581)
1 parent 06a6f70 commit 5edf398

1 file changed

Lines changed: 76 additions & 85 deletions

beginner_source/audio_data_augmentation_tutorial.py
@@ -1,14 +1,16 @@
 # -*- coding: utf-8 -*-
 """
-Audio Data Augmentation
+Audio Data Augmentation
 =======================
 
-``torchaudio`` provides a variety of ways to augment audio data.
+*Translator*: Lee Jong Bub <https://github.com/bub3690>
 
-In this tutorial, we look into a way to apply effects, filters,
-RIR (room impulse response) and codecs.
+``torchaudio`` provides a variety of ways to augment audio data.
 
-At the end, we synthesize noisy speech over phone from clean speech.
+In this tutorial, we look at how to apply effects, filters,
+room impulse responses (RIR, Room Impulse Response) and codecs.
+
+At the end, we synthesize noisy speech over a phone from clean speech.
 """
 
 import torch
@@ -19,10 +21,10 @@
 print(torchaudio.__version__)
 
 ######################################################################
-# Preparation
+# Preparation
 # -----------
 #
-# First, we import the modules and download the audio assets we use in this tutorial.
+# First, we import the modules and download the audio assets used in this tutorial.
 #
 
 import math
@@ -39,64 +41,59 @@
 
 
 ######################################################################
-# Applying effects and filtering
+# Applying effects and filtering
 # ------------------------------
 #
-# :py:func:`torchaudio.sox_effects` allows for directly applying filters similar to
-# those available in ``sox`` to Tensor objects and file object audio sources.
+# :py:func:`torchaudio.sox_effects` applies filters similar to those in ``sox``
+# directly to Tensor objects and file object audio sources.
 #
-# There are two functions for this:
+# Two functions are used for this:
 #
-# - :py:func:`torchaudio.sox_effects.apply_effects_tensor` for applying effects
-#   to Tensor.
-# - :py:func:`torchaudio.sox_effects.apply_effects_file` for applying effects to
-#   other audio sources.
+# - :py:func:`torchaudio.sox_effects.apply_effects_tensor` applies effects
+#   to a Tensor.
+# - :py:func:`torchaudio.sox_effects.apply_effects_file` applies effects
+#   to other audio sources.
 #
-# Both functions accept effect definitions in the form
-# ``List[List[str]]``.
-# This is mostly consistent with how ``sox`` command works, but one caveat is
-# that ``sox`` adds some effects automatically, whereas ``torchaudio``’s
-# implementation does not.
+# Both functions accept effect definitions in the form ``List[List[str]]``.
+# This works almost the same way as ``sox``. One caveat, however, is that
+# ``sox`` adds some effects automatically, whereas ``torchaudio``'s implementation does not.
 #
-# For the list of available effects, please refer to `the sox
-# documentation <http://sox.sourceforge.net/sox.html>`__.
+# For the list of available effects, see `the sox
+# documentation <http://sox.sourceforge.net/sox.html>`__.
 #
-# **Tip** If you need to load and resample your audio data on the fly,
-# then you can use :py:func:`torchaudio.sox_effects.apply_effects_file`
-# with effect ``"rate"``.
+# **Tip** To load and resample audio data on the fly,
+# use :py:func:`torchaudio.sox_effects.apply_effects_file` with the effect ``"rate"``.
 #
-# **Note** :py:func:`torchaudio.sox_effects.apply_effects_file` accepts a
-# file-like object or path-like object.
-# Similar to :py:func:`torchaudio.load`, when the audio format cannot be
-# inferred from either the file extension or header, you can provide
-# argument ``format`` to specify the format of the audio source.
+# **Note** :py:func:`torchaudio.sox_effects.apply_effects_file` accepts a file-like object or a path-like object.
+# As with :py:func:`torchaudio.load`, when the audio format
+# cannot be inferred from the file extension or header,
+# you can pass the ``format`` argument to specify the format of the audio source.
 #
-# **Note** This process is not differentiable.
+# **Note** This process is not differentiable.
 #
 
-# Load the data
+# Load the data.
 waveform1, sample_rate1 = torchaudio.load(SAMPLE_WAV)
 
-# Define effects
+# Define the effects.
 effects = [
-    ["lowpass", "-1", "300"],  # apply single-pole lowpass filter
-    ["speed", "0.8"],  # reduce the speed
-    # This only changes sample rate, so it is necessary to
-    # add `rate` effect with original sample rate after this.
+    ["lowpass", "-1", "300"],  # Apply a single-pole lowpass filter.
+    ["speed", "0.8"],  # Reduce the speed.
+    # This only changes the sample rate, so a `rate` effect with
+    # the original sample rate must be added after it.
     ["rate", f"{sample_rate1}"],
-    ["reverb", "-w"],  # Reverbration gives some dramatic feeling
+    ["reverb", "-w"],  # Reverberation adds a somewhat dramatic feel.
 ]
 
-# Apply effects
+# Apply the effects.
 waveform2, sample_rate2 = torchaudio.sox_effects.apply_effects_tensor(waveform1, sample_rate1, effects)
 
 print(waveform1.shape, sample_rate1)
 print(waveform2.shape, sample_rate2)
 
 ######################################################################
-# Note that the number of frames and number of channels are different from
-# those of the original after the effects are applied. Let’s listen to the
-# audio.
+# Note that after the effects are applied, the number of frames and the number of channels differ from those of the original.
+# Now let's listen to the audio.
 #
 
 def plot_waveform(waveform, sample_rate, title="Waveform", xlim=None):
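
The effect chain in the hunk above is a plain ``List[List[str]]``, so it can be built and inspected without loading any audio. The sketch below is only an illustration and not part of the commit: `build_effects`, `expected_frames` and their parameters are hypothetical names, and the frame-count estimate assumes that `speed` followed by `rate` stretches the signal by roughly `1 / speed`.

```python
from typing import List


def build_effects(sample_rate: int, speed: float = 0.8, cutoff_hz: int = 300) -> List[List[str]]:
    """Build a sox-style effect chain as List[List[str]] (hypothetical helper)."""
    return [
        ["lowpass", "-1", str(cutoff_hz)],  # single-pole lowpass filter
        ["speed", str(speed)],  # on its own this only changes the sample rate
        ["rate", str(sample_rate)],  # so resample back to the original rate
        ["reverb", "-w"],  # wet-only reverb
    ]


def expected_frames(num_frames: int, speed: float) -> int:
    """Rough frame count after `speed` + `rate` (assumption: length scales by 1 / speed)."""
    return round(num_frames / speed)


effects = build_effects(16000)
# Every argument is a string, mirroring a sox command line.
assert all(isinstance(arg, str) for effect in effects for arg in effect)
print(effects)
print(expected_frames(16000, 0.8))
```

In a torchaudio version that still provides `sox_effects`, a list like this would be passed as the third argument of `torchaudio.sox_effects.apply_effects_tensor`, as the hunk does.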
@@ -139,7 +136,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
     plt.show(block=False)
 
 ######################################################################
-# Original:
+# Original:
 # ~~~~~~~~~
 #
 
@@ -148,7 +145,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 Audio(waveform1, rate=sample_rate1)
 
 ######################################################################
-# Effects applied:
+# After applying effects:
 # ~~~~~~~~~~~~~~~~
 #
 
@@ -157,24 +154,22 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 Audio(waveform2, rate=sample_rate2)
 
 ######################################################################
-# Doesn’t it sound more dramatic?
+# Doesn't it sound a bit more dramatic?
 #
 
 ######################################################################
-# Simulating room reverberation
+# Simulating room reverberation
 # -----------------------------
 #
 # `Convolution
-# reverb <https://en.wikipedia.org/wiki/Convolution_reverb>`__ is a
-# technique that's used to make clean audio sound as though it has been
-# produced in a different environment.
+# reverb <https://en.wikipedia.org/wiki/Convolution_reverb>`__ is
+# a technique that makes clean audio sound as though it was produced in a different environment.
 #
-# Using Room Impulse Response (RIR), for instance, we can make clean speech
-# sound as though it has been uttered in a conference room.
+# For example, using a Room Impulse Response (RIR), we can make clean speech
+# sound as though it was uttered in a conference room.
 #
-# For this process, we need RIR data. The following data are from the VOiCES
-# dataset, but you can record your own - just turn on your microphone
-# and clap your hands.
+# For this process, we need RIR data. The following data come from the VOiCES dataset.
+# However, you can also record your own: just turn on your microphone and clap your hands!
 #
 
 rir_raw, sample_rate = torchaudio.load(SAMPLE_RIR)
@@ -183,8 +178,8 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 Audio(rir_raw, rate=sample_rate)
 
 ######################################################################
-# First, we need to clean up the RIR. We extract the main impulse, normalize
-# the signal power, then flip along the time axis.
+# First, we need to clean up the RIR. We extract the main impulse,
+# normalize the signal power, and then flip it along the time axis.
 #
 
 rir = rir_raw[:, int(sample_rate * 1.01) : int(sample_rate * 1.3)]
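
The three clean-up steps named in this hunk (crop, normalize, flip) can be sketched on a plain Python list. This is a torch-free illustration under stated assumptions, not the tutorial's code: `prepare_rir` is a hypothetical helper, the recording is treated as single-channel, and "normalize the signal power" is assumed to mean dividing by the L2 norm of the cropped segment.

```python
import math
from typing import List


def prepare_rir(rir_raw: List[float], sample_rate: int, start_s: float, end_s: float) -> List[float]:
    """Crop, normalize and time-flip a single-channel impulse response (hypothetical helper)."""
    # 1. Keep only the main impulse between start_s and end_s seconds.
    rir = rir_raw[int(sample_rate * start_s) : int(sample_rate * end_s)]
    # 2. Normalize (assumption: divide by the L2 norm of the cropped segment).
    norm = math.sqrt(sum(x * x for x in rir))
    if norm == 0.0:
        raise ValueError("cropped impulse response is silent")
    rir = [x / norm for x in rir]
    # 3. Flip along the time axis.
    return rir[::-1]


# Toy recording: 8 samples at 8 Hz with the impulse at 0.5 s to 0.75 s.
flipped = prepare_rir([0.0, 0.0, 0.0, 0.0, 3.0, 4.0, 0.0, 0.0], sample_rate=8, start_s=0.5, end_s=0.75)
print(flipped)
```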
@@ -194,7 +189,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 plot_waveform(rir, sample_rate, title="Room Impulse Response")
 
 ######################################################################
-# Then, we convolve the speech signal with the RIR filter.
+# After that, we convolve the speech signal with the RIR filter.
 #
 
 speech, _ = torchaudio.load(SAMPLE_SPEECH)
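
Why the RIR is flipped before the `conv1d` call: `torch.nn.functional.conv1d` slides its kernel without reversing it (a cross-correlation), so a time-flipped kernel is needed to obtain a true convolution. Left-padding the speech with `len(rir) - 1` zeros keeps the output the same length as the input. The following pure-Python sketch illustrates the idea on lists; `cross_correlate_valid` and `apply_rir` are hypothetical names, and the helper takes an unflipped response and flips it internally so that it is self-contained.

```python
from typing import List


def cross_correlate_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """Slide the kernel over the signal without flipping it (what conv1d computes)."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel))) for i in range(n)]


def apply_rir(speech: List[float], rir: List[float]) -> List[float]:
    """Causal convolution of speech with an unflipped impulse response."""
    flipped = rir[::-1]  # flipping turns cross-correlation into convolution
    padded = [0.0] * (len(rir) - 1) + speech  # left-pad so the output keeps the input length
    return cross_correlate_valid(padded, flipped)


# Convolving a unit impulse with the RIR returns the RIR itself, zero-padded.
impulse = [1.0, 0.0, 0.0, 0.0]
echo = apply_rir(impulse, [0.5, 0.25])
print(echo)
```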
@@ -203,7 +198,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 augmented = torch.nn.functional.conv1d(speech_[None, ...], RIR[None, ...])[0]
 
 ######################################################################
-# Original:
+# Original:
 # ~~~~~~~~~
 #
 
@@ -212,7 +207,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 Audio(speech, rate=sample_rate)
 
 ######################################################################
-# RIR applied:
+# After applying RIR:
 # ~~~~~~~~~~~~
 #
 
@@ -222,13 +217,12 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 
 
 ######################################################################
-# Adding background noise
+# Adding background noise
 # -----------------------
 #
-# To add background noise to audio data, you can simply add a noise Tensor to
-# the Tensor representing the audio data. A common method to adjust the
-# intensity of noise is changing the Signal-to-Noise Ratio (SNR).
-# [`wikipedia <https://en.wikipedia.org/wiki/Signal-to-noise_ratio>`__]
+# To add noise to audio data, you can simply add a noise Tensor to the audio data Tensor.
+# A common way to adjust the noise level is to change the Signal-to-Noise Ratio (SNR).
+# [`wikipedia <https://ko.wikipedia.org/wiki/%EC%8B%A0%ED%98%B8_%EB%8C%80_%EC%9E%A1%EC%9D%8C%EB%B9%84>`__]
 #
 # $$ \\mathrm{SNR} = \\frac{P_{signal}}{P_{noise}} $$
 #
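
The SNR definition above can be turned into a scale factor for the speech signal. Since power is proportional to the squared L2 norm, an SNR in decibels equals `20 * log10(norm_signal / norm_noise)`. The sketch below works on plain lists and is an illustration only: `l2_norm`, `mix_at_snr` and `snr_db_of` are hypothetical names, and measuring level with the L2 norm is an assumption of the sketch.

```python
import math
from typing import List, Tuple


def l2_norm(x: List[float]) -> float:
    """Euclidean norm; its square is proportional to the signal power."""
    return math.sqrt(sum(v * v for v in x))


def snr_db_of(signal: List[float], noise: List[float]) -> float:
    """SNR in decibels: 10 * log10(P_signal / P_noise) = 20 * log10(norm ratio)."""
    return 20 * math.log10(l2_norm(signal) / l2_norm(noise))


def mix_at_snr(speech: List[float], noise: List[float], snr_db: float) -> Tuple[List[float], float]:
    """Scale the speech so that it sits snr_db above the noise, then add the noise."""
    scale = 10 ** (snr_db / 20) * l2_norm(noise) / l2_norm(speech)
    mixed = [scale * s + n for s, n in zip(speech, noise)]
    return mixed, scale


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, 0.2, -0.1, -0.2]
mixed, scale = mix_at_snr(speech, noise, snr_db=10.0)
# The scaled speech is now 10 dB above the noise (up to rounding).
print(snr_db_of([scale * s for s in speech], noise))
```

The tutorial's own loop additionally halves the sum, `(scale * speech + noise) / 2`, which changes the overall level but not the ratio between speech and noise.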
@@ -250,7 +244,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
     noisy_speeches.append((scale * speech + noise) / 2)
 
 ######################################################################
-# Background noise:
+# Background noise:
 # ~~~~~~~~~~~~~~~~~
 #
 
@@ -290,13 +284,12 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 
 
 ######################################################################
-# Applying codec to Tensor object
+# Applying codecs to a Tensor object
 # -------------------------------
 #
-# :py:func:`torchaudio.functional.apply_codec` can apply codecs to
-# a Tensor object.
+# :py:func:`torchaudio.functional.apply_codec` applies codecs to a Tensor object.
 #
-# **Note** This process is not differentiable.
+# **Note** This process is not differentiable.
 #
 
 
@@ -349,29 +342,27 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 Audio(waveforms[2], rate=sample_rate)
 
 ######################################################################
-# Simulating a phone recoding
+# Simulating a phone recording
 # ---------------------------
 #
-# Combining the previous techniques, we can simulate audio that sounds
-# like a person talking over a phone in a echoey room with people talking
-# in the background.
+# Combining the previous techniques, we can simulate audio that sounds like a person talking
+# over a phone in an echoey room with people talking in the background.
 #
 
 sample_rate = 16000
 original_speech, sample_rate = torchaudio.load(SAMPLE_SPEECH)
 
 plot_specgram(original_speech, sample_rate, title="Original")
 
-# Apply RIR
+# Apply the RIR
 speech_ = torch.nn.functional.pad(original_speech, (RIR.shape[1] - 1, 0))
 rir_applied = torch.nn.functional.conv1d(speech_[None, ...], RIR[None, ...])[0]
 
 plot_specgram(rir_applied, sample_rate, title="RIR Applied")
 
-# Add background noise
-# Because the noise is recorded in the actual environment, we consider that
-# the noise contains the acoustic feature of the environment. Therefore, we add
-# the noise after RIR application.
+# Add background noise
+# Because the noise was recorded in the actual environment, we assume it already carries the acoustic characteristics of that environment.
+# Therefore, we add the noise after applying the RIR.
 noise, _ = torchaudio.load(SAMPLE_NOISE)
 noise = noise[:, : rir_applied.shape[1]]
 
@@ -381,7 +372,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 
 plot_specgram(bg_added, sample_rate, title="BG noise added")
 
-# Apply filtering and change sample rate
+# Apply filtering and change the sample rate
 filtered, sample_rate2 = torchaudio.sox_effects.apply_effects_tensor(
     bg_added,
     sample_rate,
@@ -401,42 +392,42 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
 
 plot_specgram(filtered, sample_rate2, title="Filtered")
 
-# Apply telephony codec
+# Apply the telephony codec
 codec_applied = F.apply_codec(filtered, sample_rate2, format="gsm")
 
 plot_specgram(codec_applied, sample_rate2, title="GSM Codec Applied")
 
 
 ######################################################################
-# Original speech:
+# Original speech:
 # ~~~~~~~~~~~~~~~~
 #
 
 Audio(original_speech, rate=sample_rate)
 
 ######################################################################
-# RIR applied:
+# After applying RIR:
 # ~~~~~~~~~~~~
 #
 
 Audio(rir_applied, rate=sample_rate)
 
 ######################################################################
-# Background noise added:
+# After adding background noise:
 # ~~~~~~~~~~~~~~~~~~~~~~~
 #
 
 Audio(bg_added, rate=sample_rate)
 
 ######################################################################
-# Filtered:
+# After applying filtering:
 # ~~~~~~~~~
 #
 
 Audio(filtered, rate=sample_rate2)
 
 ######################################################################
-# Codec aplied:
+# After applying the codec:
 # ~~~~~~~~~~~~~
 #
 