Linear multichannel blind source separation
based on time-frequency mask obtained by
harmonic/percussive sound separation
Oyabu Soichiro1, Daichi Kitamura1, and Kohei Yatabe2
1National Institute of Technology, Kagawa College, Japan
2The University of Waseda, Japan
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Background
• Blind source separation (BSS)
– aims at separating audio sources such as speech, noise,
musical instruments, and so on
– can be used for many audio applications
– extracts individual sources without any prior information or
training (unsupervised technique)
2
Mixture signal Separated drum source
• Multichannel BSS
– estimates demixing system (inverse system of )
– High separation quality because of spatial information
• Independent vector analysis (IVA) [Kim+, 2007]
• Independent low-rank matrix (ILRMA) [Kitamura+, 2016]
• Time-frequency-masking-based BSS (TFMBSS) [Yatabe+, 2019]
• Single-channel BSS
– Monaural audio source separation problem
– More difficult than multichannel BSS
• Harmonic/percussive sound separation (HPSS) [Ono+, 2008]
Background
3
Mixing system Demixing system
Background
• HPSS [Ono+, 2008], [FitzGerald, 2010], [Duong+, 2011], [Tachibana+, 2012], etc.
– separates harmonic and percussive sound sources in a
fully blind manner
– can be used for music analysis and remixing
• Ex. estimation of chords, tempo, rhythm, notes, and genre
4
Mixture signal
Harmonic estimate
Percussive estimate
Aim: high-quality multichannel blind HPSS
Source Models in BSS
• In BSS, assumption of time-frequency structure in
each source (source model) is required
– IVA [Kim+, 2007]
• All the frequencies of each source
simultaneously have large power
• Linear spatial demixing
– ILRMA [Kitamura+, 2016]
• Power spectrogram of each source
have a low-rank time-frequency
structure
• Linear spatial demixing
– HPSS [Ono+, 2008], [Tachibana+, 2012]
• Harmonic: time-continuous structure
• Percussive: freq.-continuous structure
• Non-linear separation
5
Freq.
Freq.
Freq.
Time
Time
Time
Percussive src.
Harmonic src.
Source2
Source1
Source2
Source1
Conventional Method: TFMBSS [Yatabe+, 2019]
• Time-frequency-masking-based BSS (TFMBSS)
– Linear multichannel BSS with plug-and-play source models
– Source model is input as a time-frequency mask
6
Generate time-frequency
mask based on temporal
estimated sources
Masking process
: entrywise product
• Harmonic/percussive sound separation (HPSS)
– Separate sources by focusing on “smoothness” along with
time or frequency directions in spectrogram
– Estimate and by iteratively minimizing the cost function
Conventional Method: HPSS [Ono+, 2008]
7
Harmonic estimate
Mixture signal
Harmonic
components
Percussive
components
Time
Frequency
Percussive estimate
Proposed Algorithm: Process Flow
8
Inverse
STFT
STFT
Back
projection
Smoothing
Smoothing
HPSS
HPSS
Old masks
Initialize and with
Initialize and with
Back projection
Back projection
Observed signal Percussive estimate
Harmonic estimate
Masks
TFMBSS
Smoothed
masks
Ch 1
Ch 2
Temporal
estimates
• Two HPSS are independently
applied to each of temporarily
estimated signals and
• Two Wiener-like masks and , are
constructed using the results of HPSS
– These masks enhance the harmonic or percussive
components by eliminating the other components
Proposed Algorithm: Mask Calculation
9
HPSS
HPSS
Proposed Algorithm: Mask Smoothing
10
• In TFMBSS, drastic change of masks in each
iteration will cause instability of parameter
optimization
• Introduce mask smoothing process based on
weighted geometric mean
– Intensity of smoothing can be controlled by and
Mask calculated in
the previous iteration
Mask calculated in
the current iteration
Entrywise
product
Proposed Algorithm: Process Flow
11
Inverse
STFT
STFT
Back
projection
Smoothing
Smoothing
HPSS
HPSS
Old masks
Initialize and with
Initialize and with
Back projection
Back projection
Observed signal Percussive estimate
Harmonic estimate
Masks
TFMBSS
Smoothed
masks
Ch 1
Ch 2
Temporal
estimates
• Mixing condition of dry sources
Experiments: Conditions
12
Music Dataset
(dry sources)
SiSEC2016 MUS [Liutkus+, 2016]
“Drums” and “Other” sources of 20 songs
Windowing in STFT 128-ms-long Hann window with half-overlap shifting
Number of iterations
in TFMBSS
500
Subjective evaluation
score
Improvement of source-to-distortion ratio
(SDR) [Vincent+, 2006]
2 m
5.66cm
50 50
Impulse response E2A in RWCP database [Nakamura+, 2000]
(reverberation time: 300 ms)
Other source
(harmonic)
Drum source
(Percussive)
Experiment 1: Conditions
• Investigate the optimal number of iterations in HPSS
blocks
• Compare average SDR imp. of the proposed method
13
• Average SDR improvements of the proposed method
with various numbers of iterations in HPSS blocks
0
2
4
6
8
10
12
1 3 5 7 9 11 13 15 20
Average
SDR
improvements
[dB]
Number of iterations in HPSS blocks
Experiment 1: Results
14
Experiment 2: Conditions
• Investigate the optimal smoothing parameter
• Compare average SDR imp. of the proposed method
15
• Typical example of SDR behaviors in the proposed
method with various smoothing parameters
-2
0
2
4
6
8
10
12
14
16
0 100 200 300 400 500
SDR
improvements
[dB]
Numbers of iterations of proposed method
βold=0
βold=0.5
βold=0.75
βold=0.875
βold=0.9375
Experiment 2: Results
16
0
2
4
6
8
10
12
0 0.5 0.75 0.875 0.9375
Average
SDR
improvements
[dB]
Smoothing parameter
Experiment 2: Results
17
• Average SDR improvements of the proposed method
with various smoothing parameters
Experiment 3: Conditions
• Compare proposed method with five BSS methods
– Single-channel HPSS [Ono+, 2008] , [Tachibana+, 2010], [Tachibana+, 2012]
– Multichannel HPSS [Duong+, 2011]
– AuxIVA [Ono+, 2011]
– ILRMA [Kitamura+, 2016]
– Proposed method
18
• Average SDR improvement for HPSS and BSS
algorithms
Experiment 3: Results
0
2
4
6
8
10
12
Single-channel
HPSS
Multichannel
HPSS
AuxIVA ILRMA Proposed
method
Average
SDR
improvements
[dB]
Linear multichannel BSS
Non-linear
single-channel
BSS
19
• Music sample: SiSEC2016 Forkupines - Semantics
Demonstration
20
Observed
mixture
Single-
channel
HPSS
AuxIVA ILRMA
Proposed
method
Estimated harmonic sources
Estimated percussive sources
Conclusion
• Novelty
– Integration of conventional HPSS with TFMBSS
– Wiener-filter-based iterative mask update
– Iterative mask smoothing for stabilizing optimization
• Results
– Mask smoothing process drastically stabilizes the TFMBSS
optimization and improves the separation performance
– Achieved high-quality linear multichannel HPSS
21
Thank you for your attention!

Linear multichannel blind source separation based on time-frequency mask obtained by harmonic/percussive sound separation

  • 1.
    Linear multichannel blindsource separation based on time-frequency mask obtained by harmonic/percussive sound separation Oyabu Soichiro1, Daichi Kitamura1, and Kohei Yatabe2 1National Institute of Technology, Kagawa College, Japan 2The University of Waseda, Japan 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2.
    Background • Blind sourceseparation (BSS) – aims at separating audio sources such as speech, noise, musical instruments, and so on – can be used for many audio applications – extracts individual sources without any prior information or training (unsupervised technique) 2 Mixture signal Separated drum source
  • 3.
    • Multichannel BSS –estimates demixing system (inverse system of ) – High separation quality because of spatial information • Independent vector analysis (IVA) [Kim+, 2007] • Independent low-rank matrix (ILRMA) [Kitamura+, 2016] • Time-frequency-masking-based BSS (TFMBSS) [Yatabe+, 2019] • Single-channel BSS – Monaural audio source separation problem – More difficult than multichannel BSS • Harmonic/percussive sound separation (HPSS) [Ono+, 2008] Background 3 Mixing system Demixing system
  • 4.
    Background • HPSS [Ono+,2008], [FitzGerald, 2010], [Duong+, 2011], [Tachibana+, 2012], etc. – separates harmonic and percussive sound sources in a fully blind manner – can be used for music analysis and remixing • Ex. estimation of chords, tempo, rhythm, notes, and genre 4 Mixture signal Harmonic estimate Percussive estimate Aim: high-quality multichannel blind HPSS
  • 5.
    Source Models inBSS • In BSS, assumption of time-frequency structure in each source (source model) is required – IVA [Kim+, 2007] • All the frequencies of each source simultaneously have large power • Linear spatial demixing – ILRMA [Kitamura+, 2016] • Power spectrogram of each source have a low-rank time-frequency structure • Linear spatial demixing – HPSS [Ono+, 2008], [Tachibana+, 2012] • Harmonic: time-continuous structure • Percussive: freq.-continuous structure • Non-linear separation 5 Freq. Freq. Freq. Time Time Time Percussive src. Harmonic src. Source2 Source1 Source2 Source1
  • 6.
    Conventional Method: TFMBSS[Yatabe+, 2019] • Time-frequency-masking-based BSS (TFMBSS) – Linear multichannel BSS with plug-and-play source models – Source model is input as a time-frequency mask 6 Generate time-frequency mask based on temporal estimated sources Masking process : entrywise product
  • 7.
    • Harmonic/percussive soundseparation (HPSS) – Separate sources by focusing on “smoothness” along with time or frequency directions in spectrogram – Estimate and by iteratively minimizing the cost function Conventional Method: HPSS [Ono+, 2008] 7 Harmonic estimate Mixture signal Harmonic components Percussive components Time Frequency Percussive estimate
  • 8.
    Proposed Algorithm: ProcessFlow 8 Inverse STFT STFT Back projection Smoothing Smoothing HPSS HPSS Old masks Initialize and with Initialize and with Back projection Back projection Observed signal Percussive estimate Harmonic estimate Masks TFMBSS Smoothed masks Ch 1 Ch 2 Temporal estimates
  • 9.
    • Two HPSSare independently applied to each of temporarily estimated signals and • Two Wiener-like masks and , are constructed using the results of HPSS – These masks enhance the harmonic or percussive components by eliminating the other components Proposed Algorithm: Mask Calculation 9 HPSS HPSS
  • 10.
    Proposed Algorithm: MaskSmoothing 10 • In TFMBSS, drastic change of masks in each iteration will cause instability of parameter optimization • Introduce mask smoothing process based on weighted geometric mean – Intensity of smoothing can be controlled by and Mask calculated in the previous iteration Mask calculated in the current iteration Entrywise product
  • 11.
    Proposed Algorithm: ProcessFlow 11 Inverse STFT STFT Back projection Smoothing Smoothing HPSS HPSS Old masks Initialize and with Initialize and with Back projection Back projection Observed signal Percussive estimate Harmonic estimate Masks TFMBSS Smoothed masks Ch 1 Ch 2 Temporal estimates
  • 12.
    • Mixing conditionof dry sources Experiments: Conditions 12 Music Dataset (dry sources) SiSEC2016 MUS [Liutkus+, 2016] “Drums” and “Other” sources of 20 songs Windowing in STFT 128-ms-long Hann window with half-overlap shifting Number of iterations in TFMBSS 500 Subjective evaluation score Improvement of source-to-distortion ratio (SDR) [Vincent+, 2006] 2 m 5.66cm 50 50 Impulse response E2A in RWCP database [Nakamura+, 2000] (reverberation time: 300 ms) Other source (harmonic) Drum source (Percussive)
  • 13.
    Experiment 1: Conditions •Investigate the optimal number of iterations in HPSS blocks • Compare average SDR imp. of the proposed method 13
  • 14.
    • Average SDRimprovements of the proposed method with various numbers of iterations in HPSS blocks 0 2 4 6 8 10 12 1 3 5 7 9 11 13 15 20 Average SDR improvements [dB] Number of iterations in HPSS blocks Experiment 1: Results 14
  • 15.
    Experiment 2: Conditions •Investigate the optimal smoothing parameter • Compare average SDR imp. of the proposed method 15
  • 16.
    • Typical exampleof SDR behaviors in the proposed method with various smoothing parameters -2 0 2 4 6 8 10 12 14 16 0 100 200 300 400 500 SDR improvements [dB] Numbers of iterations of proposed method βold=0 βold=0.5 βold=0.75 βold=0.875 βold=0.9375 Experiment 2: Results 16
  • 17.
    0 2 4 6 8 10 12 0 0.5 0.750.875 0.9375 Average SDR improvements [dB] Smoothing parameter Experiment 2: Results 17 • Average SDR improvements of the proposed method with various smoothing parameters
  • 18.
    Experiment 3: Conditions •Compare proposed method with five BSS methods – Single-channel HPSS [Ono+, 2008] , [Tachibana+, 2010], [Tachibana+, 2012] – Multichannel HPSS [Duong+, 2011] – AuxIVA [Ono+, 2011] – ILRMA [Kitamura+, 2016] – Proposed method 18
  • 19.
    • Average SDRimprovement for HPSS and BSS algorithms Experiment 3: Results 0 2 4 6 8 10 12 Single-channel HPSS Multichannel HPSS AuxIVA ILRMA Proposed method Average SDR improvements [dB] Linear multichannel BSS Non-linear single-channel BSS 19
  • 20.
    • Music sample:SiSEC2016 Forkupines - Semantics Demonstration 20 Observed mixture Single- channel HPSS AuxIVA ILRMA Proposed method Estimated harmonic sources Estimated percussive sources
  • 21.
    Conclusion • Novelty – Integrationof conventional HPSS with TFMBSS – Wiener-filter-based iterative mask update – Iterative mask smoothing for stabilizing optimization • Results – Mask smoothing process drastically stabilizes the TFMBSS optimization and improves the separation performance – Achieved high-quality linear multichannel HPSS 21 Thank you for your attention!

Editor's Notes

  • #2 1
  • #3 Blind source separation, / BSS in short, / aims at separating audio sources / such as speech, noise, musical instruments, and so on. This technique(テクニーク)can be used for many audio applications. BSS is an unsupervised method / that extracts individual sources without any prior(プライオァ)information or training (トゥレイニン).
  • #4 When we use multiple microphones in a recording, / the BSS problem is called “multichannel BSS.” This method estimates the separation system W, / which is an inverse system of the mixing system A, / as shown in this figure. (ポインタ指す) Multichannel BSS provides better separation quality / because we can use / “spatial information” / for estimating the demixing system W. Many algorithms have been proposed, / and the most succeeded methods are / independent vector analysis, / IVA, // independent low-rank matrix analysis, / ILRMA, // and time-frequency-masking-based BSS, / TFMBSS in short. On the other hand, / when the observed mixture is obtained as a monaural(モノーゥラル)format, the single-channel BSS / must be applied / to achieve the separation. This is more difficult problem compared with the multichannel BSS. In this presentation, / we only focus on the method called / “harmonic/percussive sound separation,” / HPSS in short.
  • #5 HPSS aims to separate / harmonic and percussive audio sources, / such as drums and vocals, / in a fully blind manner. Such technique(テクニーク)is crucial for many applications including music analysis, / for example, / estimation of chords, tempo, rhythm, notes, and genre(ジャヌルァ). Our research aim is / to propose a high-quality multichannel blind HPSS method, / which is useful for the applications we explained.
  • #6 To achieve the source separation in a fully blind manner, / an assumption of time-frequency structure in each source is required, / where this assumption is called the “source model(マドー).” For example, / in IVA, / we assume that / all the frequencies of each source / simultaneously have large power like this figure.(ポインタ指す) In ILRMA, / power spectrogram of each source / have a low-rank time-frequency structure. (ポインタ指す) Both IVA and ILRMA estimate the spatial demixing matrix, / which provides multichannel linear separation results. In HPSS, / we assume two source models.(マドース)(ポインタ指す) For the harmonic source, / a time-continuous structure is assumed, / whereas a frequency-continuous structure is assumed for the percussive source. Since HPSS is a single-channel BSS, / the separation mechanism is non-linear, / and the artificial distortions are sometimes generated.
  • #7 As a generalized multichannel linear BSS framework, / time-frequency-masking-based BSS, / TFMBSS in short, // has been proposed. In TFMBSS, / the source models(マドース)can be easily replaced / as a plug-and-play manner, / where the source model(マドー)is input as a time-frequency mask. This slide shows the optimization algorithm in TFMBSS. For the detail explanation of this algorithm, / please see the reference paper. In the fourth line, / we generate a time-frequency mask, / a source model,(マドー)/ based on the temporal estimated sources z.(ズィー) Then, / we apply the time-frequency masking in the fifth line / to update the estimated sources. Thus, / any kind of time-frequency mask can be used / as a source model(マドー)in this BSS framework, / and we can achieve the linear multichannel BSS / as well as IVA and ILRMA.
  • #8 Next, / I introduce the detail of HPSS. HPSS separates harmonic and percussive sounds / by focusing on the “smoothness” / along with the time / or frequency directions in the spectrogram. This is because / the harmonic components become the horizontal stripe patterns(パタンズ), and the percussive components become the vertical stripe patterns(パタンズ).(それぞれポインタで指す) HPSS optimizes the separated signal H and P / by iteratively minimizing this cost function(ポインタで指す), / which enhances the vertical smoothness of H / and the horizontal smoothness of P, respectively. The smoothness assumption in HPSS / is reasonable for separating harmonic and percussive sounds. However, / since this single-channel BSS is a non-linear process, / artificial distortions may be noticeable. To achieve a high-quality linear multichannel HPSS, / we propose to integrate conventional HPSS / with the TFMBSS framework.
  • #9 This is the process flow of the proposed linear multichannel HPSS. In our method, TFMBSS(ポインタで指しながら)estimates the linear demixing system / using iterative optimization. In each iteration of TFMBSS, / the time-frequency masks of harmonic and percussive sources are updated / by the processes shown in the above(アバブ,ポインタで指しながら). In each iteration of TFMBSS, / temporal estimates of harmonic and percussive sources, / ZH(ズィーエイチ)and ZP(ズィーピー), / are respectively input to HPSS blocks(ポインタで指しながら). Then we obtain H and P components for each of ZH and ZP(ポインタで指しながら). From these components, / we calculate a new time-frequency masks, / MH and MP, / that enhance harmonic and percussive components in ZH and ZP, / respectively(ポインタで指しながら). After that, / a smoothing process is applied with the old masks / calculated in the previous iteration. Finally, / the smoothed masks are returned to TFMBSS(ポインタで指しながら). This operation is iterated / until TFMBSS is converged. The estimated harmonic and percussive source signals are obtained by inverse STFT. In the next two slides, / we explain the details of “the mask calculation,” // here,(指しながら)// and “the smoothing process,” // here(指しながら).
  • #10 In the proposed algorithm, / two HPSS are independently applied to each of temporarily estimated signals, / ZH and ZP. From ZH, / we obtain H super H and P super H.(superとはsuperscriptの略で,日本語で言う「H上付きH」のような意味) Also, / from ZP, / we obtain H super P and P super P. Using these HPSS results, we construct two Wiener-like masks MH and MP / as shown in these equations. These masks enhance the harmonic or percussive components / by eliminating the other components.
  • #11 In TFMBSS, / since the optimization is performed by the primal-dual splitting algorithm, / the drastic change of the masks in each iteration / will cause instability of the parameter optimization. To avoid this problem, / we introduce / “mask smoothing process” / based on the weighted geometric mean, / like this calculation(指しながら), / where M in the left-hand side is a mask of the current iteration, / Mold is the mask of the previous iteration, / and this operation(指しながら)is an entrywise product. Beta and beta old are the parameters that control the intensity of smoothing, / and the sum of beta and beta old equals unity.
  • #12 After the mask calculation and smoothing processes, / we input the new mask to the TFMBSS. This mask update is iterated / until TFMBSS converges.
  • #13 To evaluate our proposed method, / we conducted BSS experiment(イクスペリメント). This slide shows the experimental conditions. For the dry sources, we used SiSEC2016 MUS dataset. In particular, / we used “drums” and “other” musical sources of 20 songs, / where “other” sources include various(ベァリアス)types of harmonic musical instruments, / such as guitar, synthesizer, and so on. These dry sources are convoluted / and spatially mixed with the impulse responses recorded in this condition(指しながら), / and we produced / two-channel / and two-source observed signals. As the evaluation score, / we used an improvement of source-to-distortion ratio, / SDR, / which shows both the degree of separation / and the sound quality of separated signals.
  • #14 We conducted three experiments. In the first experiment, / we investigated the optimal number of iterations in the HPSS blocks. Since HPSS is also an iterative optimization algorithm, / we compare the average SDR improvements of the proposed method / with various settings of HPSS iteration.
  • #15 This is the result of the first experiment. The horizontal axis shows the number of iterations in the HPSS blocks, / and the vertical axis shows the average SDR improvements. We can confirm that / the fewer iterations in the HPSS blocks is not preferable(指しながら), / and the performance is saturated for more than 9 iterations.(指しながら) For the other experiments, / we set the number of iterations in HPSS / to 15.
  • #16 In the second experiment, / we investigated the optimal smoothing parameter “beta old” / in the smoothing blocks. This parameter controls the intensity of mask smoothing. If we set “beta old” to zero, / no smoothing is applied. Again, / we compare the SDR improvements of the proposed method.
  • #17 This graph is an example of SDR behaviors of the proposed method, / namely, / the horizontal axis shows the global iteration of the proposed method. The colors show the intensity of smoothing process. The brightest color corresponds to / “beta old equals zero,” / no smoothing, // and the darkest color represents strong smoothing. We can confirm that / the smoothing process can stabilize the behavior of the SDR improvement, / although the convergence speed slows down. This behavior was common / for all the songs.
  • #18 This figure shows the average SDR results of 20 songs with 500 global iterations. The horizontal axis shows the value of “beta old.” The condition “beta old equals 0.75(ズィロポイントセブンファイブ,指しながら)” / achieves the highest performance. Thus, in the third experiment, / we used this value.
  • #19 In the final experiment, / we compared the proposed method with five conventional BSS methods, namely, single-channel HPSS, multichannel HPSS, AuxIVA(オーグズアイヴィーエー), and ILRMA.
  • #20 This is the average result of each method. Please note that / only the single-channel HPSS is a non-linear method. From this result, the proposed method greatly outperforms the other methods / on the average of 20 songs.
  • #21 Finally, we demonstrate some examples of blind HPSS. The observed signal is a mixture of drums and guitar. 混合→Single-channel HPSSのH→AuxIVAのH→ILRMAのH→ProposedのHの順で次々再生(セリフは不要) 続いてSingle-channel HPSSのP→AuxIVAのP→ILRMAのP→ProposedのPの順で次々再生(セリフは不要) ※時間が無ければILRMAはスキップしてもOK 再生が終わったら無言でまとめページにページ送り
  • #22 This is a conclusion. 時間があれば説明する That’s all. Thank you for your attention.
  • #25 こういった,音源ごとの特質をあらかじめ仮定するのではなく この音源モデル自体を推定する方法として 時間周波数マスクに基づくBSS TFMBSSというフレームワークを紹介します. この手法は,先ほど考えた低ランク性や軸方向の滑らかさなどを 時間周波数マスクという形で音源モデルとして仮定します. 時間周波数マスクというのは, 図のように,赤,青2つの音源からなる混合信号に対し 赤の音源の部分を1それ以外を0とするようなマスクであり, (クリック) これを要素ごとに適用することで赤の音源のみを取り出すことができます. これを音源モデルとしたのがTFMBSSであり 多チャネルBSSであるため線形分離が達成可能です.
  • #26 先ほど紹介したHPSSとTFMBSSを用いた,卒業研究で提済みの従来手法のアルゴリズムをまず紹介します. まず観測信号をSTFTして,その後TFMBSSに取り込まれます. ここから生成された,調波成分に対応する中間変数zH,打撃成分に対応するzPのそれぞれが, 観測信号としてHPSSに取り込まれ調波成分と打撃成分に分離されます. そこからマスクを生成し,過去のマスクとスムージングを施し,新しくマスクを得ます. このスムージングはアルゴリズムを安定的に動かすための処理になります. これをTFMBSSに返すという動作を任意の反復回数繰り返した後, 逆STFTで時間信号に変換し,線形な打撃成分調波成分の音源得ます.
  • #27 次に実験2における従来手法のある一曲の反復毎のSDRの推移を示します. 横軸がTFMBSSにおける全体の反復回数で,縦軸が音源分離性能になっています. 線の色が薄いほど反復間のマスクの変化が大きく線の色が濃いほどマスクの変化が小さいです. スムージングをかけなくともほぼ安定してSDRが収束しています.逆にかけすぎると収束速度,収束値が減衰する傾向にあります.
  • #28 そして,音源分離という信号処理分野においての一般的な変換について解説します. 観測した時間信号から任意のフーリエ変換長分,離散フーリエ変換し一本のベクトルを生成します.(クリック)そしてもう一本(クリック)もう一本(クリック)というように 時間軸上に並べることで複素数要素を持った時間周波数表現であるスペクトログラムが生成されます.音源分離においては,このスペクトログラムを信号処理の対象とするのが一般的です. 次に,音源分離の中でも研究が盛んであるブラインド音源分離について解説します.
  • #29 スムージングの詳細