Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation

Cited 3 times in Web of Science and 3 times in Scopus
In this paper, we propose speech/music pitch classification based on recurrent neural networks (RNNs) for monaural speech segregation from music interference. The speech segregation methods in this paper exploit sub-band masking to construct segregation masks modulated by the estimated speech pitch. However, for speech signals mixed with music, speech pitch estimation becomes unreliable because speech and music have similar harmonic structures. To remove the music interference effectively, we propose an RNN-based speech/music pitch classifier. The proposed method models the temporal trajectories of speech and music pitch values and determines whether an unknown continuous pitch sequence belongs to speech or music. Among the various types of RNNs, we chose the simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM for pitch classification. The experimental results show that our proposed method significantly outperforms the baseline methods on speech–music mixtures without loss of segregation performance on speech–noise mixtures.
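The core idea above can be sketched in code: a recurrent network consumes a per-frame pitch trajectory and emits a speech-vs-music decision. The following is a minimal, untrained illustration of the simple-recurrent-network (Elman) variant mentioned in the abstract; the hidden size, weight initialisation, and pitch normalisation constant are hypothetical choices for illustration, not the paper's trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SimpleRecurrentPitchClassifier:
    """Toy Elman RNN mapping a pitch contour (Hz per frame) to P(speech)."""

    def __init__(self, hidden_size=16, seed=0):
        rng = np.random.default_rng(seed)
        # Input at each frame is the scalar pitch value of that frame.
        self.W_xh = rng.normal(scale=0.1, size=(hidden_size, 1))
        self.W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        self.b_h = np.zeros(hidden_size)
        # Read-out layer: final hidden state -> speech probability.
        self.w_out = rng.normal(scale=0.1, size=hidden_size)
        self.b_out = 0.0

    def forward(self, pitch_track):
        """pitch_track: 1-D array of per-frame pitch values in Hz."""
        h = np.zeros_like(self.b_h)
        for f0 in pitch_track:
            # Roughly normalise pitch into [0, 1] before the recurrence
            # (500 Hz cap is an assumed, illustrative constant).
            x = np.array([f0 / 500.0])
            h = np.tanh(self.W_xh @ x + self.W_hh @ h + self.b_h)
        # Classify the whole trajectory from the final hidden state.
        return sigmoid(self.w_out @ h + self.b_out)

# Usage: classify a short synthetic, vibrato-like pitch contour.
clf = SimpleRecurrentPitchClassifier()
contour = 200.0 + 20.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
p_speech = clf.forward(contour)
```

In the paper's setting, such a classifier would label each estimated continuous pitch sequence as speech or music, so that only speech-dominated pitch tracks modulate the sub-band segregation masks; the LSTM and bidirectional-LSTM variants replace the tanh recurrence with gated cells but keep the same sequence-in, label-out structure.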
Publisher
SPRINGER
Issue Date
2020-10
Language
English
Article Type
Article
Citation

JOURNAL OF SUPERCOMPUTING, v.76, no.10, pp.8193 - 8213

ISSN
0920-8542
DOI
10.1007/s11227-019-02785-x
URI
http://hdl.handle.net/10203/276584
Appears in Collection
CS-Journal Papers (Journal Papers)