Abstract
Most acoustic signals contain reverberation noise, which degrades the reliability of sound systems and has detrimental effects on a variety of identification applications, including speaker recognition. This paper analyzes two feature extraction techniques for mitigating the impact of reverberation on sound and compares their performance: Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC). GFCC differs from conventional MFCC in that it replaces the Mel filter bank with a Gammatone filter bank to improve robustness.
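As a rough illustration of how the two filter banks differ in where they place their filters, the sketch below computes center frequencies spaced evenly on the Mel scale and on the ERB-rate scale that Gammatone filter banks commonly use. The formulas are the widely used textbook ones (the 2595/700 Mel formula and the Glasberg and Moore ERB-rate formula); the abstract does not state which variants, frequency range, or number of filters the authors used, so the function names and parameters here are assumptions.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common Mel-scale formula (assumed; the paper does not specify its variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def hz_to_erb_rate(f_hz):
    """ERB-rate scale of Glasberg and Moore, often used to space Gammatone filters."""
    return 21.4 * np.log10(4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(f_min, f_max, n_filters, to_scale, from_scale):
    """Place n_filters center frequencies evenly on a perceptual scale."""
    points = np.linspace(to_scale(f_min), to_scale(f_max), n_filters)
    return from_scale(points)

# Hypothetical settings for illustration only: 20 filters between 50 Hz and 4 kHz.
mel_centers = center_frequencies(50.0, 4000.0, 20, hz_to_mel, mel_to_hz)
erb_centers = center_frequencies(50.0, 4000.0, 20, hz_to_erb_rate, erb_rate_to_hz)
```

Both sets of centers span the same range; they differ in how densely the low and high frequencies are covered, and a full GFCC front end would additionally use Gammatone-shaped rather than triangular filters.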
To avoid the effects of environmental sounds and of variation in the speaker's voice due to changing conditions such as illness and emotion, a single 1 kHz tone was used as the test signal, giving a fair and impartial comparison between the GFCC and MFCC methods of sound signal recognition.
The MFCC and GFCC features were compared using principal component analysis (PCA), and the results were corroborated by normalized cross-correlation (NCC). The primary purpose of PCA is to reduce dimensionality and remove correlation so that the features become orthogonal. PCA and NCC indicate that, for both reverberant and non-reverberant recordings of the single tone, GFCC features yield an increase of about 10% in the detection rate and 11% in the variance compared with MFCC features.
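The two comparison tools named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the features are arranged as a frames-by-coefficients matrix, computes PCA through an SVD of the mean-centered data, and evaluates the NCC at zero lag only. The function names are hypothetical.

```python
import numpy as np

def pca_explained_variance(features, n_components=2):
    """PCA via SVD on a (frames x coefficients) feature matrix.

    Returns the decorrelated projection and the fraction of total
    variance captured by each retained component.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are orthonormal principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (features.shape[0] - 1)
    projected = centered @ vt[:n_components].T
    ratio = variances[:n_components] / variances.sum()
    return projected, ratio

def normalized_cross_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Return 0.0 for a constant input, where the NCC is undefined.
    return float(a @ b / denom) if denom > 0 else 0.0
```

In use, one would call `pca_explained_variance` separately on the MFCC and GFCC matrices of the same recording and compare the retained variance, and call `normalized_cross_correlation` on the clean and reverberant feature vectors of each method; values closer to 1 indicate features that change less under reverberation.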
These results show that GFCC features are more robust to reverberation noise than classic MFCC features. GFCC therefore mitigates the reverberation effect and is a good candidate for use in practical recognition systems. In addition, this work examines combining MFCC and GFCC as feature components to obtain a more robust speaker recognition system. The results show that the variance obtained with the combined features is roughly 30% greater than that of the GFCC feature coefficients alone.