Speech Emotion Recognition Using MFCC and SVM Classification
Keywords:
Speech Emotion Recognition, MFCC Features, CNN-SVM Hybrid Model, Deep Learning, Support Vector Machine, Audio Signal Processing, Librosa Python
Abstract
Speech Emotion Recognition (SER), the analysis of human emotions from speech signals, plays a major role in human-computer interaction. The proposed system combines Mel-Frequency Cepstral Coefficients (MFCCs) for feature extraction with a Convolutional Neural Network (CNN) for deep feature learning, and the learned features are then classified with a Support Vector Machine (SVM) to improve emotion recognition accuracy. Audio signal processing and feature extraction rely on the Python-based Librosa library. The CNN extracts high-level representations of the speech data, which the SVM then classifies into emotion categories. Evaluation on benchmark emotional speech datasets shows higher accuracy than traditional MFCC-SVM systems. By uniting deep learning with an SVM, the approach generalizes better and is more robust, making it suitable for real-world applications such as virtual assistants, sentiment analysis, and medical diagnostic systems. Experimental findings demonstrate that the model identifies emotional patterns efficiently, strengthening its potential for advanced SER applications.
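As an illustration of the pipeline the abstract describes, the sketch below extracts MFCCs with Librosa, passes them through a small CNN to obtain deep features, and classifies those features with an SVM. The helper names (extract_mfcc, build_cnn_feature_extractor), the values of N_MFCC and MAX_FRAMES, the layer sizes, and the RBF kernel are illustrative assumptions rather than details taken from the paper; in practice the CNN would first be trained with a classification head before its penultimate activations are handed to the SVM.

```python
# Minimal sketch of an MFCC -> CNN -> SVM pipeline; hyperparameters are assumed,
# not taken from the paper.
import numpy as np
import librosa
import tensorflow as tf
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_MFCC = 40          # number of MFCC coefficients (assumed)
MAX_FRAMES = 200     # fixed frame count after padding/truncation (assumed)

def extract_mfcc(path, sr=22050):
    """Load an audio clip with Librosa and return a fixed-size MFCC matrix."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)        # (n_mfcc, frames)
    if mfcc.shape[1] < MAX_FRAMES:                                # pad short clips with zeros
        mfcc = np.pad(mfcc, ((0, 0), (0, MAX_FRAMES - mfcc.shape[1])))
    return mfcc[:, :MAX_FRAMES]                                   # truncate long clips

def build_cnn_feature_extractor():
    """Small 2-D CNN whose last dense layer serves as the deep feature vector."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MFCC, MAX_FRAMES, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(128, activation="relu"),            # deep features fed to the SVM
    ])

# X: stacked MFCC matrices of shape (n_samples, N_MFCC, MAX_FRAMES)
# y: integer emotion labels -- both assumed to be prepared elsewhere.
# cnn = build_cnn_feature_extractor()      # would normally be trained first
# deep_feats = cnn.predict(X[..., np.newaxis])                    # (n_samples, 128)
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# clf.fit(deep_feats, y)                                          # SVM performs the final classification
```

Replacing the CNN's softmax layer with an SVM in this way is the standard hybrid arrangement the abstract refers to: the network supplies a compact, learned representation, while the margin-based classifier handles the final decision boundary.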
License

This work is licensed under a Creative Commons Attribution 4.0 International License.