International Journal of Scientific Research and Engineering Development (International Peer Reviewed Open Access Journal) ISSN (Online): 2581-7175

Paper Information
| Paper Title | Multi-Modal Emotion Recognition System Using Facial Features, Acoustic Patterns, and Textual Sentiment Analysis |
| Authors | Jalasuthrapu Ravindra Babu, Gangula Nagul Meera, Dala Manas, Guvvala Praveen Kumar, Kanaparthi Vidwan |
| Published Issue | Volume 9, Issue 2 |
| Year of Publication | 2026 |
| Unique Identification Number | IJSRED-V9I2P152 |
Abstract
Emotions strongly shape how humans communicate, yet computers generally struggle to recognize them, especially when only one kind of input (text or voice alone) is used. This limitation can lead to failures of emotion recognition in virtual assistants, mental health support software, and human-computer interaction more broadly, and misrecognized emotion can erode user trust and the feeling of being understood. The purpose of this research is to construct a multi-modal emotion recognition (MER) system that fuses facial, vocal, and textual features for more precise recognition of emotions. It analyzes facial expressions from photographs or video; the tone, pitch, and energy of a person's speech; and the emotional content of written language (for example, chat messages). By integrating these three categories of cues, the system improves the detection of five basic emotions: happiness, sadness, anger, fear, and neutrality (the absence of marked emotion). The proposed solution relies on machine learning and deep learning algorithms. Each data type (face, voice, and text) is first processed separately, and the respective outputs are then integrated for the final emotion categorization. Illustratively, CNNs are applied for facial analysis, RNNs or Transformer models for time-sequence learning from audio, and NLP approaches such as word embeddings or Transformer models for emotional inference from text. The features from the three modalities are then fused at either the feature level or the decision level, improving the prediction accuracy and robustness of the system compared with single-modality approaches. The project uses existing multimodal emotion datasets that contain facial images/videos, the corresponding audio recordings, and matching text labeled with emotions. Programming is mainly in Python, using OpenCV and Librosa for feature extraction, TensorFlow and PyTorch for training the deep learning models, and Hugging Face Transformers for emotion analysis of text. The system's architecture is modular, allowing a component (e.g., the facial or audio model) to be replaced or changed without reworking the entire system. This research therefore seeks to design a more accurate, flexible, and stable emotion identification mechanism by fusing inputs from several modalities; it can be applied in online tutoring, call centers, PC- and mobile-based interactive games, social robots, and mental health applications, leading to more natural, human, and empathetic interactions with systems.
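To make the two fusion strategies mentioned in the abstract concrete, here is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the paper's implementation: the encoder output dimensions, layer sizes, class names (`FusionClassifier`, `decision_level_fusion`), and the five-emotion label set are hypothetical choices; the linear projections merely stand in for the CNN, RNN/Transformer, and text encoders the abstract describes.

```python
# A minimal sketch of feature-level vs. decision-level fusion, assuming each
# modality encoder has already produced a fixed-size feature vector.
# Dimensions and layer sizes are illustrative assumptions, not paper values.
import torch
import torch.nn as nn

EMOTIONS = ["happiness", "sadness", "anger", "fear", "neutral"]

class FusionClassifier(nn.Module):
    """Feature-level fusion: concatenate per-modality embeddings, then classify."""

    def __init__(self, face_dim=512, audio_dim=128, text_dim=768, hidden=256):
        super().__init__()
        # Each projection stands in for a modality-specific encoder:
        # a CNN over face images, an RNN/Transformer over acoustic frames,
        # and a Transformer text model, each yielding one feature vector.
        self.face_proj = nn.Linear(face_dim, hidden)
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.text_proj = nn.Linear(text_dim, hidden)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(3 * hidden, len(EMOTIONS)))

    def forward(self, face_feat, audio_feat, text_feat):
        # Concatenate the projected modality embeddings before classification.
        fused = torch.cat(
            [self.face_proj(face_feat),
             self.audio_proj(audio_feat),
             self.text_proj(text_feat)],
            dim=-1,
        )
        return self.head(fused)

def decision_level_fusion(face_p, audio_p, text_p, weights=(1.0, 1.0, 1.0)):
    """Decision-level alternative: weighted average of per-modality
    probability vectors, each of shape (num_classes,)."""
    w = torch.tensor(weights).view(3, 1)
    stacked = torch.stack([face_p, audio_p, text_p])  # (3, num_classes)
    return (w * stacked).sum(dim=0) / w.sum()

# Smoke test with random tensors in place of real encoder outputs.
model = FusionClassifier()
logits = model(torch.randn(1, 512), torch.randn(1, 128), torch.randn(1, 768))
print(logits.softmax(dim=-1))  # per-emotion probabilities, feature-level path

p = torch.full((len(EMOTIONS),), 0.2)  # dummy uniform distributions
print(decision_level_fusion(p, p, p))  # decision-level path
```

In a full pipeline along the lines the abstract sketches, OpenCV and Librosa would supply the face and audio features and a Hugging Face model the text embedding; the modular structure means any one projection (and its upstream encoder) can be swapped out without redoing the rest of the system.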
How to Cite
Jalasuthrapu Ravindra Babu, Gangula Nagul Meera, Dala Manas, Guvvala Praveen Kumar, Kanaparthi Vidwan, "Multi-Modal Emotion Recognition System Using Facial Features, Acoustic Patterns, and Textual Sentiment Analysis", International Journal of Scientific Research and Engineering Development, Vol. 9, Issue 2, pp. 980-985, Mar-Apr 2026. ISSN: 2581-7175. www.ijsred.com. Published by Scientific and Academic Research Publishing.
Other Details
