International Journal of Scientific Research and Engineering Development (International Peer Reviewed Open Access Journal), ISSN [Online]: 2581-7175

📑 Paper Information
📑 Paper Title | Gesture talk: an Integrated Multimodal AI Assistant (Gesture, Voice, and Conversational Intelligence) |
👤 Authors | Pawan Gandhi, Krishna Masharu |
📘 Published Issue | Volume 8 Issue 5 |
📅 Year of Publication | 2025 |
🆔 Unique Identification Number | IJSRED-V8I5P181 |
📝 Abstract
Emerging trends in Human‑Computer Interaction (HCI) emphasize multimodal input systems that combine visual gestures, voice commands, and dialogue-based AI. This work presents a Python‑based assistant that integrates gesture recognition through MediaPipe/OpenCV, voice automation through SpeechRecognition and system subprocesses, and a generative AI chatbot powered by Google’s Gemini API. Inspired by prior multimodal systems that combine speech and gestures, our system enables real‑time control of volume, brightness, media, applications, files, and AI chat, with all components running concurrently via multithreading for responsiveness. Evaluation demonstrates high accuracy and low latency, showing promise for intuitive, accessible multimodal interfaces.
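The concurrency pattern the abstract mentions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`run_modality`, `dispatch`, `run_assistant`), the command strings, and the use of a shared queue are all assumptions made for illustration. The real camera, microphone, and Gemini calls are replaced by fixed lists of events so that the example runs with the standard library alone.

```python
import queue
import threading


def run_modality(name, events, commands):
    """Hypothetical input loop for one modality (gesture, voice, or chat).

    A real worker would read camera frames, microphone audio, or chat text;
    here it only forwards a prepared list of events.
    """
    for event in events:
        commands.put((name, event))
    commands.put((name, None))  # sentinel: this modality has finished


def dispatch(commands, n_workers):
    """Consume commands from every modality until all workers have finished."""
    handled = []
    finished = 0
    while finished < n_workers:
        source, command = commands.get()
        if command is None:
            finished += 1
            continue
        # A real assistant would change the volume, launch an app, etc.
        handled.append((source, command))
    return handled


def run_assistant(inputs):
    """Start one thread per modality and collect the commands they emit."""
    commands = queue.Queue()  # thread-safe hand-off between the threads
    workers = [
        threading.Thread(
            target=run_modality, args=(name, events, commands), daemon=True
        )
        for name, events in inputs.items()
    ]
    for worker in workers:
        worker.start()
    handled = dispatch(commands, len(workers))
    for worker in workers:
        worker.join()
    return handled


if __name__ == "__main__":
    demo_inputs = {
        "gesture": ["volume_up", "brightness_down"],
        "voice": ["open_browser"],
        "chat": ["hello"],
    }
    for source, command in run_assistant(demo_inputs):
        print(f"{source}: {command}")
```

Giving each modality its own thread keeps a slow input, such as a network call to a chatbot, from blocking the others, and the single queue serializes the resulting actions. Commands from different modalities may interleave in any order, while each modality's own commands stay in sequence.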