
Overview
A real-time khutbah display system that captures the imam's voice, transcribes it with the Munsit API, and matches the transcript against pre-embedded, aligned Arabic, Urdu, and English text segments. It is built for edge devices such as the Raspberry Pi, using an optimized local embedding model and a lightweight WebSocket-based display interface.
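The matching step can be pictured as a nearest-neighbour lookup over the pre-embedded segments. The sketch below makes several assumptions. It matches on the Arabic side of each aligned segment. It uses a plain string in place of a Munsit transcript. Its `embed` function is a hypothetical character-bigram stand-in, not the project's optimized local model. Vectors are L2-normalized, so a dot product equals cosine similarity. The 0.5 threshold is illustrative.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlignedSegment:
    arabic: str
    urdu: str
    english: str
    embedding: list  # computed once, ahead of the khutbah, from the Arabic text


def embed(text: str, dim: int = 64) -> list:
    """Hypothetical stand-in for the local embedding model.

    Counts character bigrams into `dim` buckets and L2-normalizes the result,
    so the dot product of two vectors is their cosine similarity.
    """
    vec = [0.0] * dim
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def best_match(transcript: str, segments: list, threshold: float = 0.5) -> Optional[AlignedSegment]:
    """Return the segment most similar to the transcript, or None if none is close enough."""
    query = embed(transcript)
    best, best_score = None, -1.0
    for seg in segments:
        score = sum(q * s for q, s in zip(query, seg.embedding))
        if score > best_score:
            best, best_score = seg, score
    # Returning None lets the caller keep the current display instead of guessing.
    return best if best_score >= threshold else None


# Illustrative aligned segments (Arabic, Urdu, English), embedded ahead of time.
SEGMENTS = [
    AlignedSegment(ar, ur, en, embed(ar))
    for ar, ur, en in [
        ("الحمد لله رب العالمين",
         "سب تعریفیں اللہ کے لیے ہیں جو تمام جہانوں کا رب ہے",
         "All praise is for Allah, Lord of the worlds"),
        ("إن الله مع الصابرين",
         "بے شک اللہ صبر کرنے والوں کے ساتھ ہے",
         "Indeed, Allah is with the patient"),
    ]
]
```

Because every segment is embedded before the khutbah starts, only the live transcript has to be embedded at run time. That keeps per-utterance work small on an edge device.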
The Problem
Mosques needed a way to display khutbah translations in real time for multilingual congregations. Existing solutions, however, were too resource-intensive for low-cost edge devices and could not process live speech with adequate noise removal.
Technical Solution
System Architecture
- Gemini AI for initial khutbah file alignment (Arabic, Urdu, English)
- Optimized local embedding model for resource-efficient matching
- Custom noise removal and voice enhancement pipeline
- FastAPI backend with WebSocket for real-time display
- Munsit API integration for live transcription
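The components above form a chain of stages: denoise, transcribe, match, and publish. The sketch below connects them with small bounded `asyncio` queues, so a slow stage applies back-pressure instead of letting audio pile up in memory. It assumes each stage is supplied as a callable. The stubs in `demo` are hypothetical placeholders for the custom denoiser, the Munsit API call, the embedding matcher, and the WebSocket broadcast, and they are not the project's actual interfaces.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional


async def run_pipeline(
    chunks: Iterable[bytes],
    denoise: Callable[[bytes], bytes],
    transcribe: Callable[[bytes], Awaitable[str]],
    match: Callable[[str], Optional[dict]],
    publish: Callable[[dict], Awaitable[None]],
) -> None:
    """Chain the stages with bounded queues; None marks the end of the stream."""
    audio_q: asyncio.Queue = asyncio.Queue(maxsize=8)
    text_q: asyncio.Queue = asyncio.Queue(maxsize=8)

    async def capture() -> None:
        for chunk in chunks:
            await audio_q.put(denoise(chunk))  # clean the audio before transcription
        await audio_q.put(None)

    async def transcribe_loop() -> None:
        while (chunk := await audio_q.get()) is not None:
            await text_q.put(await transcribe(chunk))
        await text_q.put(None)

    async def match_loop() -> None:
        while (text := await text_q.get()) is not None:
            segment = match(text)
            if segment is not None:  # no confident match: leave the display as is
                await publish(segment)

    await asyncio.gather(capture(), transcribe_loop(), match_loop())


def demo() -> list:
    """Run the pipeline with stub stages standing in for the real components."""
    published = []

    async def fake_transcribe(chunk: bytes) -> str:
        return chunk.decode()

    def fake_match(text: str) -> Optional[dict]:
        return {"english": text.upper()} if text else None

    async def collect(segment: dict) -> None:
        published.append(segment)

    asyncio.run(run_pipeline([b"a", b"", b"c"], lambda c: c, fake_transcribe, fake_match, collect))
    return published
```

In a real deployment, CPU-heavy stages such as denoising would likely run in a worker thread or process so they do not block the event loop.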
- Backend: FastAPI with WebSocket
- Transcription: Munsit API
- Embeddings: Optimized local model
- Target hardware: Raspberry Pi compatible

Results & Impact
- Display latency: near-live
- Simultaneous language display: 3 languages (Arabic, Urdu, English)
- Hardware compatibility: low-end CPUs
Lessons Learned
Resource Optimization Is Critical
Local embedding models required careful optimization to run within the Raspberry Pi's hardware constraints.
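One common way to shrink stored embeddings is symmetric int8 quantization, which uses one signed byte per dimension instead of four for float32. It is offered here as an illustrative assumption, not as the optimization this project used. The sketch keeps a single float scale per vector and computes dot products on the integer codes, rescaling only once at the end.

```python
from array import array


def quantize(vec: list) -> tuple:
    """Symmetric int8 quantization: one signed byte per dimension plus a float scale."""
    scale = max((abs(x) for x in vec), default=0.0) / 127 or 1.0  # avoid a zero scale
    codes = array("b", (round(x / scale) for x in vec))
    return codes, scale


def dequantize(codes: array, scale: float) -> list:
    """Recover approximate float values; each is within scale / 2 of the original."""
    return [c * scale for c in codes]


def dot_int8(a: array, scale_a: float, b: array, scale_b: float) -> float:
    """Approximate dot product: integer multiply-accumulate, then rescale once."""
    return sum(x * y for x, y in zip(a, b)) * scale_a * scale_b
```

The rounding error per dimension is bounded by half the scale. For ranking candidate segments, that is often small enough not to change which segment wins, but this should be checked against real data.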
Audio Processing Challenges
Custom noise removal was essential for accurate transcription in mosque environments.
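The project's noise removal pipeline is not described here. As a much simpler illustration of the idea, the sketch below implements an energy-based noise gate. It assumes 16 kHz mono 16-bit PCM and 20 ms frames. It estimates the noise floor from the quietest frames and silences frames that do not rise clearly above it. Real mosque noise (fans, reverberation, crowd sound) usually calls for spectral suppression, so treat this as a starting point only.

```python
import math
from array import array


def noise_gate(samples: array, frame_len: int = 320, ratio: float = 2.0) -> array:
    """Zero out frames whose RMS energy is below `ratio` times the noise floor.

    `samples` is 16-bit mono PCM (typecode "h"); 320 samples is 20 ms at 16 kHz.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    energies = [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]
    # Noise floor = 10th-percentile frame energy (assumes at least ~10% of
    # frames are pauses between phrases).
    floor = sorted(energies)[len(energies) // 10] if energies else 0.0
    gated = array("h")
    for frame, energy in zip(frames, energies):
        if energy > floor * ratio:
            gated.extend(frame)  # likely speech: keep unchanged
        else:
            gated.extend(array("h", [0]) * len(frame))  # likely noise: silence it
    return gated
```

The `ratio` parameter trades off missed quiet speech against leftover noise and would need tuning per venue.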
WebSocket Efficiency
A lightweight WebSocket implementation enabled real-time updates without overwhelming the edge device.
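One way to keep WebSocket traffic light is to send a message only when the matched segment actually changes, and to drop displays that have disconnected. The sketch below is a hypothetical hub that follows that approach, not the project's code. It accepts any object with an async `send_json` method, which FastAPI's `WebSocket` provides, so a connection could be registered after `await websocket.accept()`. The `_FakeDisplay` class is a test double.

```python
import asyncio
from typing import Any, Optional, Protocol


class JsonSender(Protocol):
    """Anything with an async send_json, such as a FastAPI/Starlette WebSocket."""

    async def send_json(self, data: Any) -> None: ...


class DisplayHub:
    """Fan out the current segment to all displays, only when it changes."""

    def __init__(self) -> None:
        self.clients: list = []
        self.current_id: Optional[int] = None

    def register(self, client: JsonSender) -> None:
        if client not in self.clients:
            self.clients.append(client)

    def unregister(self, client: JsonSender) -> None:
        if client in self.clients:
            self.clients.remove(client)

    async def publish(self, segment_id: int, payload: dict) -> bool:
        if segment_id == self.current_id:
            return False  # same segment still being spoken: send nothing
        self.current_id = segment_id
        message = {"id": segment_id, **payload}
        clients = list(self.clients)
        results = await asyncio.gather(
            *(c.send_json(message) for c in clients), return_exceptions=True
        )
        for client, result in zip(clients, results):
            if isinstance(result, Exception):
                self.unregister(client)  # drop displays whose send failed
        return True


class _FakeDisplay:
    """Test double standing in for a connected WebSocket."""

    def __init__(self, fail: bool = False) -> None:
        self.fail, self.sent = fail, []

    async def send_json(self, data: Any) -> None:
        if self.fail:
            raise ConnectionError("display disconnected")
        self.sent.append(data)


def demo() -> tuple:
    hub, ok, gone = DisplayHub(), _FakeDisplay(), _FakeDisplay(fail=True)
    hub.register(ok)
    hub.register(gone)

    async def run() -> list:
        return [
            await hub.publish(1, {"english": "A"}),
            await hub.publish(1, {"english": "A"}),  # duplicate: skipped
            await hub.publish(2, {"english": "B"}),
        ]

    return asyncio.run(run()), ok.sent, len(hub.clients)
```

Skipping unchanged segments means repeated transcripts of the same sentence cost nothing on the wire, which matters most when several displays share one small device.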