Live Khutbah Displayer

FastAPI, Gemini AI, WebSocket, Embeddings, Raspberry Pi

Real-time multilingual khutbah display system for edge devices, combining live transcription with embedding-based matching


Overview

A real-time khutbah display system that captures the imam's voice, transcribes it with the Munsit API, and matches the transcript against pre-embedded, aligned Arabic, Urdu, and English text. It is built for edge devices such as the Raspberry Pi, using an optimized local embedding model and a lightweight WebSocket-based interface.

The Problem

Mosques needed a system to display khutbah translations in real-time for multilingual congregations, but existing solutions were too resource-intensive for budget-friendly edge devices and couldn't handle live voice processing with proper noise removal.

Technical Solution

System Architecture

  • Gemini AI for initial khutbah file alignment (Arabic, Urdu, English)
  • Optimized local embedding model for resource-efficient matching
  • Custom noise removal and voice enhancement pipeline
  • FastAPI backend with WebSocket for real-time display
  • Munsit API integration for live transcription
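
The matching step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `embed` is a toy character n-gram stand-in for the optimized local embedding model, and `SEGMENTS`, `INDEX`, `match`, and the 0.5 threshold are hypothetical names and values chosen for the example. The idea is that the aligned segments are embedded once ahead of time, and each live transcript chunk is compared against that index by cosine similarity.

```python
import math
from collections import Counter
from typing import Optional, Tuple

# Hypothetical sample of pre-aligned segments (Arabic, Urdu, English).
SEGMENTS = [
    {"ar": "الحمد لله رب العالمين",
     "ur": "تمام تعریفیں اللہ کے لیے ہیں جو تمام جہانوں کا رب ہے",
     "en": "All praise is due to Allah, Lord of the worlds"},
    {"ar": "اتقوا الله حق تقاته",
     "ur": "اللہ سے ڈرو جیسا کہ اس سے ڈرنے کا حق ہے",
     "en": "Fear Allah as He should be feared"},
    {"ar": "إن الله مع الصابرين",
     "ur": "بے شک اللہ صبر کرنے والوں کے ساتھ ہے",
     "en": "Indeed, Allah is with the patient"},
]


def embed(text: str, n: int = 3) -> Counter:
    """Toy stand-in for a real embedding model: character n-gram counts."""
    padded = f" {text.strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Embed the Arabic side once, ahead of the live session.
INDEX = [embed(segment["ar"]) for segment in SEGMENTS]


def match(transcript: str, threshold: float = 0.5) -> Optional[Tuple[int, float]]:
    """Return (segment index, score) for the best match, or None if too weak."""
    query = embed(transcript)
    scores = [cosine(query, vec) for vec in INDEX]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (best, scores[best]) if scores[best] >= threshold else None


if __name__ == "__main__":
    # A partial transcript (first three words of segment 0) still finds it.
    partial = " ".join(SEGMENTS[0]["ar"].split()[:3])
    hit = match(partial)
    if hit is not None:
        print(SEGMENTS[hit[0]]["en"], SEGMENTS[hit[0]]["ur"])
```

Once a segment index is found, the Urdu and English text for that index can be shown alongside the Arabic. The threshold keeps unrelated speech from changing the display.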

  • Backend: FastAPI with WebSocket
  • Transcription: Munsit API
  • Embeddings: Optimized local model
  • Target Hardware: Raspberry Pi compatible

Live Khutbah Displayer architecture diagram

Results & Impact

  • Display latency: Near-live
  • Simultaneous language display: 3 languages (Arabic, Urdu, English)
  • Hardware compatibility: Low-end CPU

Lessons Learned

Resource Optimization Critical

Local embedding models required careful optimization to run within the hardware constraints of a Raspberry Pi.

Audio Processing Challenges

Custom noise removal was essential for accurate transcription in mosque environments.
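
The project's noise removal pipeline is not described in detail, so the following is only a simple stand-in to show the general idea: an energy-based noise gate that silences audio frames close to the background level before they reach transcription. The function names, the frame size, and the assumption that the first few frames contain only room noise are all choices made for this sketch.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def noise_gate(samples: Sequence[float], frame_size: int = 160,
               noise_frames: int = 5, margin: float = 2.0) -> List[float]:
    """Zero out frames whose energy is within `margin` times the noise floor.

    Assumption: the first `noise_frames` frames are background noise only,
    so their average RMS is used as the noise floor estimate.
    """
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples), frame_size)]
    head = frames[:noise_frames]
    floor = sum(rms(f) for f in head) / max(1, len(head))
    gated: List[float] = []
    for frame in frames:
        if rms(frame) > margin * floor:
            gated.extend(frame)                 # keep likely speech
        else:
            gated.extend([0.0] * len(frame))    # mute likely noise
    return gated


if __name__ == "__main__":
    # 50 ms of faint hum followed by 50 ms of a louder 200 Hz tone at 16 kHz.
    hum = [0.01 * (-1) ** i for i in range(800)]
    tone = [0.5 * math.sin(2 * math.pi * 200 * i / 16000) for i in range(800)]
    cleaned = noise_gate(hum + tone)
    print(sum(1 for s in cleaned[:800] if s != 0.0))  # hum samples that survived
```

A real mosque deployment would likely need more than a gate (for example, filtering of fan or microphone hum and handling of reverberation), but the gate shows where such a stage sits: between audio capture and the transcription request.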

WebSocket Efficiency

A lightweight WebSocket implementation enabled real-time updates without overwhelming edge devices.
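
One way to keep WebSocket traffic light is to push a message only when the matched segment actually changes. The sketch below illustrates that idea under stated assumptions: `DisplayHub` and its methods are hypothetical names, and the hub is written against any object with an async `send_text` method so that it runs without a web framework installed. FastAPI's `WebSocket` objects provide `send_text`, so connections accepted in a FastAPI WebSocket endpoint could be registered with a hub like this.

```python
import asyncio
import json
from typing import Dict, List, Optional


class DisplayHub:
    """Tracks connected display clients and pushes only changed segments."""

    def __init__(self) -> None:
        self.clients: List[object] = []
        self.last_index: Optional[int] = None

    def add(self, ws: object) -> None:
        self.clients.append(ws)

    def remove(self, ws: object) -> None:
        if ws in self.clients:
            self.clients.remove(ws)

    async def publish(self, index: int, segment: Dict[str, str]) -> int:
        """Send one segment to every client; return how many received it.

        Repeated matches of the same segment are skipped, so the display
        devices receive one small JSON message per segment change.
        """
        if index == self.last_index:
            return 0
        self.last_index = index
        payload = json.dumps({"index": index, **segment}, ensure_ascii=False)
        delivered = 0
        for ws in list(self.clients):
            try:
                await ws.send_text(payload)
                delivered += 1
            except Exception:
                # Drop clients whose connection has failed.
                self.remove(ws)
        return delivered


class PrintSocket:
    """Stand-in client for the demo: prints instead of sending over a socket."""

    async def send_text(self, data: str) -> None:
        print(data)


if __name__ == "__main__":
    hub = DisplayHub()
    hub.add(PrintSocket())
    segment = {"ar": "الحمد لله رب العالمين",
               "en": "All praise is due to Allah, Lord of the worlds"}
    asyncio.run(hub.publish(0, segment))  # sent
    asyncio.run(hub.publish(0, segment))  # skipped: same segment as before
```

In a FastAPI application, the endpoint would typically call `await websocket.accept()`, register the connection with `hub.add(websocket)`, and call `hub.remove(websocket)` when the client disconnects.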

© 2025 Muhammad Saad. All rights reserved.