The full audio-to-answer pipeline
Three services — a Python transcriber, a Node.js backend, and an Electron HUD — work together to go from raw microphone audio to streaming AI answers in under a second.
Service 1: Python Transcriber
Microphone capture and VAD
audio_recorder.py captures raw PCM at 16 kHz from the default input device (or the device configured in settings). Before any transcription, a Voice Activity Detection model gates the audio — silence, background noise, and non-speech frames are dropped to avoid feeding garbage to Whisper.
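The gating step can be illustrated with a minimal sketch. The source does not name the VAD model, so this example swaps in a plain RMS-energy threshold as a stand-in. The 30 ms frame size, the threshold value, and the function names are assumptions made for illustration; they are not taken from audio_recorder.py.

```python
import math
import struct

SAMPLE_RATE = 16_000                              # 16 kHz, as described above
FRAME_MS = 30                                     # assumed frame length
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000    # 480 samples per frame
ENERGY_THRESHOLD = 500.0                          # assumed RMS cutoff for 16-bit PCM


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a little-endian 16-bit mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def gate_speech(frames, threshold=ENERGY_THRESHOLD):
    """Yield only the frames whose energy reaches the threshold.

    A real VAD model would replace this energy test; the gating
    structure (drop non-speech before transcription) stays the same.
    """
    for frame in frames:
        if frame_rms(frame) >= threshold:
            yield frame
```

An energy threshold cannot tell speech from loud background noise, which is the reason a trained VAD model is the better choice in practice.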
Rolling buffer and 300 ms decode loop
Speech frames are appended to a rolling deque buffer (up to ~30 seconds). Every 300 ms, StreamingSTT hands the current buffer to Whisper and receives a word-level transcript with timestamps.
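A rough sketch of the buffer and decode cadence follows. It assumes 30 ms frames and counts buffered audio time rather than wall-clock time to decide when a decode is due; the real StreamingSTT loop may well be timer-driven. The class and method names are hypothetical.

```python
from collections import deque

FRAME_MS = 30                                    # assumed frame length
MAX_BUFFER_S = 30                                # ~30 s cap, as described above
DECODE_INTERVAL_MS = 300                         # decode cadence, as described above
MAX_FRAMES = MAX_BUFFER_S * 1000 // FRAME_MS     # 1000 frames


class RollingBuffer:
    """Bounded buffer of speech frames; the oldest frames fall off first."""

    def __init__(self, max_frames=MAX_FRAMES):
        self.frames = deque(maxlen=max_frames)
        self.ms_since_decode = 0

    def push(self, frame: bytes) -> bool:
        """Append one speech frame; return True when a decode is due."""
        self.frames.append(frame)
        self.ms_since_decode += FRAME_MS
        if self.ms_since_decode >= DECODE_INTERVAL_MS:
            self.ms_since_decode = 0
            return True
        return False

    def snapshot(self) -> bytes:
        """Concatenate the buffered frames for handing to the decoder."""
        return b"".join(self.frames)
```

Because the deque has a fixed maxlen, appending to a full buffer silently discards the oldest frame, so memory use stays bounded without explicit trimming.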
LocalAgreement-2: commit before silence
Each successive decode is compared to the previous. A word at position i is committed when it matches across two consecutive decodes. Committed words are forwarded to Node as stt_partial.committed without waiting for silence. Tentative (not yet agreed) words follow as stt_partial.tentative.
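The commit rule can be sketched as below. This version commits the longest common prefix of two consecutive decodes, which is a slightly stricter reading of the position-wise rule above, and it never revises a word once committed. Both choices are assumptions, as are the class and method names.

```python
def common_prefix_len(a, b):
    """Number of leading items that are equal in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class LocalAgreement2:
    """Commit words once two consecutive decodes agree on them."""

    def __init__(self):
        self.committed = []   # words already forwarded as committed
        self.previous = []    # the previous decode, for comparison

    def update(self, hypothesis):
        """Take the latest decode (list of words); return (committed, tentative)."""
        agreed = common_prefix_len(self.previous, hypothesis)
        if agreed > len(self.committed):
            # Extend the committed text with the newly agreed words only.
            self.committed.extend(hypothesis[len(self.committed):agreed])
        self.previous = list(hypothesis)
        tentative = hypothesis[len(self.committed):]
        return list(self.committed), tentative
```

Feeding the decodes ["tell", "me"] and then ["tell", "me", "about"] commits "tell me" on the second call and leaves "about" tentative until a later decode confirms it.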
When 700 ms of silence is detected — or the user presses Cmd+Shift+X — the remaining buffer is force-decoded and emitted as stt_final, triggering the AI answer.
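The endpointing rule might look like the following sketch. It assumes the VAD delivers one speech/non-speech decision per 30 ms frame and that a manual trigger (the hotkey) simply calls force(). All names here are hypothetical.

```python
SILENCE_TIMEOUT_MS = 700   # silence needed to finalize, as described above
FRAME_MS = 30              # assumed frame length


class Endpointer:
    """Decide when an utterance is finished and stt_final should be emitted."""

    def __init__(self, timeout_ms=SILENCE_TIMEOUT_MS, frame_ms=FRAME_MS):
        self.timeout_ms = timeout_ms
        self.frame_ms = frame_ms
        self.silence_ms = 0
        self.has_speech = False

    def feed(self, is_speech: bool) -> bool:
        """Process one VAD decision; return True when the utterance should finalize."""
        if is_speech:
            self.has_speech = True
            self.silence_ms = 0
            return False
        if not self.has_speech:
            return False          # leading silence never triggers a final
        self.silence_ms += self.frame_ms
        if self.silence_ms >= self.timeout_ms:
            self.reset()
            return True
        return False

    def force(self) -> bool:
        """Manual trigger; finalize only if there is speech pending."""
        pending = self.has_speech
        self.reset()
        return pending

    def reset(self):
        self.silence_ms = 0
        self.has_speech = False
```

With 30 ms frames the timeout fires on the 24th consecutive silent frame (720 ms), since 23 frames cover only 690 ms.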
Service 2: Node.js Backend
Receive stt_final and build prompt
dataHandler.js receives the stt_final event. It assembles the prompt by combining the current question with conversation memory from InterviewTranscriptBuffer — the last 3–5 Q&A pairs plus compressed summaries of older context (~850 tokens max overhead).
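One way to picture the prompt assembly is the sketch below. It is written in Python for consistency with the other examples, although dataHandler.js is a Node.js module. The prompt layout, the function name, and the cap of five recent pairs are assumptions; the source does not show the actual template.

```python
MAX_RECENT_PAIRS = 5   # upper end of the "last 3-5 pairs" range described above


def build_prompt(question, recent_pairs, summaries):
    """Combine older summaries, recent Q&A pairs, and the new question.

    recent_pairs is a list of (question, answer) tuples, oldest first;
    summaries is a list of short strings covering older context.
    """
    parts = []
    if summaries:
        bullet_list = "\n".join(f"- {s}" for s in summaries)
        parts.append("Earlier context:\n" + bullet_list)
    for q, a in recent_pairs[-MAX_RECENT_PAIRS:]:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

Keeping verbatim pairs only for the most recent exchanges and summaries for everything older is what bounds the memory overhead per request.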
Multi-provider AI with streaming
ai.service.js tries the configured provider cascade (default: Groq → Gemini → OpenAI → Claude). The first healthy provider is called with stream: true. Each token is immediately forwarded as a question_answer_token Socket.IO event to the HUD — no buffering.
If a provider errors or rate-limits, it is marked as cooling off and the next one in the cascade is tried. Config hot-reloads on every fs.watch event on api-keys.json — no restart needed to change providers or keys.
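The cascade and cooling-off behavior can be sketched as follows, again in Python for illustration rather than as a copy of ai.service.js. The 60-second cooldown, the class name, and the provider interface (a function that takes a prompt and yields tokens) are assumptions.

```python
import time

COOLDOWN_S = 60.0   # assumed cooling-off period


class ProviderCascade:
    """Try providers in order, skipping any that recently failed."""

    def __init__(self, providers, cooldown_s=COOLDOWN_S, clock=time.monotonic):
        # providers: ordered list of (name, stream_fn); stream_fn(prompt) yields tokens.
        self.providers = providers
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.cooling_until = {}   # provider name -> time when it may be retried

    def is_healthy(self, name):
        return self.clock() >= self.cooling_until.get(name, 0.0)

    def stream(self, prompt, on_token):
        """Forward each token to on_token as it arrives; return the provider used."""
        for name, stream_fn in self.providers:
            if not self.is_healthy(name):
                continue
            try:
                for token in stream_fn(prompt):
                    on_token(token)   # forwarded immediately, no buffering
                return name
            except Exception:
                # Mark the provider as cooling off and fall through to the next one.
                self.cooling_until[name] = self.clock() + self.cooldown_s
        raise RuntimeError("all providers failed or are cooling off")
```

One caveat of this simple version: if a provider fails partway through a stream, the tokens already forwarded stay on screen and the next provider starts its answer from the beginning. A production implementation would need to decide how to handle that case.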
Session memory update
After question_answer_complete, the Q&A pair is stored in InterviewTranscriptBuffer. When 5 pairs accumulate, an async Ollama call compresses them into a ~150-token summary. This happens fire-and-forget — it never delays the next answer.
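The fire-and-forget compression can be sketched with asyncio. The summarizer is injected as a hypothetical async callable standing in for the Ollama call, and the sketch assumes that compressed pairs are removed from the verbatim list; the source does not say whether the real buffer keeps them.

```python
import asyncio

COMPRESS_AFTER = 5   # pairs that trigger compression, as described above


class TranscriptMemory:
    """Store Q&A pairs and compress full batches in the background."""

    def __init__(self, summarize, compress_after=COMPRESS_AFTER):
        self.summarize = summarize        # async callable: list of pairs -> str
        self.compress_after = compress_after
        self.pairs = []
        self.summaries = []
        self._tasks = set()               # hold references so tasks are not garbage-collected

    def add_pair(self, question, answer):
        """Store a pair; schedule compression without awaiting it.

        Must be called while an event loop is running.
        """
        self.pairs.append((question, answer))
        if len(self.pairs) >= self.compress_after:
            batch, self.pairs = self.pairs, []
            task = asyncio.create_task(self._compress(batch))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _compress(self, batch):
        try:
            self.summaries.append(await self.summarize(batch))
        except Exception:
            # On failure, put the pairs back so no context is lost.
            self.pairs = batch + self.pairs
```

Because add_pair returns as soon as the task is scheduled, a slow or failing summarizer never sits on the path of the next answer.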
Service 3: Electron HUD
Always-on-top, content-protected overlay
The HUD is a frameless Electron window (380×460 px) pinned above all other windows at the screen-saver level. setContentProtection(true) asks the operating system to exclude the window from capture, which hides it from most software-based screen capture tools on macOS and Windows. The HUD connects to the backend with Socket.IO over ws://localhost:4000.
Live strip + streaming answer render
stt_partial events update the live strip: committed words render bright, tentative words render dim/italic. When question_answer_token events arrive, they are appended to the answer card in real time. The card is visible and updating while the model is still generating.
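The HUD's handling of these events amounts to a small state update per event, sketched here in Python for consistency with the other examples (the real renderer runs in Electron). The payload shapes, a dict with committed and tentative strings for stt_partial and a dict with a token string for question_answer_token, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class HudState:
    committed: str = ""   # rendered bright in the live strip
    tentative: str = ""   # rendered dim/italic in the live strip
    answer: str = ""      # text shown on the answer card


def apply_event(state: HudState, event: str, payload: dict) -> HudState:
    """Update the HUD state for one incoming Socket.IO event."""
    if event == "stt_partial":
        # Each partial replaces the whole strip with the latest view.
        state.committed = payload.get("committed", "")
        state.tentative = payload.get("tentative", "")
    elif event == "question_answer_token":
        # Tokens are appended as they arrive, so the card grows in real time.
        state.answer += payload["token"]
    return state
```

Unknown event names are ignored, so the state stays valid if the backend starts emitting events this sketch does not cover.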
Flow 2: Screenshot analysis
A second, parallel flow handles screenshot-based questions. screenshot-monitor.service.js polls the uploads/ directory. New images are preprocessed with Sharp (contrast enhancement, grayscale) then passed to Tesseract for OCR. The extracted text is treated as the question and sent through the same AI pipeline — same provider cascade, same streaming HUD render.
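The polling side of this flow can be sketched as below. The OCR step and the hand-off to the AI pipeline are injected as hypothetical callables, so the example makes no claims about the Sharp or Tesseract APIs. The set of image suffixes and the class name are also assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}   # assumed set of accepted image types


class UploadsPoller:
    """Detect new image files in a directory and route their text to the pipeline."""

    def __init__(self, directory, ocr, ask):
        self.directory = Path(directory)
        self.ocr = ocr      # callable: Path -> str (stand-in for preprocessing + OCR)
        self.ask = ask      # callable: str -> None (stand-in for the AI pipeline)
        self.seen = set()   # file names already processed

    def poll_once(self):
        """Process image files not seen on earlier polls; return their names."""
        processed = []
        for path in sorted(self.directory.iterdir()):
            if path.suffix.lower() not in IMAGE_SUFFIXES or path.name in self.seen:
                continue
            self.seen.add(path.name)
            text = self.ocr(path).strip()
            if text:
                self.ask(text)   # extracted text is treated as the question
            processed.append(path.name)
        return processed
```

Calling poll_once() on a timer gives the polling behavior described above. Tracking names in a set means each screenshot is processed once, even though it stays in the directory.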