Deep dive · Observability
Grafana Cloud + OpenTelemetry
SolveWatch instruments every layer of the pipeline — from VAD and Whisper decode inside the Python transcriber to AI provider latency and token cost in the Node backend — and ships it all to Grafana Cloud over OTLP, without ever touching the hot answer path.
Architecture: two services, one Grafana destination
Both the Node.js backend (src/utils/telemetry.js) and the Python transcriber (transcriber/telemetry.py) run the same OTel stack: a metrics pipeline backed by MeterProvider with a PeriodicExportingMetricReader (exports every 10 s) and a logs pipeline backed by LoggerProvider with BatchLogRecordProcessor (flushes every 5 s, bounded queue of 10,000 records). Both export over OTLP HTTP to Grafana Cloud.
The previous on-disk NDJSON writers (logs/app.jsonl, logs/memory.jsonl) have been removed, and Grafana Cloud is now the single log destination. If telemetry is disabled, every call is a no-op. If the endpoint is unreachable, records wait in a bounded in-memory queue and are eventually dropped. In both cases the answer pipeline is never blocked.
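The "never blocked" guarantee can be pictured as a thin facade in front of the exporter. The following is a minimal sketch under stated assumptions, not the project's code: the `Telemetry` class, its `enabled` flag, and the `emit` callable are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Optional


class Telemetry:
    """Hypothetical facade: every call degrades to a no-op instead of raising."""

    def __init__(self, emit: Optional[Callable[[str, dict], None]] = None,
                 enabled: bool = True) -> None:
        # `emit` stands in for whatever hands a record to the exporter queue.
        self._emit = emit
        self.enabled = enabled and emit is not None

    def log_event(self, name: str, **fields: Any) -> bool:
        """Return True if the record was handed off, False if it was skipped."""
        if not self.enabled:
            return False  # telemetry disabled: do nothing
        try:
            self._emit(name, fields)
            return True
        except Exception:
            # A telemetry failure must never propagate into the answer path.
            return False
```

Callers can ignore the return value entirely; it exists only so that tests or diagnostics can tell a skipped record from a delivered one.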
Node.js backend metrics
The server instruments every step of both the screenshot and listen flows. Histograms capture latency distributions; counters track throughput and cost.
Python transcriber metrics
The transcriber instruments VAD, Whisper, and speaker identification — the three steps where latency variance most affects the time between when someone finishes speaking and when the AI starts answering.
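One common way to capture per-step latency is a context manager that times a block and hands the duration to a histogram. The sketch below assumes a generic `record(name, value_ms, labels)` callback rather than any specific OTel instrument, and the metric and label names in the usage note are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def timed(record: Callable[[str, float, Dict[str, str]], None],
          name: str, **labels: str) -> Iterator[None]:
    """Measure the wrapped block and pass its duration in ms to `record`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs whether the block succeeded or raised, so failures are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        try:
            record(name, elapsed_ms, labels)
        except Exception:
            pass  # a metrics failure must not break transcription
```

A call site would look like `with timed(record, "whisper_decode_ms", model="small"): ...`, which keeps the instrumentation to a single line around each of the three steps.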
Host and process gauges (both services)
Both services run a background sampler every 10 s (a daemon thread in Python, an unref'd interval in Node) that pushes system-resource gauges. On the transcriber side, the sampler also reports GPU memory: MPS-allocated memory via PyTorch on Apple Silicon, and pynvml readings on NVIDIA. These gauges carry host_name, host_owner, and device_cpu_brand as direct metric labels, so Grafana dashboards can filter by machine without relying on resource attribute promotion.
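The Python half of the sampler can be sketched with the standard library alone. This is an illustration under assumptions: `start_sampler`, the `record` callback, and the `process_cpu_seconds` gauge name are hypothetical, and only the `host_name` label is shown.

```python
import os
import socket
import threading
from typing import Callable, Dict


def start_sampler(record: Callable[[str, float, Dict[str, str]], None],
                  interval_s: float = 10.0) -> threading.Event:
    """Start a daemon thread that records gauges every `interval_s` seconds.

    Returns an Event; set it to stop the sampler.
    """
    labels = {"host_name": socket.gethostname()}  # attached to every sample
    stop = threading.Event()

    def loop() -> None:
        # Event.wait doubles as an interruptible sleep: it returns True once
        # `stop` is set, which ends the loop.
        while not stop.wait(interval_s):
            try:
                # user + system CPU time consumed by this process so far
                record("process_cpu_seconds", sum(os.times()[:2]), labels)
            except Exception:
                pass  # a failed sample is skipped, not raised

    threading.Thread(target=loop, name="telemetry-sampler", daemon=True).start()
    return stop
```

Because the thread is a daemon, it does not keep the process alive at shutdown, which plays the same role as calling `unref()` on the interval in Node.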
Multi-machine identity
Every metric and log record carries OTel resource attributes that uniquely identify the machine. host.id uses IOPlatformUUID on macOS and /etc/machine-id on Linux, so it stays stable across reboots. Both services derive the same host.id for a given machine, so logs from Node and metrics from Python can be correlated in Grafana on host.id alone, without a separate join key.
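A host.id lookup along these lines could be written as follows. This is a sketch, not the project's implementation: the function names are hypothetical, and reading IOPlatformUUID through the `ioreg` command-line tool is an assumption about how the value is obtained on macOS.

```python
import platform
import re
import subprocess
from pathlib import Path
from typing import Optional

_UUID_RE = re.compile(r'"IOPlatformUUID"\s*=\s*"([^"]+)"')


def parse_ioplatform_uuid(ioreg_output: str) -> Optional[str]:
    """Extract IOPlatformUUID from `ioreg` text output, if present."""
    match = _UUID_RE.search(ioreg_output)
    return match.group(1) if match else None


def stable_host_id() -> Optional[str]:
    """Return a reboot-stable machine identifier, or None if unavailable."""
    system = platform.system()
    try:
        if system == "Darwin":
            out = subprocess.run(
                ["ioreg", "-rd1", "-c", "IOPlatformExpertDevice"],
                capture_output=True, text=True, timeout=5, check=True,
            ).stdout
            return parse_ioplatform_uuid(out)
        if system == "Linux":
            return Path("/etc/machine-id").read_text().strip() or None
    except (OSError, subprocess.SubprocessError):
        pass  # identity is best-effort; callers decide on a fallback
    return None
```

Any two processes that read the same source on the same machine get the same value, which is what lets the Node and Python services agree on host.id without coordinating.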
Logging: from NDJSON files to Loki
Previously, file-logger.js and memory-logger.js wrote structured NDJSON to logs/app.jsonl and logs/memory.jsonl on disk. In this overhaul those writers have been replaced: both modules now delegate to telemetry.logEvent(), which emits an OTel log record to Grafana Loki via the same OTLP HTTP exporter. The call-site API is identical — no existing event emitters changed.
Logs are queued in a bounded BatchLogRecordProcessor (max 10 k records, flush every 5 s). If Grafana is unreachable, the OTel SDK retries with exponential backoff and drops the oldest records when the queue fills — the answer path is never blocked. The same design applies on the Python side (log_writer.py → telemetry.log()).
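The bounded-queue behaviour described above can be illustrated with a small stand-in. `BoundedBatcher` is a hypothetical class, not the OTel BatchLogRecordProcessor, and it simplifies one point: a failed batch is dropped here rather than retried with backoff.

```python
import threading
from collections import deque
from typing import Callable, Deque, List


class BoundedBatcher:
    """Illustrative stand-in for a batch log processor (not the OTel class)."""

    def __init__(self, export: Callable[[List[dict]], None],
                 max_queue: int = 10_000, flush_interval_s: float = 5.0) -> None:
        # deque(maxlen=N) discards from the opposite end when full, so
        # append() on a full queue silently drops the oldest record.
        self._queue: Deque[dict] = deque(maxlen=max_queue)
        self._lock = threading.Lock()
        self._export = export
        self._stop = threading.Event()
        self._interval = flush_interval_s
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, record: dict) -> None:
        """Does no I/O: the caller never waits on the network."""
        with self._lock:
            self._queue.append(record)

    def flush(self) -> int:
        """Export everything queued so far; return how many records were sent."""
        with self._lock:
            batch = list(self._queue)
            self._queue.clear()
        if batch:
            try:
                self._export(batch)  # runs outside the lock
            except Exception:
                return 0  # simplification: drop the failed batch
        return len(batch)

    def _run(self) -> None:
        while not self._stop.wait(self._interval):
            self.flush()

    def close(self) -> None:
        self._stop.set()
        self.flush()
```

The export call runs on the background thread and outside the lock, so a slow or unreachable endpoint delays only the flush, never `enqueue`.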
Configuration
Telemetry is configured in config/api-keys.json under a telemetry key and can be toggled live from the settings page at http://localhost:4000/settings — no restart required. The settings page validates the OTLP endpoint before saving by POSTing an empty metrics payload and checking the response code.
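The endpoint check can be approximated in a few lines. This sketch makes several assumptions that the text does not confirm: that the check uses the OTLP/HTTP JSON encoding, that the configured value is a base URL to which the standard /v1/metrics path is appended, and that any 2xx status counts as success. The function name and parameters are hypothetical.

```python
import http.client
import json
import urllib.request
from typing import Dict, Optional


def validate_otlp_endpoint(base_url: str,
                           headers: Optional[Dict[str, str]] = None,
                           timeout_s: float = 5.0) -> bool:
    """POST an empty OTLP/HTTP JSON metrics payload; True on a 2xx response."""
    body = json.dumps({"resourceMetrics": []}).encode("utf-8")
    try:
        request = urllib.request.Request(
            base_url.rstrip("/") + "/v1/metrics",
            data=body,
            method="POST",
            headers={"Content-Type": "application/json", **(headers or {})},
        )
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError (4xx/5xx) derive from OSError;
        # ValueError covers malformed URLs.
        return False
```

Authentication headers, if the endpoint requires them, would be passed through `headers`. A False result only means the check failed; it does not say whether the cause was the URL, the credentials, or the network.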