Technical Work
A closer look at the technical approaches behind our work — distributed AI inference, plain-language app analytics, and the engineering decisions that separate a PoC from a production system.
Running AI inference entirely through cloud APIs gets expensive fast — especially for high-volume, repetitive tasks like log analysis, classification, and tagging. BanyanStem distributes those tasks across a network of local machines running Ollama, using their idle compute instead of a cloud LLM.
A cloud orchestrator (Node.js/Express) receives tasks and pushes them to available worker machines through Cloudflare Tunnels — outbound-only, no port forwarding, no firewall changes on the worker side. Workers execute inference locally via Ollama and report results back. The orchestrator handles task queuing (FIFO, model-matched), worker registration, heartbeat monitoring, stale task detection and re-queuing, and full resilience across worker disconnects and cloud restarts.
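The queuing and resilience behaviour described above can be sketched in a few dozen lines. This is a minimal illustration, not BanyanStem's actual code: the `Task`, `Worker`, and `Orchestrator` names, the 30-second heartbeat timeout, and the method signatures are all assumptions made for the example.

```typescript
// Illustrative sketch of a model-matched FIFO queue with heartbeat-based
// stale-task re-queuing. All names and timeouts here are hypothetical.

interface Task {
  id: string;
  model: string;        // task runs only on a worker serving this model
  payload: string;
  assignedTo?: string;
}

interface Worker {
  id: string;
  models: string[];     // models this worker's Ollama instance has pulled
  lastHeartbeat: number;
  busy: boolean;
}

const HEARTBEAT_TIMEOUT_MS = 30_000; // assumed value

class Orchestrator {
  private queue: Task[] = [];
  private workers = new Map<string, Worker>();

  enqueue(task: Task): void {
    this.queue.push(task); // FIFO: oldest task dispatches first
  }

  register(worker: Worker): void {
    this.workers.set(worker.id, worker);
  }

  heartbeat(workerId: string): void {
    const w = this.workers.get(workerId);
    if (w) w.lastHeartbeat = Date.now();
  }

  // Pop the oldest task whose model matches an idle, live worker.
  dispatch(): { task: Task; worker: Worker } | null {
    const now = Date.now();
    for (let i = 0; i < this.queue.length; i++) {
      const task = this.queue[i];
      for (const w of this.workers.values()) {
        const alive = now - w.lastHeartbeat < HEARTBEAT_TIMEOUT_MS;
        if (alive && !w.busy && w.models.includes(task.model)) {
          this.queue.splice(i, 1);
          w.busy = true;
          task.assignedTo = w.id;
          return { task, worker: w };
        }
      }
    }
    return null; // no matching idle worker right now
  }

  // Put tasks back on the queue if their worker stopped heartbeating mid-run.
  requeueStale(inFlight: Task[]): Task[] {
    const now = Date.now();
    const stale = inFlight.filter((t) => {
      const w = t.assignedTo ? this.workers.get(t.assignedTo) : undefined;
      return !w || now - w.lastHeartbeat >= HEARTBEAT_TIMEOUT_MS;
    });
    for (const t of stale) {
      t.assignedTo = undefined;
      this.queue.unshift(t); // front of the queue, so it runs next
    }
    return stale;
  }
}
```

The key design point is that dispatch is model-matched: a task never waits on a worker that would have to pull a model first, and a dead worker's tasks survive because the cloud side, not the worker, owns the queue.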
128K context window. Q4_K_M quantisation. Linear recurrence architecture — memory scales O(n) with context length, rather than the O(n²) of full attention. Fast on modest hardware. Per-task inference cost approaches zero once workers are running.
On daemon launch — Ollama install check, model pull if missing, tunnel start, cloud registration. Zero manual setup per worker machine.
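That launch sequence can be sketched as a small Node bootstrap. This is a hypothetical illustration, not the actual daemon: the model name, the registration URL, and the assumption that `cloudflared` is used directly are all placeholders; the only real commands invoked are the standard `ollama` CLI ones.

```typescript
// Hypothetical worker-daemon bootstrap: check Ollama, pull the model if
// missing, start the outbound tunnel, register with the cloud orchestrator.
import { execSync, spawn } from "node:child_process";

const MODEL = "example-model"; // placeholder — the real default model differs
const REGISTER_URL = "https://cloud.example.com/register"; // assumed endpoint

function hasOllama(): boolean {
  try {
    execSync("ollama --version", { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

function hasModel(model: string): boolean {
  try {
    return execSync("ollama list").toString().includes(model);
  } catch {
    return false;
  }
}

async function bootstrap(): Promise<void> {
  if (!hasOllama()) throw new Error("Ollama not installed");
  if (!hasModel(MODEL)) execSync(`ollama pull ${MODEL}`, { stdio: "inherit" });

  // Outbound-only tunnel: no inbound ports, no firewall changes on the worker.
  spawn("cloudflared", ["tunnel", "run"], { stdio: "inherit" });

  await fetch(REGISTER_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ models: [MODEL] }),
  });
}
```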
"The default model runs on modest hardware with a 128K context window. For log analysis, classification, and similar high-volume tasks, the inference cost approaches zero."
Standard app analytics has a built-in knowledge problem: the team that adds the events is usually the only team that can interpret them. Custom event names made sense at the time. Funnels are built after the fact from whatever events happened to get instrumented. If a PM wants to understand what users are doing, they need an analyst or a developer free to translate.
Logminate approaches this differently. Instead of custom events requiring interpretation, it captures natural-language descriptions of what the user is doing in each session. From those, it constructs a plain-English summary of the full session — what happened, in what order, how long was spent where, and what went wrong.
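One way to picture the step from captured descriptions to a session summary: order the events, compute dwell time between them, flag errors, and hand the result to a summarisation model. A minimal sketch, assuming a hypothetical event shape and prompt wording that are not Logminate's actual format:

```typescript
// Illustrative only: turn natural-language session events into the input
// for a plain-English summary. Event fields and wording are assumptions.

interface SessionEvent {
  at: number;           // ms since session start
  description: string;  // e.g. "Opened the checkout screen"
  error?: string;       // present if the action failed
}

function buildSummaryPrompt(events: SessionEvent[]): string {
  const ordered = [...events].sort((a, b) => a.at - b.at);
  const lines = ordered.map((e, i) => {
    const next = ordered[i + 1];
    const dwell = next
      ? `${Math.round((next.at - e.at) / 1000)}s`
      : "end of session";
    const err = e.error ? ` [error: ${e.error}]` : "";
    return `${i + 1}. ${e.description} (${dwell})${err}`;
  });
  return [
    "Summarise this app session in plain English.",
    "Cover what happened, in what order, where time was spent, and what went wrong.",
    "",
    ...lines,
  ].join("\n");
}
```

Because the events are already natural language, the model's job is condensation and ordering, not interpretation of opaque event names.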
Logminate uses SLMs for session summarisation — fast to load, low memory footprint, designed for high-volume processing. For teams with the hardware, BanyanStem provides the distributed inference layer, pushing per-session processing costs close to zero at scale.
A 30-minute call is usually enough to know if the technical fit is there.
Book a call

What is BanyanStem?
BanyanStem is a distributed AI inference network that routes high-volume tasks to idle local hardware running Ollama, reducing per-task inference costs to near-zero for workloads that don't require a frontier model.
What is Logminate?
Logminate is an app analytics system that replaces custom event tracking with natural-language session capture. It produces plain-English session summaries, error flows with reproduction paths, and funnel analysis that product teams can read without analyst support.
When does a distributed SLM approach make sense over a frontier model?
When the task is well-defined and repetitive — classification, tagging, log analysis, intent detection — and the volume is high enough that cloud API costs compound. A fine-tuned 1–3B model often outperforms a 70B general model on a specific task, at a fraction of the cost.