Technical Work

What's under the hood.

A closer look at the technical approaches behind our work — distributed AI inference, plain-language app analytics, and the engineering decisions that separate a PoC from a production system.

Distributed AI Inference on Idle Hardware

Running AI inference entirely through cloud APIs gets expensive fast — especially for high-volume, repetitive tasks like log analysis, classification, and tagging. BanyanStem distributes those tasks across a network of local machines running Ollama, using their idle compute instead of a cloud LLM.

The architecture

A cloud orchestrator (Node.js/Express) receives tasks and pushes them to available worker machines through Cloudflare Tunnels — outbound-only, no port forwarding, no firewall changes on the worker side. Workers execute inference locally via Ollama and report results back. The orchestrator handles task queuing (FIFO, model-matched), worker registration, heartbeat monitoring, stale task detection and re-queuing, and full resilience across worker disconnects and cloud restarts.
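
A minimal sketch of that dispatch loop in TypeScript, assuming an in-memory FIFO queue and a REST shape for registration and heartbeats. The endpoint names and payload fields here are illustrative, not BanyanStem's actual API:

    import express from "express";

    type Task = { id: string; model: string; prompt: string };
    type Worker = { id: string; tunnelUrl: string; models: string[]; lastHeartbeat: number };

    const app = express();
    app.use(express.json());

    const queue: Task[] = [];                 // FIFO
    const workers = new Map<string, Worker>();

    // Workers register on daemon launch with their tunnel URL and pulled models.
    app.post("/workers/register", (req, res) => {
      const w: Worker = { ...req.body, lastHeartbeat: Date.now() };
      workers.set(w.id, w);
      res.sendStatus(204);
    });

    // Heartbeats keep a worker eligible for dispatch.
    app.post("/workers/:id/heartbeat", (req, res) => {
      const w = workers.get(req.params.id);
      if (w) w.lastHeartbeat = Date.now();
      res.sendStatus(w ? 204 : 404);
    });

    // Tasks are queued, then pushed to the first live worker running a matching model.
    app.post("/tasks", async (req, res) => {
      queue.push(req.body as Task);
      await dispatch();
      res.sendStatus(202);
    });

    async function dispatch(): Promise<void> {
      const task = queue[0];
      if (!task) return;
      const worker = [...workers.values()].find(
        (w) => w.models.includes(task.model) && Date.now() - w.lastHeartbeat < 30_000
      );
      if (!worker) return;                    // stays queued until a worker matches
      queue.shift();
      // Push through the worker's tunnel; a stale-task watchdog (sketched further
      // down) would re-queue the task if no result comes back.
      await fetch(`${worker.tunnelUrl}/run`, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(task),
      });
    }

    app.listen(8080);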

The default model — LFM2.5 1.2B

128K context window. Q4_K_M quantisation. Linear recurrence architecture: memory stays flat as context grows, and compute scales as O(n) rather than O(n²). Fast on modest hardware. Per-task inference cost approaches zero once workers are running.
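
On the worker side, inference is one local HTTP call. /api/generate is Ollama's standard non-streaming endpoint; the model tag below is an assumption and depends on how the model was pulled or imported into Ollama:

    // A worker's local inference call via Ollama's generate endpoint.
    async function runLocal(prompt: string): Promise<string> {
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ model: "lfm2.5:1.2b", prompt, stream: false }),  // hypothetical tag
      });
      const body = await res.json();
      return body.response;   // Ollama returns the completion in `response`
    }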

Workers self-bootstrap

On daemon launch — Ollama install check, model pull if missing, tunnel start, cloud registration. Zero manual setup per worker machine.
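
The same sequence as a hedged TypeScript sketch. The daemon port, orchestrator URL, and registration payload are assumptions; in practice the tunnel URL that cloudflared prints would be captured and included in the registration:

    import { execSync, spawn } from "node:child_process";

    // Daemon bootstrap, in launch order: Ollama install check, model pull if
    // missing, tunnel start, cloud registration.
    function bootstrap(model: string): void {
      execSync("ollama --version", { stdio: "ignore" });         // throws if Ollama is absent

      if (!execSync("ollama list").toString().includes(model)) {
        execSync(`ollama pull ${model}`, { stdio: "inherit" });  // pull only if missing
      }

      // Outbound-only quick tunnel to the daemon's local port: no port
      // forwarding or firewall changes on the worker side.
      spawn("cloudflared", ["tunnel", "--url", "http://localhost:7070"], { stdio: "inherit" });

      void fetch("https://orchestrator.example.com/workers/register", {  // hypothetical URL
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ id: process.env.WORKER_ID, models: [model] }),
      });
    }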

"The default model runs on modest hardware with a 128K context window. For log analysis, classification, and similar high-volume tasks, the inference cost approaches zero."

Current capabilities

  • Push-based task dispatch with FIFO queue and model-matched routing
  • Self-bootstrapping daemons (macOS and Linux)
  • Stale task watchdog with automatic re-queuing at front of queue (sketched after this list)
  • Full resilience: worker disconnect, tunnel drop, cloud restart
  • Configurable worker hardware reporting (VRAM, max model size)
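
A sketch of how the stale-task watchdog might work, extending the in-memory shapes from the orchestrator sketch above. The timeout, sweep interval, and field names are assumptions:

    type InFlight = { task: Task; workerId: string; dispatchedAt: number };
    const inFlight = new Map<string, InFlight>();   // dispatch() would record entries here

    // Periodically sweep in-flight tasks; anything past the deadline is assumed
    // lost (worker died, tunnel dropped) and re-queued at the FRONT so old work
    // is retried before new work.
    setInterval(() => {
      const now = Date.now();
      for (const [taskId, entry] of inFlight) {
        if (now - entry.dispatchedAt > 120_000) {   // assumed 120s deadline
          inFlight.delete(taskId);
          queue.unshift(entry.task);                // front of the FIFO, per the re-queue rule
        }
      }
    }, 10_000);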

On the roadmap

  • Load-aware worker selection using reported VRAM and load average (sketched after this list)
  • Config sync from cloud dashboard
  • Log analysis pipeline — chunking, fan-out across workers, result aggregation
  • Worker dashboard — queue depth, throughput, per-worker utilisation
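
For the first item, a sketch of what load-aware selection might look like, building on the hardware reporting that already exists. The scoring heuristic and field names are assumptions, not shipped behaviour:

    type WorkerStats = { vramGb: number; loadAvg: number; lastHeartbeat: number };

    // Roadmap sketch only: prefer live workers with the most headroom.
    function pickWorker(candidates: Map<string, WorkerStats>, minVramGb: number): string | null {
      let best: string | null = null;
      let bestScore = -Infinity;
      for (const [id, s] of candidates) {
        if (s.vramGb < minVramGb) continue;                  // can't fit the model
        if (Date.now() - s.lastHeartbeat > 30_000) continue; // stale worker
        const score = s.vramGb - 4 * s.loadAvg;              // crude headroom heuristic
        if (score > bestScore) { bestScore = score; best = id; }
      }
      return best;
    }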

App Analytics Anyone Can Read

Standard app analytics has a built-in knowledge problem: the team that adds the events is usually the only team that can interpret them. Custom event names that made sense at instrumentation time mean little to anyone else later. Funnels are built after the fact from whatever events happened to get instrumented. If a PM wants to understand what users are doing, they need an analyst or a developer with time free to translate.

Logminate approaches this differently. Instead of custom events requiring interpretation, it captures natural-language descriptions of what the user is doing in each session. From those, it constructs a plain-English summary of the full session — what happened, in what order, how long was spent where, and what went wrong.
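
To make that concrete, a hypothetical shape of what gets captured: a timestamped natural-language description per step rather than a custom event name. Field names here are illustrative, not Logminate's actual schema:

    type SessionStep = {
      at: string;            // ISO timestamp
      description: string;   // "User opened checkout and edited the shipping address"
      durationMs: number;
      error?: string;        // present when the step failed
    };

    type Session = { sessionId: string; steps: SessionStep[] };

    // Downstream, an SLM turns the ordered steps into one plain-English summary.
    function toSummaryPrompt(s: Session): string {
      const lines = s.steps.map(
        (st) => `${st.at} (${Math.round(st.durationMs / 1000)}s): ${st.description}` +
                (st.error ? ` [ERROR: ${st.error}]` : "")
      );
      return `Summarise this session in plain English:\n${lines.join("\n")}`;
    }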

What a product team gets

  • Session summaries anyone can read — no SQL, no event taxonomy knowledge
  • Time-spent analysis: which flows are long and high-value vs. long because something is broken
  • Error flows with reproduction paths: the exact sequence of steps preceding each error, categorised and ready for any dev agent to pick up (shape sketched after this list)
  • Dashboard framed around questions product teams actually ask: "Where are users dropping off?" / "What do power users do that new users don't?"
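
For a sense of the error-flow output, an illustrative TypeScript shape; not Logminate's actual format:

    type ErrorFlow = {
      category: string;            // e.g. "payment-declined"
      occurrences: number;
      reproductionPath: string[];  // ordered plain-English steps preceding the error
    };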

The inference layer

Logminate uses small language models (SLMs) for session summarisation: fast to load, low memory footprint, designed for high-volume processing. For teams with the hardware, BanyanStem provides the distributed inference layer, pushing per-session processing costs close to zero at scale.
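
Tying the two systems together, a sketch of the hand-off: one task per session, fanned out across workers. This reuses the Session shape and toSummaryPrompt from the sketch above; the endpoint and model tag remain assumptions, not a published API:

    async function submitSummary(session: Session): Promise<void> {
      await fetch("https://orchestrator.example.com/tasks", {   // hypothetical URL
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({
          id: crypto.randomUUID(),
          model: "lfm2.5:1.2b",              // hypothetical tag, as above
          prompt: toSummaryPrompt(session),  // from the Logminate sketch above
        }),
      });
    }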

Want to go deeper?

A 30-minute call is usually enough to know if the technical fit is there.

Book a call

What is BanyanStem?

BanyanStem is a distributed AI inference network that routes high-volume tasks to idle local hardware running Ollama, reducing per-task inference costs to near-zero for workloads that don't require a frontier model.

What is Logminate?

Logminate is an app analytics system that replaces custom event tracking with natural-language session capture. It produces plain-English session summaries, error flows with reproduction paths, and funnel analysis that product teams can read without analyst support.

When does a distributed SLM approach make sense over a frontier model?

When the task is well-defined and repetitive — classification, tagging, log analysis, intent detection — and the volume is high enough that cloud API costs compound. A fine-tuned 1–3B model often outperforms a 70B general model on a specific task, at a fraction of the cost.
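
A hedged back-of-envelope with deliberately round, hypothetical numbers (real API prices and token counts vary widely):

    10M tasks/month × ~1K tokens each    ≈ 10B tokens/month
    at an assumed $0.50 per 1M tokens    ≈ $5,000/month in API spend
    on idle local hardware               ≈ near-zero marginal cost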