Technical Work
A closer look at the technical approaches behind our work — distributed AI inference, plain-language app analytics, and the engineering decisions that separate a PoC from a production system.
Running AI inference entirely through cloud APIs gets expensive fast — especially for high-volume, repetitive tasks like log analysis, classification, and tagging. BanyanStem distributes those tasks across a network of local machines running Ollama, using their idle compute instead of a cloud LLM.
A cloud orchestrator (Node.js/Express) receives tasks and pushes them to available worker machines through Cloudflare Tunnels — outbound-only, no port forwarding, no firewall changes on the worker side. Workers execute inference locally via Ollama and report results back. The orchestrator handles task queuing (FIFO, model-matched), worker registration, heartbeat monitoring, stale task detection and re-queuing, and full resilience across worker disconnects and cloud restarts.
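The queuing and resilience behaviour described above can be sketched in a few dozen lines. This is a minimal illustration, not BanyanStem's actual code: the `Task`, `Worker`, and `Orchestrator` names, the 30-second heartbeat timeout, and the method signatures are all assumptions made for the example.

```typescript
// Illustrative sketch of a model-matched FIFO queue with heartbeat-based
// stale-task re-queuing. All names and timeouts here are hypothetical.

interface Task {
  id: string;
  model: string;        // task runs only on a worker serving this model
  payload: string;
  assignedTo?: string;
}

interface Worker {
  id: string;
  models: string[];     // models this worker's Ollama instance has pulled
  lastHeartbeat: number;
  busy: boolean;
}

const HEARTBEAT_TIMEOUT_MS = 30_000; // assumed value

class Orchestrator {
  private queue: Task[] = [];
  private workers = new Map<string, Worker>();

  enqueue(task: Task): void {
    this.queue.push(task); // FIFO: oldest task dispatches first
  }

  register(worker: Worker): void {
    this.workers.set(worker.id, worker);
  }

  heartbeat(workerId: string): void {
    const w = this.workers.get(workerId);
    if (w) w.lastHeartbeat = Date.now();
  }

  // Pop the oldest task whose model matches an idle, live worker.
  dispatch(): { task: Task; worker: Worker } | null {
    const now = Date.now();
    for (let i = 0; i < this.queue.length; i++) {
      const task = this.queue[i];
      for (const w of this.workers.values()) {
        const alive = now - w.lastHeartbeat < HEARTBEAT_TIMEOUT_MS;
        if (alive && !w.busy && w.models.includes(task.model)) {
          this.queue.splice(i, 1);
          w.busy = true;
          task.assignedTo = w.id;
          return { task, worker: w };
        }
      }
    }
    return null; // no matching idle worker right now
  }

  // Put tasks back on the queue if their worker stopped heartbeating mid-run.
  requeueStale(inFlight: Task[]): Task[] {
    const now = Date.now();
    const stale = inFlight.filter((t) => {
      const w = t.assignedTo ? this.workers.get(t.assignedTo) : undefined;
      return !w || now - w.lastHeartbeat >= HEARTBEAT_TIMEOUT_MS;
    });
    for (const t of stale) {
      t.assignedTo = undefined;
      this.queue.unshift(t); // front of the queue, so it runs next
    }
    return stale;
  }
}
```

The key design point is that dispatch is model-matched: a task never waits on a worker that would have to pull a model first, and a dead worker's tasks survive because the cloud side, not the worker, owns the queue.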
128K context window. Q4_K_M quantisation. Linear recurrence architecture — memory scales O(n) with context length, rather than the O(n²) of full attention. Fast on modest hardware. Per-task inference cost approaches zero once workers are running.
On daemon launch — Ollama install check, model pull if missing, tunnel start, cloud registration. Zero manual setup per worker machine.
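That launch sequence can be sketched as a small Node bootstrap. This is a hypothetical illustration, not the actual daemon: the model name, the registration URL, and the assumption that `cloudflared` is used directly are all placeholders; the only real commands invoked are the standard `ollama` CLI ones.

```typescript
// Hypothetical worker-daemon bootstrap: check Ollama, pull the model if
// missing, start the outbound tunnel, register with the cloud orchestrator.
import { execSync, spawn } from "node:child_process";

const MODEL = "example-model"; // placeholder — the real default model differs
const REGISTER_URL = "https://cloud.example.com/register"; // assumed endpoint

function hasOllama(): boolean {
  try {
    execSync("ollama --version", { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

function hasModel(model: string): boolean {
  try {
    return execSync("ollama list").toString().includes(model);
  } catch {
    return false;
  }
}

async function bootstrap(): Promise<void> {
  if (!hasOllama()) throw new Error("Ollama not installed");
  if (!hasModel(MODEL)) execSync(`ollama pull ${MODEL}`, { stdio: "inherit" });

  // Outbound-only tunnel: no inbound ports, no firewall changes on the worker.
  spawn("cloudflared", ["tunnel", "run"], { stdio: "inherit" });

  await fetch(REGISTER_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ models: [MODEL] }),
  });
}
```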
"The default model runs on modest hardware with a 128K context window. For log analysis, classification, and similar high-volume tasks, the inference cost approaches zero."
Standard app analytics has a built-in knowledge problem: the team that adds the events is usually the only team that can interpret them. Custom event names made sense at the time. Funnels are built after the fact from whatever events happened to get instrumented. If a PM wants to understand what users are doing, they need an analyst or a developer free to translate.
Logminate approaches this differently. Instead of custom events requiring interpretation, it captures natural-language descriptions of what the user is doing in each session. From those, it constructs a plain-English summary of the full session — what happened, in what order, how long was spent where, and what went wrong.
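One way to picture the step from captured descriptions to a session summary: order the events, compute dwell time between them, flag errors, and hand the result to a summarisation model. A minimal sketch, assuming a hypothetical event shape and prompt wording that are not Logminate's actual format:

```typescript
// Illustrative only: turn natural-language session events into the input
// for a plain-English summary. Event fields and wording are assumptions.

interface SessionEvent {
  at: number;           // ms since session start
  description: string;  // e.g. "Opened the checkout screen"
  error?: string;       // present if the action failed
}

function buildSummaryPrompt(events: SessionEvent[]): string {
  const ordered = [...events].sort((a, b) => a.at - b.at);
  const lines = ordered.map((e, i) => {
    const next = ordered[i + 1];
    const dwell = next
      ? `${Math.round((next.at - e.at) / 1000)}s`
      : "end of session";
    const err = e.error ? ` [error: ${e.error}]` : "";
    return `${i + 1}. ${e.description} (${dwell})${err}`;
  });
  return [
    "Summarise this app session in plain English.",
    "Cover what happened, in what order, where time was spent, and what went wrong.",
    "",
    ...lines,
  ].join("\n");
}
```

Because the events are already natural language, the model's job is condensation and ordering, not interpretation of opaque event names.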
Logminate uses SLMs for session summarisation — fast to load, low memory footprint, designed for high-volume processing. For teams with the hardware, BanyanStem provides the distributed inference layer, pushing per-session processing costs close to zero at scale.
A 30-minute call is usually enough to know if the technical fit is there.
Book a call

What is BanyanStem?
BanyanStem is a distributed AI inference network that routes high-volume tasks to idle local hardware running Ollama, reducing per-task inference costs to near-zero for workloads that don't require a frontier model.
What is Logminate?
Logminate is an app analytics system that replaces custom event tracking with natural-language session capture. It produces plain-English session summaries, error flows with reproduction paths, and funnel analysis that product teams can read without analyst support.
When does a distributed SLM approach make sense over a frontier model?
When the task is well-defined and repetitive — classification, tagging, log analysis, intent detection — and the volume is high enough that cloud API costs compound. A fine-tuned 1–3B model often outperforms a 70B general model on a specific task, at a fraction of the cost.