lloyal.node

Native backend for the lloyal inference platform.

Prebuilt llama.cpp binaries for 13 platform/GPU combinations, exposing a SessionContext that powers the @lloyal-labs/sdk inference primitives (Branch, BranchStore, Session, Rerank) and the @lloyal-labs/lloyal-agents multi-agent framework. Built on liblloyal, a header-only C++20 inference kernel for llama.cpp.

All SDK and agent exports are re-exported from this package for convenience — import { Branch, useAgent, agentPool } from "@lloyal-labs/lloyal.node" works out of the box.

Install

npm install @lloyal-labs/lloyal.node

Prebuilt binaries cover the 13 platform/GPU combinations listed below. The GPU variant is selected at runtime, not at install time.

Platform	Arch	Acceleration
macOS	arm64	Metal
macOS	x64	CPU
Linux	x64	CPU / CUDA / Vulkan
Linux	arm64	CPU / CUDA / Vulkan
Windows	x64	CPU / CUDA / Vulkan
Windows	arm64	CPU / Vulkan
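
As a cross-check on the figure of 13, the table above can be written out as data and counted. This is a minimal sketch: the `SUPPORT_MATRIX`, `countCombinations`, and `accelerationsFor` names are hypothetical and are not exports of lloyal.node.

```typescript
// Hypothetical encoding of the support table; not part of the package API.
type Acceleration = "CPU" | "CUDA" | "Vulkan" | "Metal";

interface PlatformRow {
  platform: "macOS" | "Linux" | "Windows";
  arch: "arm64" | "x64";
  acceleration: Acceleration[];
}

const SUPPORT_MATRIX: PlatformRow[] = [
  { platform: "macOS", arch: "arm64", acceleration: ["Metal"] },
  { platform: "macOS", arch: "x64", acceleration: ["CPU"] },
  { platform: "Linux", arch: "x64", acceleration: ["CPU", "CUDA", "Vulkan"] },
  { platform: "Linux", arch: "arm64", acceleration: ["CPU", "CUDA", "Vulkan"] },
  { platform: "Windows", arch: "x64", acceleration: ["CPU", "CUDA", "Vulkan"] },
  { platform: "Windows", arch: "arm64", acceleration: ["CPU", "Vulkan"] },
];

// Each (platform, arch, acceleration) triple counts as one combination.
function countCombinations(rows: PlatformRow[]): number {
  return rows.reduce((n, row) => n + row.acceleration.length, 0);
}

// Look up the accelerations listed for one platform/arch pair (empty if unlisted).
function accelerationsFor(platform: string, arch: string): Acceleration[] {
  const row = SUPPORT_MATRIX.find((r) => r.platform === platform && r.arch === arch);
  return row ? row.acceleration : [];
}
```

Summing the acceleration entries per row gives the 13 combinations quoted above.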

Quick Start

import { createContext } from "@lloyal-labs/lloyal.node";
import { Branch, BranchStore } from "@lloyal-labs/sdk";

const ctx = await createContext({ modelPath: "./model.gguf", nSeqMax: 4 });
const store = new BranchStore(ctx);

const root = Branch.create(ctx, 0, { temperature: 0.8 });
await root.prefill(await ctx.tokenize("Explain quantum entanglement"));

// Fork and generate — all branches in lockstep, 1 GPU call per step
const branches = await Promise.all([root.fork(), root.fork(), root.fork()]);
for (;;) {
  const live = branches.filter((b) => !b.disposed);
  if (!live.length) break;
  const produced = live.map((b) => ({ b, ...b.produce() }));
  for (const p of produced.filter((p) => p.isStop)) await p.b.prune();
  const items = produced
    .filter((p) => !p.isStop)
    .map((p) => {
      p.b.accept(p.token);
      return [p.b, p.token];
    });
  await store.commit(items);
}

Or for single-branch generation, Branch is an async iterable:

for await (const { token, text } of branch) {
  process.stdout.write(text);
}

See @lloyal-labs/sdk for the full Branch API, continuous tree batching, KV tenancy, and topology documentation.
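
To make the control flow of the lockstep loop in Quick Start easier to follow without a model, here is a self-contained simulation. It is a sketch under stated assumptions: `MockBranch` and `MockStore` are hypothetical stand-ins, not the SDK's Branch and BranchStore, and they are synchronous where the real `prune()` and `commit()` are awaited.

```typescript
// Stand-ins for illustration only; NOT the SDK's Branch or BranchStore.
const STOP = 0; // hypothetical stop-token id

class MockBranch {
  disposed = false;
  readonly accepted: number[] = [];
  private readonly script: number[];

  constructor(script: number[]) {
    this.script = script;
  }

  // Return the next scripted token, or STOP once the script is exhausted.
  produce(): { token: number; isStop: boolean } {
    const token = this.script[this.accepted.length] ?? STOP;
    return { token, isStop: token === STOP };
  }

  accept(token: number): void {
    this.accepted.push(token);
  }

  prune(): void {
    this.disposed = true;
  }
}

class MockStore {
  commits = 0;

  // One batched call per step, however many branches are in the batch.
  commit(items: Array<[MockBranch, number]>): void {
    if (items.length > 0) this.commits += 1;
  }
}

function runLockstep(branches: MockBranch[], store: MockStore): void {
  for (;;) {
    const live = branches.filter((b) => !b.disposed);
    if (live.length === 0) break;
    // Phase 1: every live branch samples a token.
    const produced = live.map((b) => ({ b, ...b.produce() }));
    // Branches that hit a stop token leave the batch.
    for (const p of produced.filter((q) => q.isStop)) p.b.prune();
    // Phase 2: the survivors accept their token and are committed together.
    const items = produced
      .filter((p) => !p.isStop)
      .map((p): [MockBranch, number] => {
        p.b.accept(p.token);
        return [p.b, p.token];
      });
    store.commit(items);
  }
}
```

With scripts of length 2, 1, and 3, the mock store sees three non-empty commits: the batch shrinks as branches stop, but each step is still a single call.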

Without the SDK

createContext returns a SessionContext — the native interface to llama.cpp. You can use it directly without the SDK's Branch/BranchStore layer:

import { createContext } from "@lloyal-labs/lloyal.node";

const ctx = await createContext({ modelPath: "./model.gguf", nSeqMax: 4 });

// Chat templates — model-agnostic formatting + tool calling
const { prompt, grammar, format } = await ctx.formatChat(messages, {
  addGenerationPrompt: true,
  tools: [{ type: "function", function: { name: "search", parameters: schema } }],
});
const { content, toolCalls } = await ctx.parseChatOutput(output, format);

// Branch primitives — what the SDK's Branch class wraps
const handle = ctx._branchCreate(0, samplerParams);
await ctx._branchPrefill(handle, tokens);
const token = ctx._branchSample(handle);
const text = ctx.tokenToText(token);
const isStop = ctx.isStopToken(token);
ctx._branchAccept(handle, token);
const logits = ctx._branchGetLogits(handle);     // Float32Array(vocabSize)
const entropy = ctx._branchModelEntropy(handle);
const child = ctx._branchFork(handle);

// Store primitives — what the SDK's BranchStore wraps
await ctx._storeCommit([handle1, handle2], [tok1, tok2]);  // N branches, 1 GPU call
await ctx._storePrefill([handle], [tokens]);
await ctx._storeRetainOnly(winner);
const available = ctx._storeAvailable();

// KV cache — snapshot, copy, persist
await ctx.kvSeqCopy(0, 1);                      // share prefix across sequences
await ctx.kvCacheSave();                         // snapshot for rollback
await ctx.kvCacheLoad();                         // restore checkpoint
await ctx.kvCacheWriteFile("cache.bin");         // persist to disk

// Embeddings
const embeddings = await ctx.encode("query text");
const dim = ctx.getEmbeddingDimension();

// Grammar + tokenizer
const schemaGrammar = await ctx.jsonSchemaToGrammar(schema);  // renamed: `grammar` is already bound by formatChat above
const tokens = await ctx.tokenize("Hello world");
const sep = await ctx.getTurnSeparator();
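
Since `_branchGetLogits` returns a `Float32Array` of vocabulary size, per-token statistics can also be computed in user code. The sketch below computes the Shannon entropy of the softmax distribution over a logits array. It assumes natural-log units and is not necessarily how `_branchModelEntropy` is implemented; `entropyFromLogits` is a hypothetical helper name.

```typescript
// Shannon entropy (in nats) of softmax(logits).
// Illustrative only: the native entropy call may use a different base or numerical scheme.
function entropyFromLogits(logits: Float32Array): number {
  // Subtract the max before exponentiating for numerical stability.
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] > max) max = logits[i];
  }
  let sumExp = 0;
  for (let i = 0; i < logits.length; i++) {
    sumExp += Math.exp(logits[i] - max);
  }
  const logZ = max + Math.log(sumExp); // log of the softmax normalizer

  // H = -sum_i p_i * log p_i, with log p_i = logit_i - logZ.
  let h = 0;
  for (let i = 0; i < logits.length; i++) {
    const logP = logits[i] - logZ;
    h -= Math.exp(logP) * logP;
  }
  return h;
}
```

A uniform distribution over four tokens gives ln 4, the maximum for that vocabulary size, while a sharply peaked distribution gives a value near zero.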

What This Package Provides

Native-only (not in SDK):

createContext(options) — load a GGUF model, return a SessionContext
loadBinary(options?) — explicit GPU variant selection with automatic fallback
Prebuilt binaries for 13 platform/GPU combinations

Re-exported from @lloyal-labs/sdk:

Branch, BranchStore, Session, Rerank
Per-token metrics: modelEntropy(), modelSurprisal(), samplingPerplexity
Chat formatting: formatChat(), parseChatOutput()
Grammar: jsonSchemaToGrammar(), setGrammar()
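
For readers unfamiliar with the per-token metrics named above, here is a minimal sketch of the conventional definitions: surprisal as the negative log-probability of the chosen token, and perplexity as the exponential of the mean surprisal. The units (nats) and the helper names `surprisalFromLogits` and `perplexityFromSurprisals` are assumptions; the SDK's own methods may differ in base or in which distribution they measure.

```typescript
// Surprisal (in nats) of `token` under softmax(logits): -log p(token).
// Hypothetical helper; not the SDK's modelSurprisal().
function surprisalFromLogits(logits: Float32Array, token: number): number {
  if (token < 0 || token >= logits.length) {
    throw new RangeError("token id out of range");
  }
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] > max) max = logits[i];
  }
  let sumExp = 0;
  for (let i = 0; i < logits.length; i++) {
    sumExp += Math.exp(logits[i] - max);
  }
  const logZ = max + Math.log(sumExp);
  return logZ - logits[token];
}

// Perplexity over a sequence: exp of the mean per-token surprisal.
function perplexityFromSurprisals(surprisals: number[]): number {
  if (surprisals.length === 0) {
    throw new RangeError("need at least one surprisal value");
  }
  const mean = surprisals.reduce((s, x) => s + x, 0) / surprisals.length;
  return Math.exp(mean);
}
```

Under these definitions, a model that is uniformly unsure among four tokens at every step has a perplexity of 4.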

Re-exported from @lloyal-labs/lloyal-agents:

useAgent, agentPool, useAgentPool, withSpine, diverge, reduce, createToolkit
Structured concurrency DAG via Effection generators
In-loop orchestration: agents as branches of a single running process
App protocol surfaces (AppRegistryCtx, AppConfigStoreCtx, App, AppManifest) when paired with @lloyal-labs/rig's defineApp / createAppRegistry

GPU Variant Selection

import { loadBinary, createContext } from "@lloyal-labs/lloyal.node";

// Automatic — uses Metal on macOS, CPU elsewhere
const ctx = await createContext({ modelPath: "./model.gguf" });

// Explicit CUDA
const binding = loadBinary({ gpuVariant: "cuda" });
const cudaCtx = await binding.createContext({ modelPath: "./model.gguf" });
// Falls back to CPU with a warning if the CUDA runtime is not available
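
The fallback behaviour described above can be pictured as follows. This is a sketch under stated assumptions, not the loader's actual logic: `selectVariant` and the `isAvailable` probe are hypothetical, and of the variant strings only "cuda" appears in this README.

```typescript
// Variant names other than "cuda" are assumed for illustration.
type GpuVariant = "cpu" | "cuda" | "vulkan" | "metal";

interface Selection {
  variant: GpuVariant;
  warning?: string;
}

// Hypothetical helper: `isAvailable` stands in for whatever runtime probe the loader performs.
function selectVariant(
  requested: GpuVariant,
  isAvailable: (v: GpuVariant) => boolean,
): Selection {
  // CPU is treated as always available in this sketch.
  if (requested === "cpu" || isAvailable(requested)) {
    return { variant: requested };
  }
  // Requested runtime is missing: degrade to CPU and report why.
  return {
    variant: "cpu",
    warning: `${requested} runtime not available, falling back to cpu`,
  };
}
```

Returning the warning as data rather than logging it lets the caller decide whether a silent CPU fallback is acceptable or should abort startup.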

Examples

Example	Pattern
`entropy/`	`modelEntropy()` mid-generation as control signal
`chat/`	Interactive streaming chat
`embed/`	Text embeddings extraction

npx tsx examples/best-of-n/best-of-n.ts
npx tsx examples/chat/chat.ts ./model.gguf

CI Testing

Integration tests run real inference across architectures:

Architecture	Test Model	Template
Llama	Llama 3.2 1B	llama3
Phi	Phi 3.5 Mini	phi3
Qwen	Qwen 3 1.7B	chatml
Gemma	Gemma 3 1B	gemma
SmolLM	SmolLM2 1.7B	chatml
Ministral	Ministral 3B	mistral

See distribution.md for details.

Ecosystem

Package	Description
`@lloyal-labs/sdk`	Backend-agnostic inference primitives (Branch, BranchStore, Session, Rerank)
`@lloyal-labs/lloyal-agents`	Multi-agent runtime + App protocol primitives
`@lloyal-labs/rig`	App protocol helpers, retrieval providers, framework tools (Plan/Delegate/Report)
`harness.dev`	CLI — scaffold harnesses + Apps; publish/install signed Apps via the channel
liblloyal	Header-only C++20 inference kernel for llama.cpp
lloyal.node	This package — native backend + prebuilt binaries
nitro-llama	React Native backend via Nitro Modules
tsampler	Reference sampler implementation

Contributing

See CONTRIBUTING.md for development setup and release process.

License

You can build and sell commercial products using lloyal.node.

lloyal.node 3.0 is source-available under FSL-1.1-Apache-2.0 and converts to Apache 2.0 two years after each release. The restriction is narrow: you cannot offer a competing HDK runtime, managed HDK service, or alternative HDK App distribution channel.

See LICENSE-FAQ.md for concrete examples of what's permitted and what's restricted. See LICENSE for the legal text and NOTICE for attribution including the bundled llama.cpp MIT dependency.
