
use-local-llm: React Hooks for AI That Actually Work Locally

React · TypeScript · Open Source · AI · LLM · Ollama · npm
Local AI in the browser: React hooks streaming from Ollama, LM Studio, and llama.cpp with no cloud or server

You've finally got your local LLM running. You pull a model, test it with curl, and it works beautifully. But the moment you try to integrate it into your React app, you hit a wall.

The tools everyone uses assume you're calling OpenAI or Anthropic from a server. They don't expect you to talk to localhost:11434 directly from the browser, and the ones that do still force you to build API routes, add a backend, and complicate your prototype.

I kept running into this frustration, so I built use-local-llm, a library with a single purpose. It streams AI responses from local models directly in the browser with no backend, in 2.8 KB of code and zero dependencies.

Why Existing Tools Don't Fit

You'd think you could just use Vercel AI SDK. It's the standard for React + AI. It ships adapters for multiple frameworks, maintains thorough API references, and handles production traffic at scale.

But Vercel did not build it for direct browser-to-localhost communication.

Vercel AI SDK requires an API layer. Your React app POSTs to your Next.js server, which then calls the LLM and streams back. This makes sense for production apps using OpenAI or Anthropic, because you need the backend for authentication, cost tracking, and security.

But when the LLM is already running locally on localhost:11434, that server adds nothing but friction. More code to write. More latency. More complexity for something that should be a quick prototype.

I wanted one thing. A single React hook that talks directly to my local model. No middleman. No API routes. Just React and a running LLM.

Browser → fetch() → localhost:11434 → streaming tokens back

That is the entire architecture. No intermediary.
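To make that concrete, here's a minimal sketch of the direct path, assuming Ollama's `/api/generate` streaming endpoint and the `gemma3:1b` model (both names are illustrative, not part of the library). The token extraction is split out into its own function, and real code would additionally buffer partial NDJSON lines across chunk boundaries:

```typescript
// Ollama streams NDJSON: one JSON object per line, each carrying a "response" token.
function* extractTokens(text: string): Generator<string> {
  for (const line of text.split("\n")) {
    if (line.trim()) yield JSON.parse(line).response as string;
  }
}

// The direct browser-to-Ollama call, with no server in between.
async function streamFromOllama(prompt: string, onToken: (t: string) => void) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({ model: "gemma3:1b", prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const token of extractTokens(decoder.decode(value, { stream: true }))) {
      onToken(token);
    }
  }
}
```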

How It Works

use-local-llm gives you React hooks that stream responses from local LLMs directly in the browser. No API routes. No backend. Just one hook and your running model.

Here's what a complete, production-ready chat interface looks like in practice.

One Hook. That's It.

tsx
import { useOllama } from "use-local-llm";

function Chat() {
  const { messages, send, isStreaming } = useOllama("gemma3:1b");

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <button onClick={() => send("Hello!")} disabled={isStreaming}>
        {isStreaming ? "Generating..." : "Send"}
      </button>
    </div>
  );
}

That's it. That's a complete, streaming chat interface. Message history? Handled. Streaming state? Handled. Stopping mid-generation? Handled. All the complexity is wrapped inside the hook.

No /api/chat route. No Next.js. No backend configuration. Just React doing what it does best.

Works With Any Local LLM

Whether you're using Ollama, LM Studio, llama.cpp, or any OpenAI-compatible server, the same API applies.

tsx
import { useLocalLLM } from "use-local-llm";

// Inside a component, pick the endpoint that matches your local server:
const ollama = useLocalLLM({
  endpoint: "http://localhost:11434",
  model: "gemma3:1b",
});

const lmStudio = useLocalLLM({
  endpoint: "http://localhost:1234",
  model: "local-model",
});

const llamaCpp = useLocalLLM({
  endpoint: "http://localhost:8080",
  model: "model",
});

The library detects the backend from the port. Each backend uses its native streaming protocol, so you get optimal performance regardless of which tool you choose.
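As a sketch of how port-based detection might work (the library's actual heuristics may differ), mapping each tool's default port and falling back to the OpenAI-compatible protocol for anything unrecognized:

```typescript
type Backend = "ollama" | "lmstudio" | "llamacpp" | "openai-compatible";

// Hypothetical sketch, not the library's source: map each tool's default port.
function detectBackend(endpoint: string): Backend {
  const port = new URL(endpoint).port;
  switch (port) {
    case "11434": return "ollama";    // Ollama's default port
    case "1234":  return "lmstudio";  // LM Studio's default port
    case "8080":  return "llamacpp";  // llama.cpp server's default port
    default:      return "openai-compatible";
  }
}
```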

Advanced: Token-by-Token Control

Need fine-grained control? You can hook into every token as it arrives.

tsx
import { useStreamCompletion } from "use-local-llm";

function Writer() {
  const { output, generate, isStreaming } = useStreamCompletion({
    endpoint: "http://localhost:11434",
    model: "gemma3:1b",
    onToken: (token) => console.log("Token:", token),
  });

  return (
    <div>
      <pre>{output}</pre>
      <button onClick={() => generate("Write a haiku about React")}>
        Generate
      </button>
    </div>
  );
}

Discover Available Models

Let your users pick from available models without manual configuration:

tsx
import { useModelList } from "use-local-llm";

function ModelPicker() {
  const { models, isLoading } = useModelList();

  return (
    <select disabled={isLoading}>
      {models.map((m) => (
        <option key={m.name} value={m.name}>
          {m.name}
        </option>
      ))}
    </select>
  );
}

How It Compares

Vercel AI SDK is fantastic for production apps using cloud APIs. But for local development, the architectures diverge:

                 Vercel AI SDK                         use-local-llm
For              Cloud LLMs (OpenAI, Anthropic, etc.)  Local LLMs (Ollama, LM Studio, llama.cpp)
Architecture     Client → Server → Cloud API           Client → Local LLM directly
Backend needed   Yes (required)                        No
Setup time       10+ minutes                           2 minutes
Bundle size      ~50 KB+                               2.8 KB
Dependencies     Multiple                              Zero (React only)
Privacy          Data leaves your machine              Never leaves your machine

Pick the tool for your use case. If you're calling OpenAI in production, use Vercel AI SDK. If you're prototyping locally or prioritizing privacy, use use-local-llm.

Why It's Built This Way

Zero Dependencies (2.8 KB Total)

The entire package is 2.8 KB gzipped. No runtime dependencies. Only React as a peer.

Why does this matter? Your prototype should install instantly and never conflict with anything else in your project. No dependency conflicts. No version mismatches. No bloat.

Works Outside React Too

The core streaming functions (streamChat() and streamGenerate()) are async generators that run in any JavaScript environment, from React and Vue to vanilla JS and Node.js scripts. Use the hooks for React apps or the generators directly for everything else.

typescript
import { streamChat } from "use-local-llm";

const stream = streamChat({
  endpoint: "http://localhost:11434",
  model: "gemma3:1b",
  messages: [{ role: "user", content: "Hello" }],
});

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

AbortController Integration

Every stream is cancellable immediately. User-initiated aborts do not trigger error states, because aborting a generation is a normal user action, not an error.
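The general pattern looks something like this (a sketch of the technique, not the library's internals): the fetch is tied to an AbortSignal, and an AbortError is swallowed so the stream simply ends instead of surfacing an error state:

```typescript
// Sketch: stream a response, treating a user-initiated abort as normal completion.
async function* streamWithAbort(
  url: string,
  body: unknown,
  signal: AbortSignal
): AsyncGenerator<string> {
  try {
    const res = await fetch(url, {
      method: "POST",
      body: JSON.stringify(body),
      signal, // aborting the controller cancels the request and the body stream
    });
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      yield decoder.decode(value, { stream: true });
    }
  } catch (err) {
    // AbortError means the user clicked "stop" -- not an error, so swallow it.
    if ((err as Error).name !== "AbortError") throw err;
  }
}
```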

Full TypeScript Support

Everything is strongly typed. IDE autocompletion, type safety, and explicit contracts cover every function.

typescript
import type {
  Backend,     // "ollama" | "lmstudio" | "llamacpp" | "openai-compatible"
  ChatMessage, // { role: "system" | "user" | "assistant", content: string }
  StreamChunk, // { content: string, done: boolean, model?: string }
  LocalModel,  // { name, size?, modifiedAt?, digest? }
} from "use-local-llm";

Architecture

┌─────────────────────────────────────────────────┐
│  Your React App                                 │
│                                                 │
│  useOllama("gemma3:1b")                         │
│        │                                        │
│        ▼                                        │
│  useLocalLLM({ endpoint, model, ... })          │
│        │                                        │
│        ▼                                        │
│  streamChat() / streamGenerate()                │
│        │          async generators              │
│        ▼                                        │
│  parseStreamChunk()                             │
│        │          NDJSON + SSE parser           │
│        ▼                                        │
│  fetch() + ReadableStream                       │
└─────────┬───────────────────────────────────────┘
          │ HTTP (no server in between)
          ▼
┌─────────────────────┐
│  Ollama    :11434   │
│  LM Studio :1234    │
│  llama.cpp :8080    │
└─────────────────────┘

Each layer is independently testable. The hooks compose on top of pure functions that can run anywhere.
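As an illustration of what the parsing layer handles (a hypothetical `parseStreamLine`; the real `parseStreamChunk` signature isn't shown here): Ollama emits NDJSON lines, while OpenAI-compatible servers such as LM Studio and llama.cpp emit SSE lines with a `data:` prefix, so one parser has to normalize both into the same chunk shape:

```typescript
interface StreamChunk {
  content: string;
  done: boolean;
}

// Hypothetical combined NDJSON/SSE line parser.
function parseStreamLine(line: string): StreamChunk | null {
  let payload = line.trim();
  if (!payload) return null; // skip keep-alive blank lines
  if (payload.startsWith("data:")) {
    payload = payload.slice(5).trim(); // strip SSE framing
    if (payload === "[DONE]") return { content: "", done: true };
  }
  const obj = JSON.parse(payload);
  return {
    // NDJSON (Ollama) carries message.content; SSE (OpenAI-style) carries choices[0].delta.content.
    content: obj.message?.content ?? obj.choices?.[0]?.delta?.content ?? "",
    done: Boolean(obj.done),
  };
}
```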

When to Reach for use-local-llm

Perfect for:

  • 🚀 Rapid prototyping that puts AI streaming in your React app in 2 minutes, not 20
  • 🔒 Privacy-first apps where data never leaves your machine. No cloud API calls. No tracking
  • 🏢 Enterprise/offline deployments that run in air-gapped networks and disconnected environments
  • 🎓 Learning how LLM streaming works without server boilerplate
  • 📦 Small footprint for projects where bundle size and dependency count matter

Skip this library if:

  • You're using OpenAI, Anthropic, or other cloud APIs in production
  • You need server-side authentication, logging, or rate limiting
  • Your app is already using Vercel AI SDK for other reasons

Start Building

Step 1: Install the library

bash
npm install use-local-llm

Step 2: Start your local LLM

bash
ollama serve

Step 3: Stream AI in your React app (see examples above)

That's it. No API routes. No server configuration. AI streaming in your browser in under 2 minutes.

