
Supported Backends

use-local-llm supports four backend types, each with its own API format. The backend is auto-detected from the endpoint URL's port, or can be set explicitly via the `backend` option.

Ollama

Ollama is the easiest way to run local LLMs on macOS, Linux, and Windows.

| Property | Value |
| --- | --- |
| Default port | 11434 |
| Chat API | `POST /api/chat` (NDJSON) |
| Generate API | `POST /api/generate` (NDJSON) |
| Models API | `GET /api/tags` |
```sh
# Install
brew install ollama                             # macOS
curl -fsSL https://ollama.com/install.sh | sh   # Linux

# Pull a model
ollama pull gemma3:1b
ollama pull llama3.2
ollama pull mistral

# Start (if not running as a service)
ollama serve
```
```typescript
import { useOllama } from "use-local-llm";

// Zero-config — defaults to localhost:11434
const { messages, send } = useOllama("gemma3:1b");
```
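Under the hood, Ollama's chat endpoint streams NDJSON: each line of the response body is a standalone JSON object, and the final one carries `"done": true`. As a rough sketch of that wire format (the interface and function names here are illustrative, not exports of use-local-llm):

```typescript
// Shape of one NDJSON chunk from POST /api/chat (illustrative).
interface OllamaChatChunk {
  message?: { role: string; content: string };
  done: boolean;
}

// Accumulate assistant text from a raw NDJSON payload.
function collectNDJSON(raw: string): string {
  let text = "";
  for (const line of raw.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const chunk = JSON.parse(line) as OllamaChatChunk;
    if (chunk.message) text += chunk.message.content;
    if (chunk.done) break; // final chunk carries stats, no content
  }
  return text;
}

// Example stream, as Ollama emits it line by line:
const sample = [
  '{"message":{"role":"assistant","content":"Hel"},"done":false}',
  '{"message":{"role":"assistant","content":"lo!"},"done":false}',
  '{"done":true}',
].join("\n");

collectNDJSON(sample); // "Hello!"
```

The hook does this incrementally as chunks arrive, so you normally never touch the raw stream yourself.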

Tested models

| Model | Parameters | Size | Status |
| --- | --- | --- | --- |
| gemma3:1b | 1B | 815 MB | ✅ Tested |
| llama3.1:8b | 8B | 4.9 GB | ✅ Tested |
| llama3.2 | 3B | 2.0 GB | ✅ Tested |
| mistral | 7B | 4.1 GB | ✅ Tested |
| deepseek-r1 | 8B | 5.2 GB | ✅ Tested |
| qwen2.5-coder:32b | 32B | 19.8 GB | ✅ Tested |

LM Studio

LM Studio provides a GUI for managing and running local models with an OpenAI-compatible API.

| Property | Value |
| --- | --- |
| Default port | 1234 |
| Chat API | `POST /v1/chat/completions` (SSE) |
| Generate API | `POST /v1/completions` (SSE) |
| Models API | `GET /v1/models` |
  1. Download and install LM Studio
  2. Download a model from the Discover tab
  3. Go to Developer tab → Start server
  4. CORS is enabled by default
```typescript
import { useLocalLLM } from "use-local-llm";

const { messages, send } = useLocalLLM({
  endpoint: "http://localhost:1234",
  model: "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
});
```
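Unlike Ollama's NDJSON, the OpenAI-compatible backends stream Server-Sent Events: each event is a `data: <json>` line, and the stream ends with a `data: [DONE]` sentinel. A minimal sketch of parsing that format (names are illustrative, not library exports):

```typescript
// Shape of one SSE chunk from POST /v1/chat/completions (illustrative).
interface SSEChunk {
  choices: { delta: { content?: string } }[];
}

// Accumulate assistant text from a raw SSE payload.
function collectSSE(raw: string): string {
  let text = "";
  for (const line of raw.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blanks and comments
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as SSEChunk;
    text += chunk.choices[0]?.delta.content ?? "";
  }
  return text;
}

// Example stream, events separated by blank lines:
const sample = [
  'data: {"choices":[{"delta":{"content":"Hi "}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"there"}}]}',
  "",
  "data: [DONE]",
].join("\n");

collectSSE(sample); // "Hi there"
```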

llama.cpp

llama.cpp is a C++ inference engine with an OpenAI-compatible HTTP server.

| Property | Value |
| --- | --- |
| Default port | 8080 |
| Chat API | `POST /v1/chat/completions` (SSE) |
| Generate API | `POST /v1/completions` (SSE) |
| Models API | `GET /v1/models` |
```sh
# Build
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && make

# Start server with CORS
./llama-server \
  -m models/your-model.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --cors "*"
```
```typescript
import { useLocalLLM } from "use-local-llm";

const { messages, send } = useLocalLLM({
  endpoint: "http://localhost:8080",
  model: "default",
});
```

OpenAI-Compatible

Any server implementing the OpenAI Chat Completions API works as a fallback.

```typescript
import { useLocalLLM } from "use-local-llm";

const { messages, send } = useLocalLLM({
  endpoint: "http://localhost:5000",
  backend: "openai-compatible", // explicit since port isn't recognized
  model: "my-model",
});
```
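Concretely, "implementing the OpenAI Chat Completions API" means the server accepts a request body like the following at `POST /v1/chat/completions` (field names per the OpenAI spec; the model name matches the hypothetical config above):

```typescript
// Sketch of the request body an OpenAI-compatible server expects.
const body = {
  model: "my-model",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  stream: true, // ask the server to stream SSE chunks
};

const payload = JSON.stringify(body);
```

Text-generation servers such as vLLM or text-generation-webui that expose this endpoint should therefore work with this backend.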

Auto-Detection

The backend is auto-detected from the URL port:

| Port | Backend |
| --- | --- |
| 11434 | `ollama` |
| 1234 | `lmstudio` |
| 8080 | `llamacpp` |
| Other | `openai-compatible` |
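The detection rule above amounts to a simple port lookup. A sketch of the logic (the function name is illustrative, not the library's export):

```typescript
type Backend = "ollama" | "lmstudio" | "llamacpp" | "openai-compatible";

// Map an endpoint URL to a backend by its port, per the table above.
function detectBackend(endpoint: string): Backend {
  const port = new URL(endpoint).port;
  switch (port) {
    case "11434": return "ollama";
    case "1234":  return "lmstudio";
    case "8080":  return "llamacpp";
    default:      return "openai-compatible"; // unrecognized port
  }
}

detectBackend("http://localhost:11434"); // "ollama"
detectBackend("http://my-server:9000");  // "openai-compatible"
```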

You can always override with the backend option:

```typescript
const { messages, send } = useLocalLLM({
  endpoint: "http://my-server:9000",
  backend: "ollama", // force Ollama API format
  model: "gemma3:1b",
});
```