Supported Backends

use-local-llm supports four backend types, each with its own API format. The backend is auto-detected from the endpoint URL port, or can be set explicitly.

Ollama

Ollama is the easiest way to run local LLMs on macOS, Linux, and Windows.


Default port	`11434`
Chat API	`POST /api/chat` (NDJSON)
Generate API	`POST /api/generate` (NDJSON)
Models API	`GET /api/tags`

# Install
brew install ollama   # macOS
curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Pull a model
ollama pull gemma3:1b
ollama pull llama3.2
ollama pull mistral

# Start (if not running as a service)
ollama serve

import { useOllama } from "use-local-llm";

// Zero-config — defaults to localhost:11434
const { messages, send } = useOllama("gemma3:1b");

Tested models

Model	Parameters	Size	Status
`gemma3:1b`	1B	815 MB	✅ Tested
`llama3.1:8b`	8B	4.9 GB	✅ Tested
`llama3.2`	3B	2.0 GB	✅ Tested
`mistral`	7B	4.1 GB	✅ Tested
`deepseek-r1`	8B	5.2 GB	✅ Tested
`qwen2.5-coder:32b`	32B	19.8 GB	✅ Tested

LM Studio

LM Studio provides a GUI for managing and running local models with an OpenAI-compatible API.


Default port	`1234`
Chat API	`POST /v1/chat/completions` (SSE)
Generate API	`POST /v1/completions` (SSE)
Models API	`GET /v1/models`

Download and install LM Studio
Download a model from the Discover tab
Go to Developer tab → Start server
CORS is enabled by default

import { useLocalLLM } from "use-local-llm";

const { messages, send } = useLocalLLM({
  endpoint: "http://localhost:1234",
  model: "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
});

llama.cpp

llama.cpp is a C++ inference engine with an OpenAI-compatible HTTP server.


Default port	`8080`
Chat API	`POST /v1/chat/completions` (SSE)
Generate API	`POST /v1/completions` (SSE)
Models API	`GET /v1/models`

# Build
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && make

# Start server with CORS
./llama-server \
  -m models/your-model.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --cors "*"

import { useLocalLLM } from "use-local-llm";

const { messages, send } = useLocalLLM({
  endpoint: "http://localhost:8080",
  model: "default",
});

OpenAI-Compatible

Any server implementing the OpenAI Chat Completions API works as a fallback.

import { useLocalLLM } from "use-local-llm";

const { messages, send } = useLocalLLM({
  endpoint: "http://localhost:5000",
  backend: "openai-compatible", // explicit since port isn't recognized
  model: "my-model",
});

Auto-Detection

The backend is auto-detected from the URL port:

Port	Backend
`11434`	`ollama`
`1234`	`lmstudio`
`8080`	`llamacpp`
Other	`openai-compatible`

You can always override with the backend option:

const { messages, send } = useLocalLLM({
  endpoint: "http://my-server:9000",
  backend: "ollama", // force Ollama API format
  model: "gemma3:1b",
});

Ollama​

Tested models​

LM Studio​

llama.cpp​

OpenAI-Compatible​

Auto-Detection​