The local AI agent landscape added a significant player in 2026. Hermes Agent from Nous Research launched with a self-improving architecture that separates it from interactive assistants and pure automation tools. It maintains cross-session memory, builds accumulated skills, and runs entirely locally through Ollama.
I spent two weeks running Hermes daily across coding tasks, research workflows, and infrastructure automation. Here is what the agent actually does, how to set it up, and which model configurations work best.
What Makes Hermes Different
Most AI agents are stateless. Start a new session and they have no memory of previous interactions. Claude Code works this way. OpenAI's Operator works this way. The agent performs well within a session but does not build on prior work.
Hermes maintains cross-session memory. It remembers your preferences, accumulated skills, and the context of prior projects. Over time, the agent becomes calibrated to your workflow rather than starting from a blank state every conversation.
The second differentiator is the skill system. Hermes ships with 70+ built-in skills covering research, coding, data analysis, and productivity tasks. More importantly, it can create new skills dynamically based on your workflows. If you run a repetitive multi-step process, Hermes can abstract it into a reusable skill that persists across sessions.
The architecture runs entirely through Ollama, which means you own the inference stack completely. No API calls leave your machine unless you explicitly choose a cloud model.
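To make the "nothing leaves your machine" point concrete: Ollama exposes an OpenAI-compatible API on localhost, and that is the only endpoint the agent talks to by default. The sketch below builds (but does not send) a chat request against that local endpoint; the model tag is the one from the table later in this article and is an assumption, not a verified identifier.

```python
import json
import urllib.request

# Local endpoint Hermes is configured with -- no traffic leaves the machine.
OLLAMA_BASE = "http://127.0.0.1:11434/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request against the local Ollama server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{OLLAMA_BASE}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("qwen3.6:32b", "Summarize this repo's README.")
print(req.full_url)  # http://127.0.0.1:11434/v1/chat/completions
```

Sending the request requires a running Ollama instance; the point here is that the destination is loopback-only unless you opt into a cloud model.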
Installation: One Command
If you already have Ollama installed, Hermes setup takes under 5 minutes.
ollama launch hermes

Ollama handles the entire installation sequence:
- Installs Hermes if not already present (via the Nous Research install script)
- Prompts you to select a model (local via Ollama or cloud models)
- Configures the Ollama provider automatically at http://127.0.0.1:11434/v1
- Runs the setup wizard for optional messaging gateway connections
The installer checks for dependencies (Python 3.11+, Node.js, ripgrep, ffmpeg) and installs any missing components via uv, the fast Python package manager. The entire dependency chain resolved in under 4 minutes on my M4 Max MacBook Pro.
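The kind of check the installer performs can be sketched in a few lines. This is an illustration based on the dependency list above, not the actual installer code; the binary names are assumptions (ripgrep ships as `rg`).

```python
import shutil
import sys

# Dependencies named by the installer: Node.js, ripgrep, ffmpeg, uv.
REQUIRED_BINARIES = ["node", "rg", "ffmpeg", "uv"]
MIN_PYTHON = (3, 11)

def missing_dependencies() -> list:
    """Return the dependencies an installer like this would need to fetch."""
    missing = [b for b in REQUIRED_BINARIES if shutil.which(b) is None]
    if sys.version_info < MIN_PYTHON:
        missing.append("python>=" + ".".join(map(str, MIN_PYTHON)))
    return missing

print(missing_dependencies())
```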
Model Selection
Local Models (Recommended)
| Model | Parameters | VRAM | Strengths |
|---|---|---|---|
| Gemma4 | 7B-27B | 8-16 GB | Reasoning, code generation |
| Qwen3.6 | 32B-72B | 24-48 GB | Coding, visual understanding, agentic tool use |
For Apple Silicon, Qwen3.6 32B runs at acceptable speeds on M4 Max with 64 GB unified memory. Gemma4 7B is snappy on any recent Mac with 16+ GB RAM.
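The VRAM column follows a standard back-of-envelope rule: weight memory is roughly parameters times bytes per parameter at the chosen quantization, before KV cache and runtime overhead. The numbers below are my own arithmetic, not vendor figures.

```python
def est_weight_gb(params_b: float, bits: int = 4) -> float:
    """Approximate weight memory: params * bits / 8, expressed in GB."""
    return params_b * 1e9 * bits / 8 / 1e9

# A 32B model at 4-bit quantization needs ~16 GB for weights alone,
# which is why the table's 24-48 GB range leaves headroom for KV cache.
print(est_weight_gb(32))  # 16.0
print(est_weight_gb(7))   # 3.5
```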
Cloud Models via Ollama Gateway
If you want maximum model capability without local hardware constraints, Ollama's cloud gateway connects you to models too large for consumer hardware:
| Model | Provider | Strengths |
|---|---|---|
| kimi-k2.5 | Moonshot AI | Multimodal reasoning with subagents |
| glm-5.1 | Zhipu | Reasoning and code generation |
| qwen3.5 | Qwen | Reasoning, coding, agentic tool use with vision |
| minimax-m2.7 | Minimax | Fast efficient coding and real-world productivity |
The gateway approach gives you access to larger models without the hardware requirement. The tradeoff is inference latency and API dependency.
Messaging Gateway: Access Hermes From Anywhere
Hermes includes a messaging gateway that connects Telegram, Discord, Slack, WhatsApp, Signal, and Email. After installation, run:
hermes gateway setup

This launches an interactive wizard that walks through platform authentication. Once connected, you can message your Hermes agent from any configured platform and receive responses powered by your local Ollama inference.
The practical use case: running Hermes on an always-on machine at home, then querying it via Telegram while traveling. The agent retains its cross-session memory regardless of which platform triggers it.
To reconfigure at any time:
hermes setup

Daily Usage: What Actually Works
Coding Workflows
Hermes handles boilerplate generation, refactoring tasks, and documentation writing competently with Qwen3.6 32B. The agent writes code to files, executes shell commands, and maintains context across files within a session.
The skill system shines for repetitive workflows. I taught Hermes a skill for the standard PR review checklist I run on every pull request. Now a single prompt triggers the entire sequence: fetch the PR diff, run linters, check test coverage, and format the review summary.
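A checklist skill like the one described above is essentially a named pipeline of shell steps plus a summary formatter. The sketch below shows that shape; the step names, commands, and report format are my own illustration, since Hermes' actual skill format isn't documented here.

```python
import subprocess

# Hypothetical checklist: each step maps to a shell command the agent would run.
CHECKLIST = {
    "diff":     ["git", "diff", "origin/main...HEAD", "--stat"],
    "lint":     ["ruff", "check", "."],
    "coverage": ["pytest", "--cov", "-q"],
}

def run_checklist(dry_run: bool = True) -> str:
    """Run each step and format a review summary (dry_run only lists commands)."""
    lines = []
    for name, cmd in CHECKLIST.items():
        if dry_run:
            lines.append(f"[{name}] would run: {' '.join(cmd)}")
        else:
            result = subprocess.run(cmd, capture_output=True, text=True)
            status = "ok" if result.returncode == 0 else "FAILED"
            lines.append(f"[{name}] {status}")
    return "\n".join(lines)

print(run_checklist())
```

Once a sequence like this is captured as a skill, a single prompt replaces the manual steps, which is where the cross-session persistence pays off.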
Research Tasks
The built-in research skills work well for technical literature review. Hermes navigates documentation, synthesizes information across sources, and produces structured output. For financial or market research, connecting a cloud model through the gateway improves output quality on complex synthesis tasks.
Infrastructure Automation
Hermes manages shell scripting, cron job setup, and Docker container orchestration through natural language commands. The cross-session memory means it learns your infrastructure preferences over time.
Comparison with Claude Code
| Dimension | Hermes + Ollama | Claude Code |
|---|---|---|
| Memory | Cross-session persistent | Session-only |
| Skills | 70+ built-in, user-creatable | Prompt-based |
| Local inference | Native, full control | Cloud API required |
| Messaging gateway | Built-in | Not supported |
| Self-improvement | Learns from your usage | Static capabilities |
| Best for | Autonomous agents, always-on tasks | Interactive coding assistance |
Claude Code remains the stronger interactive development partner for active coding sessions. Hermes is the better choice for autonomous workflows, cross-session memory, and accessing AI capability from messaging platforms.
Limitations
Hermes is not a replacement for focused coding assistance during active development. The agent shines for background tasks and autonomous operation, but the inference latency on local models makes it less ideal for real-time pair programming.
The skill creation system, while powerful, requires upfront investment to teach the agent your specific workflows. Teams with standardized processes benefit most.
Browser tool support requires Playwright Chromium dependencies on macOS, which must be installed manually:
cd /Users/pooya/.hermes/hermes-agent && npx playwright install chromium

Without this step, browser-based skills do not function.
Verdict
Hermes Agent with Ollama fills a specific niche that other tools leave empty. If you want an always-on AI agent that learns your preferences, operates entirely on your hardware, and can be accessed via messaging apps, this configuration is unmatched in 2026.
For teams evaluating AI agent infrastructure, Hermes deserves serious evaluation alongside LangGraph, CrewAI, and Smolagents. The self-improving architecture and messaging gateway create workflows that stateless agents cannot replicate.
