How to Give Claude Code Persistent Memory

Claude Code doesn’t remember anything between sessions. Every time you start a new session, it’s a blank slate. You can stuff context into CLAUDE.md files, but that’s static text, not searchable memory. It doesn’t learn your preferences, remember your decisions, or recall the research you did yesterday.

The fix is an MCP memory server. I’ve been running one for about a week now and it’s changed how I work with Claude Code. Here’s how to set it up.

The Memory Server

I’m using mcp-memory-service, an open source MCP memory server built by the Doobidoo community. It’s a solid piece of work. The project provides semantic search over stored memories using sqlite-vec for local vector storage, a Cloudflare hybrid backend for cloud sync, and a clean set of MCP tools: memory_store, memory_search, memory_list, memory_delete, and others. It’s at v10.13.1 as of this writing and actively maintained.
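
For a sense of what those tools look like on the wire, here's the standard MCP `tools/call` envelope wrapping a `memory_store` call. The envelope shape comes from the MCP specification; the argument names inside `arguments` are illustrative, not the service's exact schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "memory_store",
    "arguments": {
      "content": "Blog thumbnails: 1200x630, dark background",
      "tags": ["blog", "design"]
    }
  }
}
```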

I submitted a PR upstream adding SSE transport support (more on why below) and it’s been merged. Clone the repo and install:

git clone https://github.com/doobidoo/mcp-memory-service.git ~/Projects/mcp-memory-service
cd ~/Projects/mcp-memory-service
uv sync

The Problem With stdio

The default way to run an MCP server is stdio transport. Claude Code spawns the server as a child process and communicates over stdin/stdout. This works, but for a memory server it has real problems.

The embedding model has to load every time you start a new session. If you have a SessionStart hook that tries to search memory, it fires before the server finishes initializing. The hook fails, your session starts cold, and the whole point of persistent memory is defeated.

Beyond the race condition, you’re running multiple copies of the same server if you have multiple terminals open. Each one loads the same embedding model into RAM.
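
For contrast, here's roughly what a stdio registration in ~/.claude.json looks like (the paths and the env value are illustrative):

```json
{
  "mcpServers": {
    "memory": {
      "command": "uv",
      "args": ["--directory", "/home/you/Projects/mcp-memory-service", "run", "memory", "server"],
      "env": {
        "MCP_MEMORY_STORAGE_BACKEND": "sqlite_vec"
      }
    }
  }
}
```

Every terminal that starts Claude Code spawns this command fresh, embedding model and all.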

SSE Transport and systemd

The fix is to run the memory server as a long-lived systemd user service using SSE transport instead of stdio. With SSE (Server-Sent Events), the MCP server runs as an HTTP service on localhost that clients connect to, rather than being spawned as a child process for every session. One process, always running, already warm when Claude Code connects.

Start it manually to test:

uv --directory ~/Projects/mcp-memory-service run memory server --sse

It starts uvicorn on 127.0.0.1:8765. Verify it works:

curl -s --max-time 2 http://127.0.0.1:8765/sse
# SSE connections stay open, so --max-time 2 exits after the handshake
# Should return: event: endpoint
# data: /messages/?session_id=...
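
That `event:`/`data:` pair is standard SSE framing: an event type line, one or more data lines, and a blank line terminating each event. Here's a minimal parser sketch, just to show what the client is consuming (MCP clients handle this internally; you never parse it yourself):

```python
def parse_sse_events(stream: str):
    """Parse Server-Sent Events framing into (event, data) pairs."""
    events = []
    event_type, data_lines = "message", []
    for line in stream.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:  # blank line ends the event
            events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
    return events

handshake = "event: endpoint\ndata: /messages/?session_id=abc123\n\n"
print(parse_sse_events(handshake))  # → [('endpoint', '/messages/?session_id=abc123')]
```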

Create a systemd user service at ~/.config/systemd/user/mcp-memory.service:

[Unit]
Description=MCP Memory Service (SSE Transport)
After=network.target

[Service]
Type=simple
ExecStart=/path/to/uv --directory /path/to/mcp-memory-service run memory server --sse
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=mcp-memory

[Install]
WantedBy=default.target

Enable and start it:

systemctl --user daemon-reload
systemctl --user enable --now mcp-memory.service
loginctl enable-linger $USER  # survives logout

Update ~/.claude.json to connect via SSE:

{
  "mcpServers": {
    "memory": {
      "type": "sse",
      "url": "http://127.0.0.1:8765/sse"
    }
  }
}

Three lines of config instead of a command block with args and environment variables. The server is already running when Claude Code starts, so memory searches work instantly, including from hooks.

CLAUDE.md: Static Context and Memory Instructions

Claude Code reads ~/.claude/CLAUDE.md at the start of every session. This is where both your static context and your memory instructions live.

Cloud Sync With Cloudflare

The memory service supports a hybrid backend that keeps a local sqlite-vec database for fast reads and syncs to Cloudflare D1 and Vectorize in the background. Local reads stay around 5ms. The cloud copy gives you cross-device sync and backup without sacrificing local performance.

Set MCP_MEMORY_STORAGE_BACKEND=hybrid in your systemd service or environment and configure your Cloudflare credentials. You’ll need a D1 database and a Vectorize index (768 dimensions, cosine metric). Add the environment variables to your systemd service file:

Environment="MCP_MEMORY_STORAGE_BACKEND=hybrid"
Environment="CLOUDFLARE_API_TOKEN=your-token"
Environment="CLOUDFLARE_ACCOUNT_ID=your-account-id"
Environment="CLOUDFLARE_D1_DATABASE_ID=your-d1-id"
Environment="CLOUDFLARE_VECTORIZE_INDEX=your-index-name"
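
If you provision with Wrangler, creating the two resources looks roughly like this (the database and index names are yours to choose; verify the flags against your Wrangler version):

```shell
# D1 database -- note the database_id it prints
npx wrangler d1 create mcp-memory

# Vectorize index -- 768 dimensions, cosine metric, matching the service's requirements
npx wrangler vectorize create mcp-memory-index --dimensions=768 --metric=cosine
```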

The memory service handles the sync automatically. Memories are written locally first, then queued for background sync to Cloudflare. If you lose internet, it keeps working locally and catches up when connectivity returns.
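
The pattern here is write-through with a pending queue: commit locally first, enqueue the cloud write, drain the queue when connectivity allows. This isn't the project's actual code, just a sketch of the shape:

```python
import queue

class HybridStore:
    """Sketch: local writes land immediately; cloud sync drains a queue later."""

    def __init__(self):
        self.local = {}               # stands in for the sqlite-vec database
        self.pending = queue.Queue()  # stands in for the background sync queue

    def store(self, key, memory):
        self.local[key] = memory         # local write succeeds first (fast path)
        self.pending.put((key, memory))  # cloud write is deferred

    def drain(self, cloud_write):
        """Run by a background worker whenever connectivity allows."""
        while not self.pending.empty():
            key, memory = self.pending.get()
            cloud_write(key, memory)

store = HybridStore()
store.store("jira", "Jira means Red Hat's Jira")
cloud = {}
store.drain(lambda k, v: cloud.__setitem__(k, v))
print(cloud)  # → {'jira': "Jira means Red Hat's Jira"}
```

If the network is down, the drain step simply doesn't run and local writes keep accumulating, which is the catch-up behavior described above.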

What to Store

The memory server is most useful when you store things Claude needs across sessions: your project conventions, architectural decisions, research summaries, brand guidelines, people’s names and roles, terminology preferences. I have about 50 memories stored now, covering everything from RHEL development standards to blog thumbnail specifications, even my personal philosophy: Secular Incrementalism.

The Cold Start Problem

Here’s what I learned the hard way: memory search instructions in CLAUDE.md are not reliably followed.

You can put “search memory for the user’s context before responding” in your CLAUDE.md. It’s a good instruction. Claude Code reads CLAUDE.md at the start of every session. But the model doesn’t always follow action instructions. Sometimes it just starts answering your question without searching memory first. It forgets your preferences, forgets the research you did yesterday. The whole point of persistent memory is defeated when the model skips the one instruction that loads it.

This is the cold start problem. The memory server is running, the data is there, but nothing forces the model to go look at it.

The Fix: Static Context With Breadcrumbs

The solution is putting two things in CLAUDE.md — not just instructions, but actual context.

Static context the model always sees. CLAUDE.md content isn’t instructions the model might skip — it’s context the model always has. If you put your identity, your name, your role, and your key preferences directly in CLAUDE.md as plain text, the model sees it every session. No action required. No memory search needed. It’s just there.

Here’s what the top of my CLAUDE.md looks like:

## Identity and User Context
- You are Josui, named after Kuroda Kanbei's Buddhist name -- "like water"
- Opinionated advisor, not assistant
- Direct, concise, practical

### User: Scott McCarty (@fatherlinux)
- Senior Principal Product Manager at Red Hat
- Lead PM for RHEL 10, responsible for RHEL 11
- Runs two brands: Crunchtools and Educated Confusion
- See MCP memory tagged [josui, persona] for full details when needed
- See MCP memory tagged [scott-mccarty] for full context when needed

The breadcrumb trick. Notice the last two lines: “See MCP memory tagged [josui, persona] for full details when needed.” These are breadcrumbs. They tell the model that richer context exists in memory and where to find it. Even when the model skips the explicit “search memory at session start” instruction, it often follows these embedded references when the conversation needs deeper context. The static text plants the idea that memory exists and is searchable, and the model acts on it.

Memory search instructions for the dynamic layer. Below the static context, I have a block of instructions telling Claude when and how to use memory. Here’s what mine looks like:

## Memory-First Workflow -- MANDATORY
You have a persistent memory MCP server. Memory is your long-term brain. Use it aggressively.

### Session Start (ALWAYS do this FIRST, before any other work)
1. Run `memory_search` with a broad query related to the user's first message or task
2. Run a second `memory_search` by relevant tags (e.g., `tags: ["rhel"]`, `tags: ["product-management"]`)
3. If a skill is invoked, search memory for that skill's domain

### During the Session (check memory BEFORE making assumptions)
- **Person mentioned by name** → `memory_search` for that person before responding
- **Project or product referenced** → `memory_search` for it before making claims
- **User corrects you or states a preference** → store it immediately with `memory_store`
- **Research completed** → store key findings with `memory_store` before session ends
- **Decision made** → store the decision and rationale with `memory_store`

### Session End
- Store new learnings, decisions, corrections, and research findings as memories
- Use descriptive tags so future sessions can find them

When the model follows these instructions, it pulls in rich context — full project histories, brand guidelines, research findings, the whole graph.

When it doesn’t follow them, the static context and breadcrumbs have already established enough of a floor that the session isn’t starting cold.

The distinction matters: static context in CLAUDE.md is always seen. Action instructions in CLAUDE.md are sometimes skipped. Put both in the same file, and you get a reliable floor with an aspirational ceiling. Most sessions hit the ceiling. All sessions hit the floor.

The Payoff

The difference is immediate. Claude Code stops asking me to explain things I’ve already told it. It remembers that “Jira” means Red Hat’s Jira and “ticket” means my RT instance. It knows my blog’s visual style without being told. It picks up where the last session left off.

Persistent memory turns Claude Code from a tool you configure every session into one that accumulates context over time. The mcp-memory-service project from the Doobidoo community makes it straightforward to set up, SSE transport makes it fast, and static context with breadcrumbs in CLAUDE.md makes the cold start reliable. That’s worth the setup.
