Developer Tools

Boosting LLM Productivity: Open-Sourcing a Persistent Memory System for Claude Code

March 27, 2026

The "Stranger Every Morning" Problem

CLI-based agentic tools like Claude Code are brilliant, but they have a classic LLM flaw: ephemerality. Every time you close your terminal, the agent "forgets" your architectural decisions, your specific variable naming preferences, and the half-finished refactor you were working on yesterday.

At Stacklyn Labs, we’ve been exploring the Model Context Protocol (MCP) to bridge this gap. By giving Claude Code a persistent memory layer, we transform it from a search tool into a teammate that evolves with your project.

Defensive Memory: Handling Semantic Drift and Duplication

In a large codebase, "Memory" can quickly become noise. If the agent saves every microscopic thought, retrieval eventually suffers from Semantic Drift: a query for "auth logic" returns three conflicting refactor plans from three different sessions.

To prevent this, our MCP implementation includes a "Deduplication and Conflict Resolution" step. Before a new memory is saved, the server performs a semantic similarity check. If a similar key exists, it prompts the agent to "merge" or "supersede" the old entry, keeping the context clean and authoritative.

// Node.js: Semantic Deduplication Tool for MCP Memory
const MiniSearch = require('minisearch'); // Lightweight local full-text indexing
const index = new MiniSearch({ fields: ['key', 'content'], storeFields: ['content'] });

async function saveWithResolution(key, content) {
    const existing = index.search(key);
    // MiniSearch scores are unbounded relevance values, not probabilities;
    // 0.8 here is a tuned threshold for "similar enough to conflict".
    if (existing.length > 0 && existing[0].score > 0.8) {
        // High similarity detected - trigger conflict resolution logic
        return { 
            status: "CONFLICT", 
            existing: existing[0].content,
            suggestion: "Merging this with the existing architectural decision is recommended."
        };
    }
    await db.save(key, content); // db: the persistent storage layer
    index.add({ id: key, key, content });
    return { status: "SAVED" };
}

Performance Deep Dive: Vector Indexing at the Edge

A persistent memory system that takes 2 seconds to "search" is a distraction. For local-first memory, we avoid cloud-based vector databases in favor of MiniSearch or a local SQLite-VSS extension. This allows for sub-50ms retrieval of project context without sending your entire thought-history to a third-party server.

Context Window Management: Claude Code has a massive 200k window, but filling it with 150k of "memory" slows down reasoning and increases costs. We implement Selective Context Injection: only the top 5 most relevant memories are injected into the active prompt, while the rest remain accessible via explicit tool calls.
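Selective Context Injection can be sketched in a few lines. This is an illustrative model, not the shipped implementation: the helper names (`selectContext`, `buildPrompt`) and the memory shape (`{ key, content, score }`) are assumptions for the sketch.

```javascript
// Sketch: Selective Context Injection. Only the top N scored memories
// are injected into the prompt; the rest stay reachable via tool calls.
const MAX_INJECTED = 5;

function selectContext(memories) {
  // Rank by relevance score, highest first, and split at the cutoff.
  const ranked = [...memories].sort((a, b) => b.score - a.score);
  return {
    injected: ranked.slice(0, MAX_INJECTED),
    deferred: ranked.slice(MAX_INJECTED), // retrievable via the search tool
  };
}

function buildPrompt(userQuery, memories) {
  const { injected } = selectContext(memories);
  const contextBlock = injected
    .map((m) => `- [${m.key}] ${m.content}`)
    .join('\n');
  return `Relevant project memory:\n${contextBlock}\n\nUser request: ${userQuery}`;
}
```

The hard cap keeps prompt cost bounded regardless of how large the memory store grows; deferred memories cost nothing until the agent explicitly asks for them.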

Architecture: The Cross-Session Skill Store

The true power of this system is its role as a "Cross-Session Router." By storing memories in a hidden .stacklyn_memory folder in the user's home directory, the agent can share "Global Skills" across different projects.

1. Stdio Transport

Claude Code communicates with the local MCP server via standard input/output (Stdio), keeping every round trip on the local machine with no network hop.

2. Semantic Search Tool

When you ask "how did we handle the last refactor?", the agent calls the search tool to retrieve the indexed decision log.

3. Skill Extraction

Frequent command sequences (e.g., specialized Docker builds) are extracted and saved as "Skills" for instant reuse.

4. Cleanup Daemon

A background process prunes obsolete session data every 30 days to keep the local index lean and fast.
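The pruning pass in step 4 reduces to a filter over session records. This is an in-memory model of the policy, assuming each session carries a `lastAccessed` timestamp; the real daemon would walk the on-disk index instead.

```javascript
// Sketch: the 30-day retention pass run by the cleanup daemon.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function pruneSessions(sessions, now = Date.now()) {
  // Keep only sessions touched within the retention window;
  // everything older is dropped from the index.
  return sessions.filter((s) => now - s.lastAccessed <= THIRTY_DAYS_MS);
}
```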

Production Strategy: Testing and Deployment

Testing an MCP server requires mocking the Stdio Lifecycle. We use a custom test harness that pipes JSON-RPC messages into the server and validates that the semantic index returns the expected matches. This "Mocks-First" approach ensures that the memory logic is reliable before it ever touches your terminal.
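A minimal version of that harness can be sketched without spawning a process: feed JSON-RPC request objects to a mocked tool dispatcher and assert on the responses. The method name `memory/search` and the canned result are illustrative, not the server's real API.

```javascript
// Sketch: "Mocks-First" harness. Requests follow the JSON-RPC 2.0 shape
// ({ jsonrpc, id, method, params }); tools are plain mocked functions.
function makeHarness(tools) {
  return {
    call(request) {
      const tool = tools[request.method];
      if (!tool) {
        return { jsonrpc: '2.0', id: request.id,
                 error: { code: -32601, message: 'Method not found' } };
      }
      return { jsonrpc: '2.0', id: request.id, result: tool(request.params) };
    },
  };
}

// Mocked semantic search tool: canned matches for a known query.
const harness = makeHarness({
  'memory/search': ({ query }) =>
    query.includes('refactor') ? ['decision-log-2026-03'] : [],
});
```

The same harness can later wrap a real child process behind the same `call` interface, so the assertions survive the switch from mocks to the live server.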

For deployment, we package the MCP memory server as a global NPM package. This allows developers to simply run npm install -g @stacklyn/mcp-memory and add a single line to their claude_code_config.json to enable persistence.

// example config: claude_code_config.json
{
  "mcp_servers": {
    "memory": {
      "command": "stacklyn-mcp-memory",
      "args": ["--storage-dir", "~/.stacklyn"]
    }
  }
}

Conclusion

The era of ephemeral AI is ending. By embracing persistent memory and MCP, we are building tools that accumulate value over time. Claude Code is already a great assistant; with a memory, it becomes an extension of your own technical experience, retaining the "Why" behind every line of code you write.

Author: Stacklyn Labs

