Using Ragex as a Local MCP Server

Ragex is a self-hosted MCP (Model Context Protocol) server that adds Hybrid RAG capabilities to any MCP-compatible AI client or editor. It runs entirely on your machine — no external services, no data leaving your system.

What You Get

Once connected, any attached AI agent gains access to roughly 50 MCP tools covering:

  • Code indexing — analyze files and directories into a knowledge graph
  • Semantic search — natural-language queries resolved by local ML embeddings
  • Hybrid search — symbolic graph + semantic retrieval fused with Reciprocal Rank Fusion
  • RAG pipeline — rag_query, rag_explain, rag_suggest backed by your configured AI provider
  • Safe editing — atomic multi-file edits with validation, backup, and rollback
  • Semantic refactoring — rename functions and modules project-wide with AST awareness
  • Code analysis — dead code, duplication, coupling, security, smells, quality metrics
  • Graph algorithms — PageRank, betweenness centrality, community detection
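
As an illustration of the fusion step named in the hybrid search bullet, here is a minimal Python sketch of Reciprocal Rank Fusion. The constant k=60 is the value commonly used in the literature and the function name is hypothetical; neither is confirmed to match what Ragex does internally.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every item it contains.
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical result lists from the graph and the embedding index.
graph_hits = ["Auth.login/2", "Auth.verify/1", "Session.create/1"]
semantic_hits = ["Auth.verify/1", "Token.sign/2", "Auth.login/2"]
fused = reciprocal_rank_fusion([graph_hits, semantic_hits])
print(fused)
```

Items that rank well in both lists rise to the top, while items found by only one retriever are kept but ranked lower.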

Languages supported for analysis: Elixir, Erlang, Python, Ruby, JavaScript/TypeScript.


Prerequisites

| Requirement | Notes |
| --- | --- |
| Elixir 1.18+ | Check with elixir --version |
| Erlang/OTP 27+ | Bundled with Elixir installations from asdf/mise |
| ~500 MB RAM | For the default embedding model at runtime |
| ~200 MB disk | Build artefacts + the first-run model download (~90 MB) |
| Python 3.x | Optional; required only for Python file analysis |
| Node.js | Optional; required only for JavaScript/TypeScript file analysis |

Installation

git clone https://github.com/Oeditus/ragex.git
cd ragex
mix deps.get
mix compile

First compilation takes a few minutes because of the ML dependencies (Nx, EXLA, Bumblebee). The embedding model itself (~90 MB) is downloaded from HuggingFace on the first server start and cached in ~/.cache/huggingface/.

To pre-download it before the first real use:

mix ragex.models.download

Transport Modes

Ragex speaks MCP over two transports simultaneously:

| Transport | Address | Best for |
| --- | --- | --- |
| stdio | stdin / stdout | Editor integrations (Zed, Cursor, Claude Desktop, Warp) |
| Unix socket | /tmp/ragex_mcp.sock | Local tooling, LunarVim plugin, socat scripts |

Both are active whenever the server is running. The stdio transport is the standard MCP transport for locally launched servers; the socket transport is a Ragex-specific extension for clients that cannot manage a long-lived subprocess.

When a second process tries to start Ragex while a socket server is already alive, bin/ragex-mcp detects this automatically and launches a lightweight bridge (bin/ragex-bridge) instead of spinning up a second BEAM VM with another GPU/ML model allocation.
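
The detection step can be pictured with a short Python sketch. How bin/ragex-mcp actually probes the socket is not shown in this guide, so the functions below are hypothetical; only the socket path comes from the table above.

```python
import socket

def socket_is_live(path="/tmp/ragex_mcp.sock", timeout=0.5):
    """Return True if a process accepts connections on the Unix socket."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    probe.settimeout(timeout)
    try:
        probe.connect(path)
        return True
    except OSError:
        # Covers a missing file, a stale socket file, and a refused connection.
        return False
    finally:
        probe.close()

def choose_launch_mode(path="/tmp/ragex_mcp.sock"):
    """Bridge to a running server if one answers, otherwise start a new one."""
    return "bridge" if socket_is_live(path) else "server"

print(choose_launch_mode())
```

Connecting to the socket, rather than only checking that the file exists, avoids mistaking a stale socket file left by a crashed server for a running instance.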


Starting the Server

./bin/ragex-mcp

This script:

  1. Sets MIX_ENV=prod for optimized performance.
  2. Sets RAGEX_STDIO=1 so the server accepts MCP commands on stdin/stdout.
  3. Compiles silently (output to stderr so JSON-RPC on stdout stays clean).
  4. Detects a running instance via the Unix socket — bridges to it instead of double-starting.
  5. Runs mix run --no-halt to keep the process alive.

Optional flags:

# Auto-analyze a project directory on startup
bin/ragex-mcp --project /path/to/your/project

# Override log verbosity
bin/ragex-mcp --log-level debug

Equivalent environment variables:

RAGEX_PROJECT=/path/to/your/project  bin/ragex-mcp
RAGEX_LOG_LEVEL=debug                bin/ragex-mcp
RAGEX_EMBEDDING_MODEL=codebert_base  bin/ragex-mcp

Minimal start (development)

mix run --no-halt

Background start with logging

./start_mcp.sh           # writes logs to ragex.log in the project root
./start_server.sh        # writes logs to /tmp/ragex_server.log

Interactive / debug shell

RAGEX_NO_SERVER=1 iex -S mix

This starts an IEx session with the full application loaded but without the MCP server, useful for ad-hoc testing.


Connecting MCP Clients

All MCP clients that communicate over stdio need the path to bin/ragex-mcp and the working directory of the Ragex project. Use the absolute path.

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or the equivalent path on Linux (~/.config/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "ragex": {
      "command": "/absolute/path/to/ragex/bin/ragex-mcp",
      "args": [],
      "env": {}
    }
  }
}

To automatically index a project when Claude starts:

{
  "mcpServers": {
    "ragex": {
      "command": "/absolute/path/to/ragex/bin/ragex-mcp",
      "args": ["--project", "/path/to/your/elixir/project"],
      "env": {}
    }
  }
}

Restart Claude Desktop after saving. Ragex tools will appear in the tool list.
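
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. A minimal Python sketch of that merge follows; the helper name add_ragex_server is hypothetical, and the key layout is the one shown in the JSON above.

```python
import json

def add_ragex_server(config, command, project=None):
    """Return a copy of a Claude Desktop config dict with a "ragex" entry added."""
    servers = dict(config.get("mcpServers", {}))
    servers["ragex"] = {
        "command": command,
        # Pass --project only when a directory should be indexed on startup.
        "args": ["--project", project] if project else [],
        "env": {},
    }
    return {**config, "mcpServers": servers}

# A hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "/usr/bin/other", "args": [], "env": {}}}}
updated = add_ragex_server(
    existing,
    "/absolute/path/to/ragex/bin/ragex-mcp",
    project="/path/to/your/elixir/project",
)
print(json.dumps(updated, indent=2))
```

The function returns a new dictionary and leaves the input untouched, so the result can be inspected before it is written back to disk.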

Cursor

Create or edit .cursor/mcp.json in your home directory or project root:

{
  "mcpServers": {
    "ragex": {
      "command": "/absolute/path/to/ragex/bin/ragex-mcp",
      "args": ["--project", "${workspaceFolder}"],
      "env": {
        "RAGEX_LOG_LEVEL": "warning"
      }
    }
  }
}

Zed

Add to ~/.config/zed/settings.json for system-wide availability:

{
  "context_servers": {
    "ragex": {
      "command": {
        "path": "/absolute/path/to/ragex/bin/ragex-mcp",
        "args": [],
        "env": {}
      }
    }
  }
}

To auto-analyze one specific project no matter which workspace Zed has open:

{
  "context_servers": {
    "ragex": {
      "command": {
        "path": "/absolute/path/to/ragex/bin/ragex-mcp",
        "args": ["--project", "/path/to/your/project"],
        "env": {}
      }
    }
  }
}

For per-project configuration, place .zed/settings.json in the project root. See ZED.md for the full Zed integration guide, including the task runner and keybindings.

LunarVim / NeoVim

LunarVim communicates with Ragex through the Unix socket (/tmp/ragex_mcp.sock). Start the server first, then use the Lua plugin:

Step 1 — start the server (in a terminal, keep it running):

cd /path/to/ragex
./start_mcp.sh

Verify it is alive:

./test_socket.sh

Step 2 — install the plugin files

Copy lvim.cfg/lua/user/ into your LunarVim config directory (typically ~/.config/lvim/lua/user/) and add the snippet from the main README to your config.lua. The plugin communicates with the socket using socat.

Step 3 — verify

:lua print(require('ragex').config.socket_path)   -- should print /tmp/ragex_mcp.sock
:Ragex search

See SERVER_GUIDE.md in the project root for detailed socket-mode troubleshooting.

Generic stdio client

Any program can speak to Ragex over stdio. Send newline-delimited JSON-RPC 2.0 messages:

# Initialize
echo '{"jsonrpc":"2.0","method":"initialize","params":{"clientInfo":{"name":"my-client","version":"1.0"}},"id":1}' \
  | bin/ragex-mcp

# List tools
echo '{"jsonrpc":"2.0","method":"tools/list","id":2}' | bin/ragex-mcp

From Python:

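import json
import subprocess

# Launch the server as a child process and talk to it over stdin/stdout.
proc = subprocess.Popen(
    ["/path/to/ragex/bin/ragex-mcp"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

def call(tool_name, arguments, request_id=1):
    """Invoke one MCP tool via tools/call and return the decoded JSON-RPC response."""
    req = json.dumps({"jsonrpc": "2.0", "method": "tools/call",
                      "params": {"name": tool_name, "arguments": arguments},
                      "id": request_id})
    # Messages are newline-delimited, so terminate each request with "\n".
    proc.stdin.write(req.encode() + b"\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

call("analyze_directory", {"path": "/my/project", "recursive": True})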
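
For a client that sends more than one kind of message, it can help to separate message framing from process handling. The sketch below covers only the JSON-RPC 2.0 envelope and error check; the helper names are hypothetical and nothing here is specific to Ragex.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC ids only need to be unique per connection

def build_request(method, params=None):
    """Serialize one JSON-RPC 2.0 request as a newline-terminated byte string."""
    message = {"jsonrpc": "2.0", "method": method, "id": next(_ids)}
    if params is not None:
        message["params"] = params
    return (json.dumps(message) + "\n").encode()

def parse_response(line):
    """Return the "result" member of a response line, or raise on an error reply."""
    response = json.loads(line)
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return response["result"]

print(build_request("tools/list"))
```

Raising on the "error" member keeps a failed tool call from being mistaken for an empty result.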

Indexing Your Codebase

Ragex needs to analyze your codebase before it can answer questions about it. Once the server is running, ask the connected AI to call these tools, or invoke them directly.

Analyze a directory (MCP tool call)

{
  "name": "analyze_directory",
  "arguments": {
    "path": "/path/to/your/project",
    "recursive": true,
    "generate_embeddings": true
  }
}

This populates the in-memory ETS knowledge graph and generates an embedding for every module and function (384-dimensional with the default model). Typical throughput is ~100 files per second; a 1,000-file project takes under 30 seconds.

Auto-analyze on startup

Add directories to index automatically every time Ragex starts:

# config/config.exs
config :ragex, :auto_analyze_dirs, [
  "/path/to/project-a",
  "/path/to/project-b"
]

Or pass a single path via environment variable / CLI flag:

RAGEX_AUTO_ANALYZE=/path/to/project bin/ragex-mcp
bin/ragex-mcp --project /path/to/project

Watch for changes

Enable automatic re-indexing whenever files change:

{
  "name": "watch_directory",
  "arguments": {
    "path": "/path/to/your/project"
  }
}

Only modified files are re-analyzed (SHA256-based change detection), so incremental updates are fast.
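
Hash-based change detection of this kind can be sketched in a few lines of Python. This is an illustration of the general technique, not Ragex's implementation; the function names and the in-memory digest table are assumptions.

```python
import hashlib

def content_digest(data: bytes) -> str:
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def changed_paths(contents, known_digests):
    """Return paths whose digest differs from known_digests, updating the table."""
    changed = []
    for path, data in contents.items():
        digest = content_digest(data)
        if known_digests.get(path) != digest:
            known_digests[path] = digest
            changed.append(path)
    return changed

known = {}
# First pass: nothing is known yet, so every file counts as changed.
first = changed_paths({"lib/a.ex": b"defmodule A do end",
                       "lib/b.ex": b"defmodule B do end"}, known)
# Second pass: only lib/b.ex has different bytes.
second = changed_paths({"lib/a.ex": b"defmodule A do end",
                        "lib/b.ex": b"defmodule B do\nend"}, known)
print(first, second)
```

Comparing content hashes instead of modification times means a file that was saved without changes is not re-analyzed.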


RAG Queries

RAG tools combine local semantic retrieval with an external AI provider to answer questions grounded in your actual code.

Ask a question

{
  "name": "rag_query",
  "arguments": {
    "query": "How does authentication work in this codebase?",
    "limit": 15,
    "include_code": true
  }
}

Ragex retrieves the most relevant functions and modules via hybrid search, formats them as context (up to ~8,000 characters), and sends them together with your question to the configured AI provider.
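
The context-assembly step can be approximated as packing ranked snippets into a character budget. The sketch below assumes a simple "stop at the first snippet that does not fit" policy and a hypothetical header format; the real formatting and truncation rules are not described in this guide.

```python
def build_context(snippets, max_chars=8000):
    """Concatenate (name, code) snippets, best first, within a character budget."""
    parts, used = [], 0
    for name, code in snippets:
        block = f"# {name}\n{code}\n"
        if used + len(block) > max_chars:
            break  # keep whole snippets only; never cut one in half
        parts.append(block)
        used += len(block)
    return "".join(parts)

def build_prompt(question, context):
    """Combine the retrieved context with the user's question."""
    return f"Context:\n{context}\nQuestion: {question}"

# Hypothetical retrieval results, already sorted by relevance.
ranked = [("MyApp.Auth.login/2", "x" * 3000),
          ("MyApp.Auth.verify/1", "y" * 3000),
          ("MyApp.Session.create/1", "z" * 3000)]
context = build_context(ranked)
print(len(context))
```

With three snippets of about 3,000 characters each, only the first two fit the ~8,000-character budget, which is why a higher limit in rag_query does not always mean more code reaches the provider.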

Explain a function or file

{
  "name": "rag_explain",
  "arguments": {
    "target": "MyApp.Auth.authenticate_user/2",
    "aspect": "complexity"
  }
}

aspect can be purpose, complexity, dependencies, or all.

Suggest improvements

{
  "name": "rag_suggest",
  "arguments": {
    "target": "lib/my_app/auth.ex",
    "focus": "security"
  }
}

focus can be performance, readability, testing, security, or all.

Streaming variants

All three tools have streaming counterparts (rag_query_stream, rag_explain_stream, rag_suggest_stream) that emit partial responses as they arrive from the AI provider.

Interactive chat (CLI)

mix ragex.chat --provider deepseek_r1

Opens a REPL that runs a ReAct agent loop: the AI calls Ragex tools directly to gather evidence before answering.


Embedding Models

Embeddings power semantic and hybrid search. Four models are pre-configured:

| Model ID | Dimensions | Size | Best for |
| --- | --- | --- | --- |
| all_minilm_l6_v2 | 384 | ~90 MB | Default; fast; good general quality |
| all_mpnet_base_v2 | 768 | ~420 MB | Highest quality; large codebases |
| codebert_base | 768 | ~500 MB | Code-specific queries; API discovery |
| paraphrase_multilingual | 384 | ~110 MB | Non-English comments and documentation |

Configure in config/config.exs:

config :ragex, :embedding_model, :all_minilm_l6_v2

Or via environment variable (overrides config):

export RAGEX_EMBEDDING_MODEL=codebert_base

Models with the same number of dimensions are cache-compatible — you can switch between all_minilm_l6_v2 and paraphrase_multilingual without regenerating embeddings. Switching between 384-dim and 768-dim models requires a re-index.
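
The compatibility rule reduces to comparing dimensions. A small Python sketch, using the dimensions from the model table and a hypothetical helper name:

```python
# Dimensions as listed in the model table of this guide.
MODEL_DIMENSIONS = {
    "all_minilm_l6_v2": 384,
    "all_mpnet_base_v2": 768,
    "codebert_base": 768,
    "paraphrase_multilingual": 384,
}

def needs_reindex(current_model, new_model):
    """True when the two models produce vectors of different sizes."""
    return MODEL_DIMENSIONS[current_model] != MODEL_DIMENSIONS[new_model]

print(needs_reindex("all_minilm_l6_v2", "paraphrase_multilingual"))
print(needs_reindex("all_minilm_l6_v2", "codebert_base"))
```

The first call reports that no re-index is needed and the second that one is; mix ragex.embeddings.migrate --check remains the authoritative check.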

Check current model and cache status:

mix ragex.embeddings.migrate --check

Manage the embedding cache:

mix ragex.cache.stats          # Show cache statistics
mix ragex.cache.refresh        # Incremental refresh (changed files only)
mix ragex.cache.clear --all    # Clear all cached embeddings

AI Providers for RAG

RAG tools (rag_query, rag_explain, rag_suggest) require an external AI provider. Configure via environment variables:

# DeepSeek (default provider)
export DEEPSEEK_API_KEY="sk-..."

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Ollama (local, no key needed)
export OLLAMA_HOST="http://localhost:11434"

Set the default provider in config/config.exs:

config :ragex, :ai,
  providers: [:openai, :anthropic, :deepseek_r1, :ollama],
  default_provider: :deepseek_r1,
  fallback_enabled: true
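
One plausible reading of fallback_enabled is "try the default provider first, then the others in listed order". The Python sketch below illustrates that reading with stand-in callables; the actual ordering and error handling in Ragex are assumptions here.

```python
def ask_with_fallback(prompt, providers, default_provider, fallback_enabled=True):
    """Try the default provider, then the remaining ones, returning (name, answer)."""
    order = [default_provider]
    if fallback_enabled:
        order += [name for name in providers if name != default_provider]
    errors = {}
    for name in order:
        try:
            return name, providers[name](prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")

# Stand-in providers: one that is unreachable and one that answers.
def _unreachable(prompt):
    raise ConnectionError("provider unavailable")

def _echo(prompt):
    return f"answer to: {prompt}"

providers = {"openai": _echo, "deepseek_r1": _unreachable}
used, answer = ask_with_fallback("What does the supervisor do?", providers, "deepseek_r1")
print(used, answer)
```

With fallback disabled, the same call raises instead of silently switching providers, which may be preferable when only one provider is approved for your code.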

Override the provider per-query:

{
  "name": "rag_query",
  "arguments": {
    "query": "What does the supervisor tree look like?",
    "provider": "ollama"
  }
}

AI responses are cached (ETS, TTL 1 hour by default) to avoid redundant API calls. Monitor usage:

{"name": "get_ai_usage", "arguments": {}}
{"name": "get_ai_cache_stats", "arguments": {}}
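
The behaviour of a one-hour response cache can be modelled with a small time-to-live cache. This Python sketch is a conceptual stand-in for the ETS-backed cache, with a hypothetical class name and an injectable clock so expiry can be demonstrated without waiting.

```python
import time

class TTLCache:
    """Keep values for ttl_seconds, then treat them as missing."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (self.clock(), value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

# A fake clock makes the one-hour expiry visible immediately.
now = [0.0]
cache = TTLCache(ttl_seconds=3600, clock=lambda: now[0])
cache.put(("rag_query", "How does auth work?"), "cached answer")
now[0] = 3599.0
hit = cache.get(("rag_query", "How does auth work?"))
now[0] = 3600.0
miss = cache.get(("rag_query", "How does auth work?"))
print(hit, miss)
```

Keying the cache on both the tool name and the query text means an identical question within the hour costs no additional API call.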

Semantic search and hybrid search work entirely offline using local Bumblebee embeddings — no AI provider key is needed for these.


Configuration Reference

The main configuration file is config/config.exs. Below are the most relevant sections for MCP server usage.

Embedding model

config :ragex, :embedding_model, :all_minilm_l6_v2

Embedding cache

config :ragex, :cache,
  enabled: true,
  dir: Path.expand("~/.cache/ragex"),
  max_age_days: 30

Auto-analyze on startup

config :ragex, :auto_analyze_dirs, [
  "/path/to/project-a",
  "/path/to/project-b"
]

AI providers

config :ragex, :ai,
  providers: [:openai, :anthropic, :deepseek_r1, :ollama],
  default_provider: :deepseek_r1,
  fallback_enabled: true

AI features (optional)

Enable AI-enhanced analysis features (require an AI provider):

config :ragex, :ai_features,
  validation_error_explanation: true,   # AI explanations for syntax errors
  refactor_preview_commentary: true,    # Risk analysis in refactor previews
  dead_code_refinement: true,           # Reduce false positives in dead code reports
  duplication_semantic_analysis: true,  # Semantic Type IV clone detection
  dependency_insights: true             # Architectural insights for coupling analysis

Search thresholds

config :ragex, :search,
  default_threshold: 0.2,   # similarity cutoff for semantic_search
  hybrid_threshold: 0.15    # similarity cutoff for hybrid_search (lower = more recall)
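
To see what a similarity cutoff does, here is a minimal Python sketch that scores candidates and drops those below the threshold. It assumes cosine similarity as the metric and uses toy two-dimensional vectors; the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filter_by_threshold(query_vec, candidates, threshold=0.2):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for 384-dimensional embeddings.
candidates = {"exact": [1.0, 0.0], "related": [1.0, 1.0], "unrelated": [0.0, 1.0]}
loose = filter_by_threshold([1.0, 0.0], candidates, threshold=0.2)
strict = filter_by_threshold([1.0, 0.0], candidates, threshold=0.8)
print([name for name, _ in loose], [name for name, _ in strict])
```

The loose cutoff keeps both the exact and the related entry, while the strict one keeps only the exact match, which mirrors the recall versus precision trade-off of the two settings above.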

Editor / backup settings

config :ragex, :editor,
  backup_dir: Path.expand("~/.ragex/backups"),
  backup_retention: 10,
  validate_by_default: true,
  create_backup_by_default: true

Graph algorithm limits

config :ragex, :graph,
  max_nodes_betweenness: 10_000,
  max_nodes_export: 10_000

Keeping the Index Fresh

Ragex stores the knowledge graph in ETS (in-memory). The state is lost when the server stops. On restart:

  1. Embedding cache is loaded from disk (~/.cache/ragex/) — this makes semantic search available within a few seconds.
  2. Graph nodes/edges are rebuilt by re-analyzing directories listed in auto_analyze_dirs.
  3. File watcher resumes watching once watch_directory is called again (or configured via auto-analyze).

For a project you work on daily, a sensible setup is:

# config/config.exs
config :ragex, :auto_analyze_dirs, ["/path/to/my/project"]

Combined with watching:

{"name": "watch_directory", "arguments": {"path": "/path/to/my/project"}}

This gives you a fully up-to-date graph within seconds of each server start, with no manual re-indexing.


Performance Tips

First startup is slow — the ML model loads and JIT-compiles via EXLA. Expect 30–90 seconds. Every subsequent start is fast because the model binary is cached by Bumblebee.

First analysis is slow — embedding generation takes ~50 ms per entity. For a 500-function project that is ~25 seconds. The embedding cache makes this a one-time cost.
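
The estimate above is simple arithmetic, which a short Python helper makes easy to repeat for your own project size. The 50 ms figure is the one quoted in this guide; actual times depend on hardware and model.

```python
def estimated_embedding_seconds(entity_count, ms_per_entity=50):
    """Rough first-run embedding time for a given number of functions/modules."""
    return entity_count * ms_per_entity / 1000

print(estimated_embedding_seconds(500))     # 25.0
print(estimated_embedding_seconds(10_000))  # 500.0
```

At 10,000 entities the first run takes on the order of eight minutes, which is why the embedding cache matters for large codebases.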

Memory — the default all_minilm_l6_v2 model requires ~400 MB RAM. Larger models (all_mpnet_base_v2, codebert_base) need ~800–900 MB. Plan accordingly if running Ragex alongside other memory-intensive processes.

Search quality vs. speed — the default similarity threshold of 0.2 favors recall. For precise lookup, raise it to 0.7+. For exploratory questions, keep it at the default or lower.

Large codebases (>10,000 entities) — use incremental cache refresh (mix ragex.cache.refresh) instead of full re-analysis on each server restart.


Troubleshooting

Server won't start

mix compile                    # check for compilation errors
mix deps.get && mix compile    # fetch missing dependencies

Embedding model download fails

The model is fetched from HuggingFace on first run. If you are behind a proxy or firewall:

# Set proxy
export HTTPS_PROXY=http://proxy:port

# Or pre-download manually
mix ragex.models.download

Model cache location: ~/.cache/huggingface/

MCP client shows no tools / red indicator

# Confirm the binary is executable
chmod +x bin/ragex-mcp bin/ragex-bridge

# Test stdio mode manually
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | bin/ragex-mcp
# Should print a JSON response with a "result" field containing tool definitions

Check editor-specific logs:

  • Zed: Ctrl+Shift+P > "zed: open logs", search for "ragex"
  • Cursor: Help > Toggle Developer Tools > Console
  • Claude Desktop: open ~/Library/Logs/Claude/ (macOS)

Socket server: "connection refused" or hanging

# Kill the stale process and clean up the socket.
# Note: this pattern matches every "mix run" process you own, not only Ragex.
pkill -f "mix run"
rm -f /tmp/ragex_mcp.sock

# Restart
./start_mcp.sh

# Verify
./test_socket.sh

RAG queries return no AI response

Ensure the provider API key is set in the environment where Ragex is launched:

DEEPSEEK_API_KEY=sk-...  bin/ragex-mcp

Check usage and limits:

{"name": "get_ai_usage", "arguments": {}}

Search returns poor results

  • Lower the threshold: "threshold": 0.1
  • Switch retrieval strategy: "strategy": "semantic_first" or "graph_first"
  • Try a different query phrasing
  • Verify the codebase is indexed: {"name": "graph_stats", "arguments": {}}
  • Check embeddings exist: {"name": "get_embeddings_stats", "arguments": {}}

High memory / OOM

Switch to the smaller model:

# config/config.exs
config :ragex, :embedding_model, :all_minilm_l6_v2

Or set via environment before starting:

RAGEX_EMBEDDING_MODEL=all_minilm_l6_v2 bin/ragex-mcp

Logs

Ragex logs to ragex.log (rotating, max 10 MB, 5 files) in the project root by default. Tail it for real-time diagnostics:

tail -f ragex.log

To increase verbosity:

RAGEX_LOG_LEVEL=debug bin/ragex-mcp

See Also

  • CONFIGURATION.md — full configuration reference including model migration
  • TOOLS.md — complete MCP tools reference with parameters
  • USAGE.md — editor-specific integration guides (VIM, LunarVim)
  • ZED.md — first-class Zed integration (tasks, keybindings, agent profile)
  • PERSISTENCE.md — embedding cache internals and management
  • TROUBLESHOOTING.md — error messages and analysis issues
  • SERVER_GUIDE.md — Unix socket server management