A command-line tool for evaluating and testing Model Context Protocol (MCP) servers. MCP-Evals helps you validate server capabilities, test tool calls, and ensure LLMs are correctly utilizing the provided tooling.
To install this package in your project (needed for tool evaluations), run `npm install @buildwithlayer/mcp-evals`.

To install the CLI globally, run `npm install -g @buildwithlayer/mcp-evals`.
MCP-Evals can be configured using command-line flags, a configuration file, or a combination of both (command-line flags will override configuration file properties).
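For example, assuming a `.mcp-evals.json` file (described below) already defines an SSE transport and URL, a flag passed on the command line takes precedence for that run (the port here is just an illustrative value):

```bash
# The --url flag overrides the url value from .mcp-evals.json for this invocation
mcp-evals connection --url http://localhost:4000/sse
```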
Create a `.mcp-evals.json` file in your project root:
```json
{
  "transport": "sse",
  "url": "http://localhost:3001/sse",
  "tool-evals-directory": "./path/to/evals"
}
```
| Option | Description | Example |
| --- | --- | --- |
| `transport` | Connection transport type (`sse`, `stdio`, or `streamableHTTP`) | `"sse"` |
| `url` | URL for SSE / Streamable HTTP connection | `"http://localhost:3001/sse"` |
| `command` | Command for STDIO connection | `"node ./server.js"` |
| `args` | Arguments for STDIO command | `["--port", "3000"]` |
| `env` | Environment variables for STDIO command | `{"NODE_ENV": "test"}` |
| `tool-evals-directory` | Directory containing tool evaluation files | `"./src/evals"` |
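Putting those options together, a STDIO-based configuration might look like the following sketch (the command, arguments, and environment values are illustrative):

```json
{
  "transport": "stdio",
  "command": "node ./server.js",
  "args": ["--port", "3000"],
  "env": {"NODE_ENV": "test"},
  "tool-evals-directory": "./src/evals"
}
```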
Create a `.env` file for sensitive configuration:

```
OPENAI_API_KEY=your_api_key_here
```
```bash
# Test connection to an MCP server
mcp-evals connection --sse --url http://localhost:3001/sse

# Using STDIO transport
mcp-evals connection --stdio --command node --args ./server.js

# List available tools
mcp-evals list-tools [--verbose]

# List available resources
mcp-evals list-resources [--verbose]

# List available prompts
mcp-evals list-prompts [--verbose]

# Test the connection and list tools, resources, and prompts (whichever are included in server capabilities)
mcp-evals all [--verbose]
```
| Flag | Description |
| --- | --- |
| `--config` | Path to JSON configuration file (defaults to `.mcp-evals.json` in root directory) |
| `--sse` | Connect using SSE transport |
| `--stdio` | Connect using STDIO transport |
| `--streamableHttp` | Connect using Streamable HTTP transport |
| `--url` | URL to connect to (for SSE or Streamable HTTP) |
| `--command` | Command to run (for STDIO) |
| `--args` | Arguments for the command (space-separated) |
| `--env` | Environment variables (format: `KEY=VALUE`) |
| `--verbose` | Enable verbose output |
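As a sketch, the STDIO-related flags above can be combined in a single invocation (the command, arguments, and environment values here are placeholders):

```bash
# Hypothetical STDIO run combining --command, --args, and --env
mcp-evals all --stdio --command node --args ./server.js --env NODE_ENV=test --verbose
```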
MCP-Evals includes a testing framework for validating LLM tool calls.
The testing SDK provides a fluent API for assertions:
| Method | Description |
| --- | --- |
| `expect(MCPAIClient)` | Create an expectation for testing |
| `.toCall(tool/toolName)` | Assert that a specific tool was called |
| `.withArguments(args, {exact})` | Assert the tool was called with specific arguments (deep partial match unless `exact` is set to `true`) |
| `.withArguments(matcherFn)` | Assert the tool was called with arguments satisfying the matcher function: `(args) => boolean` |
| `.withResult(result, {exact})` | Assert the tool call produced a specific result (deep partial match unless `exact` is set to `true`) |
| `.withResult(matcherFn)` | Assert the tool call produced a result satisfying the matcher function: `(result) => boolean` |
Create a TypeScript or JavaScript file (ending in `.tool-eval.ts` or `.tool-eval.js`) with exported functions:
```typescript
// filepath: src/examples/tool-evals/search.tool-eval.ts
import {mcpAi, expect} from '@buildwithlayer/mcp-evals'
import {openai} from '@ai-sdk/openai'

// Test that the "search" tool is called with the correct query and that
// the result contains a text content object including the word "climate"
async function searchToolEval() {
  // Model will default to gpt-4o if not provided
  await mcpAi.send('Find information about climate change', openai('gpt-3.5-turbo'))

  return expect(mcpAi)
    .toCall('search')
    .withArguments({query: 'climate change'}, {exact: true})
    .withResult((result) => result.content.some((c) => c.type === 'text' && c.text.includes('climate')))
}

// Export all evaluation functions
export default {
  searchToolEval,
}
```
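When an exact argument match is too strict, the matcher-function form of `.withArguments` can be used instead. Below is a minimal sketch along the same lines as the example above (the file name, tool name, and fields are illustrative):

```typescript
// filepath: src/examples/tool-evals/search-matcher.tool-eval.ts
import {mcpAi, expect} from '@buildwithlayer/mcp-evals'

async function searchArgumentsMatcherEval() {
  // Model defaults to gpt-4o when none is provided
  await mcpAi.send('Find information about climate change')

  return expect(mcpAi)
    .toCall('search')
    // Accept any arguments whose query mentions "climate"
    .withArguments((args) => typeof args.query === 'string' && args.query.includes('climate'))
}

export default {
  searchArgumentsMatcherEval,
}
```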
```bash
# Run evaluations from a specific file
mcp-evals run-tool-evals --evalsFile ./src/examples/tool-evals/add-tool-eval.ts

# Run all evaluation files within a specific directory
mcp-evals run-tool-evals --evalsDir ./src/examples/tool-evals

# Run all evaluations from the directory listed in your configuration file
mcp-evals run-tool-evals

# Exit with an error status code on the first failed eval
mcp-evals run-tool-evals --bail
```