
MCP-Evals

A command-line tool for evaluating and testing Model Context Protocol (MCP) servers. MCP-Evals helps you validate server capabilities, test tool calls, and ensure LLMs are correctly utilizing the provided tooling.

Installation

To install this package in your project (needed for tool evaluations), run npm install @buildwithlayer/mcp-evals.

To install the CLI globally, run npm install -g @buildwithlayer/mcp-evals.
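If the locally installed package exposes the mcp-evals binary (as the global install suggests), one convenient option is to wire it into an npm script; the script name below is just an example:

{
  "scripts": {
    "test:mcp": "mcp-evals run-tool-evals"
  }
}

You can then run npm run test:mcp to execute your tool evaluations without a global install.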

Configuration

MCP-Evals can be configured using command-line flags, a configuration file, or a combination of both (command-line flags will override configuration file properties).

Configuration File

Create a .mcp-evals.json file in your project root:

{
  "transport": "sse",
  "url": "http://localhost:3001/sse",
  "tool-evals-directory": "./path/to/evals"
}

Configuration Options

| Option | Description | Example |
| --- | --- | --- |
| transport | Connection transport type (sse, stdio, or streamableHTTP) | "sse" |
| url | URL for SSE / Streamable HTTP connection | "http://localhost:3001/sse" |
| command | Command for STDIO connection | "node ./server.js" |
| args | Arguments for STDIO command | ["--port", "3000"] |
| env | Environment variables for STDIO command | {"NODE_ENV": "test"} |
| tool-evals-directory | Directory containing tool evaluation files | "./src/evals" |
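For a STDIO server, the same file can describe the command to launch; the server path and environment values in this sketch are placeholders:

{
  "transport": "stdio",
  "command": "node",
  "args": ["./server.js"],
  "env": {"NODE_ENV": "test"},
  "tool-evals-directory": "./path/to/evals"
}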

Environment Variables

Create a .env file for sensitive configuration:

OPENAI_API_KEY=your_api_key_here

Commands

Testing Server Connection

# Test connection to an MCP server
mcp-evals connection --sse --url http://localhost:3001/sse

# Using STDIO transport
mcp-evals connection --stdio --command node --args ./server.js

Listing Server Capabilities

# List available tools
mcp-evals list-tools [--verbose]

# List available resources
mcp-evals list-resources [--verbose]

# List available prompts
mcp-evals list-prompts [--verbose]

# Test connection, list tools, resources, and prompts (whichever are included in server capabilities)
mcp-evals all [--verbose]

Common Flags for All Commands

| Flag | Description |
| --- | --- |
| --config | Path to JSON configuration file (defaults to .mcp-evals.json in root directory) |
| --sse | Connect using SSE transport |
| --stdio | Connect using STDIO transport |
| --streamableHttp | Connect using Streamable HTTP transport |
| --url | URL to connect to (for SSE or Streamable HTTP) |
| --command | Command to run (for STDIO) |
| --args | Arguments for the command (space-separated) |
| --env | Environment variables (format: KEY=VALUE) |
| --verbose | Enable verbose output |
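As an illustration, these flags can be combined on any command; the server path and configuration file name below are placeholders:

# Check every capability of a local STDIO server with detailed output
mcp-evals all --stdio --command node --args ./server.js --env NODE_ENV=test --verbose

# Point a command at a non-default configuration file
mcp-evals list-tools --config ./configs/staging.mcp-evals.json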

Tool Evaluation Framework

MCP-Evals includes a testing framework for validating LLM tool calls.

Assertion API

The testing SDK provides a fluent API for assertions:

| Method | Description |
| --- | --- |
| expect(MCPAIClient) | Create an expectation for testing |
| .toCall(tool/toolName) | Assert that a specific tool was called |
| .withArguments(args, {exact}) | Assert the tool was called with specific arguments (deep partial match unless exact is set to true) |
| .withArguments(matcherFn) | Assert the tool was called with arguments satisfying the matcher function: (args) => boolean |
| .withResult(result, {exact}) | Assert the tool call returned a specific result (deep partial match unless exact is set to true) |
| .withResult(matcherFn) | Assert the tool call returned a result satisfying the matcher function: (result) => boolean |

Example Usage

Create a TypeScript or JavaScript file (ending in .tool-eval.ts or .tool-eval.js) that exports its evaluation functions:

// filepath: src/examples/tool-evals/search.tool-eval.ts
import {mcpAi, expect} from '@buildwithlayer/mcp-evals'
import {openai} from '@ai-sdk/openai'

// Test that "search" tool is called with correct query & the result contains a text content object including the word "climate"
async function searchToolEval() {
  // Model will default to gpt-4o if not provided
  await mcpAi.send('Find information about climate change', openai('gpt-3.5-turbo'))
  return expect(mcpAi)
    .toCall('search')
    .withArguments({query: 'climate change'}, {exact: true})
    .withResult((result) => result.content.some((c) => c.type === 'text' && c.text.includes('climate')))
}

// Export all evaluation functions
export default {
  searchToolEval,
}
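The matcher-function variant of withArguments works the same way. The sketch below assumes a hypothetical add tool whose argument names (a and b) are placeholders, not part of this package:

// filepath: src/examples/tool-evals/add.tool-eval.ts
import {mcpAi, expect} from '@buildwithlayer/mcp-evals'

// Test that the hypothetical "add" tool is called with two numeric operands
async function addToolEval() {
  // No model passed, so the default (gpt-4o) is used
  await mcpAi.send('What is 2 plus 3?')
  return expect(mcpAi)
    .toCall('add')
    // "a" and "b" are assumed argument names for this example tool
    .withArguments((args) => typeof args.a === 'number' && typeof args.b === 'number')
}

export default {
  addToolEval,
}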

Running Tool Evaluations

# Run evaluations from a specific file
mcp-evals run-tool-evals --evalsFile ./src/examples/tool-evals/add-tool-eval.ts

# Run all evaluation files within a specific directory
mcp-evals run-tool-evals --evalsDir ./src/examples/tool-evals

# Run all evaluations from the directory listed in your configuration file
mcp-evals run-tool-evals

# Exit with an error status code on first failed eval
mcp-evals run-tool-evals --bail
