Skip to content

Contributing

Development Setup

  1. Clone the repository and install prerequisites (see Getting Started)
  2. Start the infrastructure: docker-compose up -d
  3. Source environment variables: source .env

Project Structure

The project is a Cargo workspace with four crates:

crates/
├── chatbot/           # Main binary — agent loop, HTTP server, MCP client
├── mcp-server-fraud/  # Fraud detection MCP server
├── mcp-server-kyc/    # KYC verification MCP server
└── mcp-server-funding/# Deposit/withdrawal MCP server

The evaluation framework is a separate Python project under evals/.

Building

# Build all crates
cargo build

# Build a specific crate
cargo build -p chatbot
cargo build -p mcp-server-fraud

Adding a New MCP Server

  1. Create a new crate under crates/:

    cargo new crates/mcp-server-yourservice --name mcp-server-yourservice
    
  2. Add it to Cargo.toml workspace members

  3. Implement tools using RMCP's #[tool] macro on an Axum + RMCP server (follow existing servers as examples)
  4. Add hardcoded test user data for test_user_1 through test_user_5
  5. Add the server to docker-compose.yml and chatbot.toml
  6. Update the system prompt in prompts/system.txt with tool chaining rules

Adding Eval Test Cases

  1. Add cases to an existing dataset file in evals/datasets/, or create a new YAML file
  2. Each case needs: id, description, input (messages + test_user), and expected (tool_calls + scoring thresholds)
  3. Make sure the test_user maps to deterministic MCP data that supports your scenario
  4. Run python run_eval.py sync to push to Langfuse
  5. Run python run_eval.py run --run-name "test" to verify

Adding a New Judge Dimension

  1. Create a new YAML file in evals/judges/
  2. Define the prompt template with a 1-5 rubric and concrete examples
  3. Set temperature: 0 for consistent scoring
  4. Reference the new dimension in test case expected.scoring thresholds

Modifying the System Prompt

The system prompt lives in prompts/system.txt. Key rules to preserve:

  • Never say "fraud" — use "security review" instead
  • Never reveal internal flag types or system details
  • Follow mandatory tool chaining rules (documented in the prompt)
  • Offer escalation when users are frustrated

After changes, run the full eval suite to check for regressions:

cd evals
python run_eval.py run --run-name "prompt-change-test"
python run_eval.py report

Code Style

  • Rust: Follow standard Rust conventions. Use cargo fmt and cargo clippy
  • Python: Follow PEP 8. The eval code uses Click for CLI and PyYAML for config

Commit Messages

Follow the Conventional Commits style used in the repository:

feat: add new MCP tool for account limits
fix: correct tool routing for multi-server calls
docs: update architecture diagram