Contributing

Development Setup

Clone the repository and install prerequisites (see Getting Started)
Start the infrastructure: docker-compose up -d
Source environment variables: source .env

Project Structure

The project is a Cargo workspace with four crates:

crates/
├── chatbot/           # Main binary — agent loop, HTTP server, MCP client
├── mcp-server-fraud/  # Fraud detection MCP server
├── mcp-server-kyc/    # KYC verification MCP server
└── mcp-server-funding/# Deposit/withdrawal MCP server

The evaluation framework is a separate Python project under evals/.

Building

# Build all crates
cargo build

# Build a specific crate
cargo build -p chatbot
cargo build -p mcp-server-fraud

Adding a New MCP Server

Create a new crate under crates/:

cargo new crates/mcp-server-yourservice --name mcp-server-yourservice

Add it to Cargo.toml workspace members
Implement tools using RMCP's #[tool] macro on an Axum + RMCP server (follow existing servers as examples)
Add hardcoded test user data for test_user_1 through test_user_5
Add the server to docker-compose.yml and chatbot.toml
Update the system prompt in prompts/system.txt with tool chaining rules

Adding Eval Test Cases

Add cases to an existing dataset file in evals/datasets/, or create a new YAML file
Each case needs: id, description, input (messages + test_user), and expected (tool_calls + scoring thresholds)
Make sure the test_user maps to deterministic MCP data that supports your scenario
Run python run_eval.py sync to push to Langfuse
Run python run_eval.py run --run-name "test" to verify

Adding a New Judge Dimension

Create a new YAML file in evals/judges/
Define the prompt template with a 1-5 rubric and concrete examples
Set temperature: 0 for consistent scoring
Reference the new dimension in test case expected.scoring thresholds

Modifying the System Prompt

The system prompt lives in prompts/system.txt. Key rules to preserve:

Never say "fraud" — use "security review" instead
Never reveal internal flag types or system details
Follow mandatory tool chaining rules (documented in the prompt)
Offer escalation when users are frustrated

After changes, run the full eval suite to check for regressions:

cd evals
python run_eval.py run --run-name "prompt-change-test"
python run_eval.py report

Code Style

Rust: Follow standard Rust conventions. Use cargo fmt and cargo clippy
Python: Follow PEP 8. The eval code uses Click for CLI and PyYAML for config

Commit Messages

Follow the Conventional Commits style used in the repository:

feat: add new MCP tool for account limits
fix: correct tool routing for multi-server calls
docs: update architecture diagram