Quick examples

Copy any of these examples and submit it as-is. The only change you need to make is replacing the reviewer email with your own.

1. Simple calculation

To: agent@mixus.com
Subject: Eval: Sales Commission

Calculate 15% commission on a $50,000 sale.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: $7,500
Duration: ~2 minutes
Checkpoints: 1 (verify calculation)
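
A reviewer can confirm the expected result independently before approving the checkpoint. The following is a minimal sketch for that purpose only; it is not part of mixus, and the variable names are assumptions chosen for illustration.

```shell
#!/bin/bash
# Reviewer-side sanity check (illustrative only, not part of mixus).
sale=50000          # sale amount in dollars
rate_percent=15     # commission rate in percent

# Integer arithmetic is enough here because the rate is a whole percent.
commission=$(( sale * rate_percent / 100 ))
echo "Expected commission: \$${commission}"
```

If the agent's answer differs from this value, reject the checkpoint and ask for the working.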

2. Market research

To: agent@mixus.com
Subject: Eval: Competitor Pricing

Research pricing for Anthropic Claude, OpenAI ChatGPT, and
Google Gemini. Create a comparison table showing monthly costs
for each tier.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: Comparison table with pricing tiers
Duration: ~3-5 minutes
Checkpoints: 1 (verify research accuracy)

3. Email drafting

To: agent@mixus.com
Subject: Eval: Customer Follow-up

Draft a follow-up email to a customer who requested a product demo.
Include:
- Thank them for interest
- Propose 3 time slots next week
- Link to product features page
- Warm, professional tone

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: Professional email draft
Duration: ~2-3 minutes
Checkpoints: 1 (review email before sending)

4. Financial analysis

To: agent@mixus.com
Subject: Eval: ROI Calculation

Calculate ROI for a marketing campaign with:
- Total spend: $50,000
- Revenue generated: $175,000
- Campaign duration: 3 months

Show ROI % and monthly breakdown.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: 250% ROI with monthly breakdown
Duration: ~2 minutes
Checkpoints: 1 (verify calculations)
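
The 250% figure follows from ROI = (revenue - spend) / spend × 100. The sketch below reproduces it and prints one possible monthly breakdown. The task does not say how the breakdown should be split, so the even split across the three months is an assumption, as are the variable names.

```shell
#!/bin/bash
spend=50000       # total campaign spend in dollars
revenue=175000    # revenue generated in dollars
months=3          # campaign duration

# ROI % = (revenue - spend) / spend * 100
net_profit=$(( revenue - spend ))
roi_percent=$(( net_profit * 100 / spend ))
echo "ROI: ${roi_percent}%"

# Assumption: spend and revenue are spread evenly over the campaign.
for m in $(seq 1 "$months"); do
  awk -v m="$m" -v rev="$revenue" -v cost="$spend" -v n="$months" \
    'BEGIN { printf "Month %d: revenue %.2f, spend %.2f\n", m, rev / n, cost / n }'
done
```

If real per-month figures are available, include them in the task description so the reviewer has something concrete to verify against.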

5. Multi-step workflow

To: agent@mixus.com
Subject: Eval: Quarterly Report

1. Search for our Q3 sales data
2. Calculate total revenue and growth rate
3. Identify top 3 products by sales
4. Create executive summary
5. Email summary to team@company.com

Checkpoints:
1. Verify data accuracy before analysis
2. Review summary before sending

Test type: with-verification
Reviewer: your-email@company.com
Expected result: Email sent with quarterly summary
Duration: ~8-12 minutes
Checkpoints: 2 (data verification, email approval)

6. Data analysis

To: agent@mixus.com
Subject: Eval: Customer Satisfaction Report

Create a customer satisfaction report including:
- Survey response rate
- Average satisfaction score
- Top 3 positive themes
- Top 3 improvement areas
- Recommendations

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: Comprehensive satisfaction report
Duration: ~5-8 minutes
Checkpoints: 1 (review analysis)

7. Baseline speed test

To: agent@mixus.com
Subject: Eval: Quick Math Baseline

Calculate:
1. 15% of $50,000
2. 23% of $75,000
3. 8.5% of $120,000

Test type: without-verification
Reviewer: your-email@company.com
Expected result: Three calculations
Duration: ~1 minute
Checkpoints: 0 (baseline test)
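
Because the baseline run has no human checkpoint, it can be useful to compute the three answers yourself and compare them with the agent's output afterwards. This is a minimal sketch; `percent_of` is a hypothetical helper, not a mixus command, and it uses `awk` because one of the rates (8.5%) is not a whole number.

```shell
#!/bin/bash
# percent_of AMOUNT RATE: print RATE percent of AMOUNT with two decimals.
# Hypothetical helper for checking results by hand.
percent_of() {
  awk -v amount="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", amount * rate / 100 }'
}

r1=$(percent_of 50000 15)
r2=$(percent_of 75000 23)
r3=$(percent_of 120000 8.5)

printf '%s\n' "$r1" "$r2" "$r3"
```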

8. Research with external tools

To: agent@mixus.com
Subject: Eval: Crypto Portfolio Value

Research current Bitcoin and Ethereum prices from CoinMarketCap.
Then calculate portfolio value for:
- 0.5 BTC
- 10 ETH

Show individual values and total.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: Current portfolio value
Duration: ~3 minutes
Checkpoints: 1 (verify prices and calculations)
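
At the checkpoint, the reviewer needs to confirm both the prices the agent found and the arithmetic applied to them. The arithmetic part can be sketched as below. `portfolio_value` is a hypothetical helper, and the prices in the usage line are placeholders for illustration only, not real market data; substitute the prices the agent reports.

```shell
#!/bin/bash
# portfolio_value BTC_PRICE ETH_PRICE
# Prints the dollar value of 0.5 BTC and 10 ETH, plus the total.
# Hypothetical helper; prices must be supplied by the caller.
portfolio_value() {
  awk -v btc="$1" -v eth="$2" 'BEGIN {
    btc_value = 0.5 * btc
    eth_value = 10 * eth
    printf "BTC: %.2f\nETH: %.2f\nTotal: %.2f\n", btc_value, eth_value, btc_value + eth_value
  }'
}

# Placeholder prices, NOT real market data.
portfolio_value 100000 4000
```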

9. Complex calculation with context

To: agent@mixus.com
Subject: Eval: R&D Tax Credit

Calculate the R&D tax credit using the Alternative Simplified Credit method.

Data:
- Current year QRE: $120,000
- Prior 3-year average: $90,000
- Formula: 14% × (current_qre - base_amount)
- Base amount = 50% of prior average

Checkpoints:
1. Verify base amount is $45,000
2. Verify final credit is $10,500

Test type: with-verification
Reviewer: your-email@company.com
Expected result: $10,500 tax credit
Duration: ~3-4 minutes
Checkpoints: 2 (base amount, final credit)
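
The two checkpoint values come straight from the formula given in the task. The sketch below walks through both steps using only the figures and rates stated above; the variable names are assumptions, and it is an arithmetic check, not tax guidance.

```shell
#!/bin/bash
current_qre=120000   # current-year qualified research expenses
prior_avg=90000      # prior 3-year average QRE

# Checkpoint 1: base amount = 50% of the prior average
base_amount=$(( prior_avg * 50 / 100 ))

# Checkpoint 2: credit = 14% x (current_qre - base_amount)
credit=$(( (current_qre - base_amount) * 14 / 100 ))

echo "Base amount: \$${base_amount}"
echo "Credit: \$${credit}"
```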

10. Integration task

To: agent@mixus.com
Subject: Eval: Calendar Scheduling

Find 3 available 30-minute slots next week when both
john@company.com and sarah@company.com are free. Check their
Google calendars.

Available hours: 9am-5pm EST
Prefer afternoon slots

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: 3 available time slots
Duration: ~4-6 minutes
Checkpoints: 1 (approve proposed times)

Batch testing example

Submit multiple tasks at once:
#!/bin/bash

API_KEY="mxs_eval_YOUR_KEY"
REVIEWER="your-email@company.com"

# Array of tasks (single quotes keep $50,000 literal)
tasks=(
  '{"taskName":"Test 1","taskDescription":"Calculate 15% of $50,000"}'
  '{"taskName":"Test 2","taskDescription":"Calculate 23% of $75,000"}'
  '{"taskName":"Test 3","taskDescription":"Calculate 8.5% of $120,000"}'
)

# Submit all tasks
for task in "${tasks[@]}"; do
  echo "Submitting: $task"

  # Merge the shared settings into the task object.
  # jq already emits a complete JSON object, so it must not be wrapped in extra braces.
  payload=$(jq -c --arg reviewer "$REVIEWER" \
    '. + {autoDetectCheckpoints: true, testMode: "with-verification", assignedReviewer: $reviewer}' \
    <<< "$task")

  curl -X POST https://app.mixus.ai/api/eval/create-task-agent \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"

  echo ""
done
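
Before submitting a batch, it can help to do a dry run that builds a payload and checks it locally without calling the API. The sketch below assumes `jq` is installed; `build_payload` is a hypothetical helper, and the set of fields checked is an assumption based on the fields used in the examples on this page, not a documented list of required fields.

```shell
#!/bin/bash
# Dry run: build a payload and validate it locally. Nothing is sent.
REVIEWER="your-email@company.com"

# build_payload TASK_JSON: merge shared settings into a task object.
build_payload() {
  jq -c --arg reviewer "$REVIEWER" \
    '. + {autoDetectCheckpoints: true, testMode: "with-verification", assignedReviewer: $reviewer}' \
    <<< "$1"
}

payload=$(build_payload '{"taskName":"Test 1","taskDescription":"Calculate 15% of $50,000"}')

# jq -e exits non-zero when the filter evaluates to false or null.
if jq -e 'has("taskName") and has("taskDescription") and has("assignedReviewer")' \
    <<< "$payload" > /dev/null; then
  echo "Payload OK: $payload"
else
  echo "Payload is missing a field" >&2
fi
```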

Comparison testing

Run the same task with and without human verification:

With verification

curl -X POST https://app.mixus.ai/api/eval/create-task-agent \
  -H "Authorization: Bearer mxs_eval_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "taskName": "Commission Calc - With Human",
    "taskDescription": "Calculate 15% commission on $50,000 sale",
    "autoDetectCheckpoints": true,
    "testMode": "with-verification",
    "assignedReviewer": "your-email@company.com",
    "externalId": "test-with-human"
  }'

Without verification (baseline)

curl -X POST https://app.mixus.ai/api/eval/create-task-agent \
  -H "Authorization: Bearer mxs_eval_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "taskName": "Commission Calc - Baseline",
    "taskDescription": "Calculate 15% commission on $50,000 sale",
    "testMode": "without-verification",
    "assignedReviewer": "your-email@company.com",
    "externalId": "test-baseline"
  }'

Compare the results in the dashboard at app.mixus.ai/eval.
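
A comparison is only meaningful if both runs receive exactly the same task description. One way to guarantee that is to generate both payloads from a single variable, as in the sketch below. It assumes `jq` is installed; `make_payload` is a hypothetical helper, and the field names are taken from the two requests above.

```shell
#!/bin/bash
DESCRIPTION='Calculate 15% commission on $50,000 sale'
REVIEWER="your-email@company.com"

# make_payload MODE: build a request body for the given test mode.
# autoDetectCheckpoints is added only for the verified run, as in the requests above.
make_payload() {
  jq -n -c --arg mode "$1" --arg desc "$DESCRIPTION" --arg reviewer "$REVIEWER" '
    {taskName: ("Commission Calc - " + $mode),
     taskDescription: $desc,
     testMode: $mode,
     assignedReviewer: $reviewer}
    + (if $mode == "with-verification" then {autoDetectCheckpoints: true} else {} end)'
}

with_payload=$(make_payload "with-verification")
baseline_payload=$(make_payload "without-verification")

echo "$with_payload"
echo "$baseline_payload"
```

Each payload can then be passed to `curl` with `-d "$with_payload"` or `-d "$baseline_payload"` against the endpoint shown above.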

Next steps

Task Preparation

Learn how to write your own tasks

Best Practices

Tips for better results

API Reference

Complete API documentation

Get Started

Submit your first task now