Quick examples

Copy any of these examples and submit it as-is. The only change you need to make is replacing the reviewer email (your-email@company.com) with your own.

1. Simple calculation

To: agent@mixus.com
Subject: Eval: Sales Commission

Calculate 15% commission on a $50,000 sale.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: $7,500
Duration: ~2 minutes
Checkpoints: 1 (verify calculation)
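
The same task can also be expressed as a JSON body for the API. The sketch below only builds and prints the body and sends nothing. The field names (taskName, taskDescription, autoDetectCheckpoints, testMode, assignedReviewer) are the ones used in the curl examples later in this guide. Mapping the email lines "Test type", "Auto-detect", and "Reviewer" onto them, and using the subject as the task name, are assumptions. json_escape and build_payload are hypothetical helper names.

```shell
# Sketch: build the JSON body for one eval task without sending it.
# Assumption: email fields map onto the API field names shown in this guide.

# Hypothetical helper: escapes backslashes and double quotes only,
# which is enough for simple one-line task descriptions.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: $1 = task name, $2 = task description, $3 = reviewer email.
build_payload() {
  printf '{"taskName":"%s","taskDescription":"%s","autoDetectCheckpoints":true,"testMode":"with-verification","assignedReviewer":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Single quotes keep the shell from expanding $50 inside the description.
payload=$(build_payload "Sales Commission" 'Calculate 15% commission on a $50,000 sale.' "your-email@company.com")
echo "$payload"
```

The printed body can be passed to curl with -d "$payload", using the endpoint and headers shown in the comparison testing section.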

2. Market research

To: agent@mixus.com
Subject: Eval: Competitor Pricing

Research pricing for Anthropic Claude, OpenAI ChatGPT, and 
Google Gemini. Create a comparison table showing monthly costs 
for each tier.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: Comparison table with pricing tiers
Duration: ~3-5 minutes
Checkpoints: 1 (verify research accuracy)

3. Email drafting

To: agent@mixus.com
Subject: Eval: Customer Follow-up

Draft a follow-up email to a customer who requested a product demo. 
Include:
- Thank them for interest
- Propose 3 time slots next week
- Link to product features page
- Warm, professional tone

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: Professional email draft
Duration: ~2-3 minutes
Checkpoints: 1 (review email before sending)

4. Financial analysis

To: agent@mixus.com
Subject: Eval: ROI Calculation

Calculate ROI for a marketing campaign with:
- Total spend: $50,000
- Revenue generated: $175,000
- Campaign duration: 3 months

Show ROI % and monthly breakdown.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: 250% ROI with monthly breakdown
Duration: ~2 minutes
Checkpoints: 1 (verify calculations)
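
When reviewing the checkpoint, the expected 250% can be reproduced by hand: ROI % = (revenue - spend) / spend × 100 = (175,000 - 50,000) / 50,000 × 100. A minimal sketch follows. roi_percent and monthly_average are hypothetical helper names, and the even split across the three months is an assumption, since the task gives no per-month figures.

```shell
# ROI % = (revenue - spend) / spend * 100, using whole dollars.
# $1 = total spend, $2 = revenue generated.
roi_percent() {
  echo $(( ($2 - $1) * 100 / $1 ))
}

# Even monthly split (assumption: the task provides no per-month data).
# $1 = amount, $2 = number of months.
monthly_average() {
  awk -v amount="$1" -v months="$2" 'BEGIN { printf "%.2f\n", amount / months }'
}

roi_percent 50000 175000    # prints 250
monthly_average 50000 3     # prints 16666.67
monthly_average 175000 3    # prints 58333.33
```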

5. Multi-step workflow

To: agent@mixus.com
Subject: Eval: Quarterly Report

1. Search for our Q3 sales data
2. Calculate total revenue and growth rate
3. Identify top 3 products by sales
4. Create executive summary
5. Email summary to team@company.com

Checkpoints:
1. Verify data accuracy before analysis
2. Review summary before sending

Test type: with-verification
Reviewer: your-email@company.com
Expected result: Email sent with quarterly summary
Duration: ~8-12 minutes
Checkpoints: 2 (data verification, email approval)

6. Data analysis

To: agent@mixus.com
Subject: Eval: Customer Satisfaction Report

Create a customer satisfaction report including:
- Survey response rate
- Average satisfaction score
- Top 3 positive themes
- Top 3 improvement areas
- Recommendations

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: Comprehensive satisfaction report
Duration: ~5-8 minutes
Checkpoints: 1 (review analysis)

7. Baseline speed test

To: agent@mixus.com
Subject: Eval: Quick Math Baseline

Calculate:
1. 15% of $50,000
2. 23% of $75,000
3. 8.5% of $120,000

Test type: without-verification
Reviewer: your-email@company.com
Expected result: Three calculations
Duration: ~1 minute
Checkpoints: 0 (baseline test)
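
Because this baseline run has no human checkpoint, it helps to have reference values on hand to compare against the agent's answers. A small sketch of the three calculations; percent_of is a hypothetical helper name.

```shell
# percent_of RATE AMOUNT -> RATE percent of AMOUNT, to two decimals.
percent_of() {
  awk -v rate="$1" -v amount="$2" 'BEGIN { printf "%.2f\n", rate * amount / 100 }'
}

percent_of 15 50000     # prints 7500.00
percent_of 23 75000     # prints 17250.00
percent_of 8.5 120000   # prints 10200.00
```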

8. Research with external tools

To: agent@mixus.com
Subject: Eval: Crypto Portfolio Value

Research current Bitcoin and Ethereum prices from CoinMarketCap.
Then calculate portfolio value for:
- 0.5 BTC
- 10 ETH

Show individual values and total.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: Current portfolio value
Duration: ~3 minutes
Checkpoints: 1 (verify prices and calculations)

9. Complex calculation with context

To: agent@mixus.com
Subject: Eval: R&D Tax Credit

Calculate the R&D tax credit using the Alternative Simplified Credit method.

Data:
- Current year QRE: $120,000
- Prior 3-year average: $90,000
- Formula: 14% × (current_qre - base_amount)
- Base amount = 50% of prior average

Checkpoints:
1. Verify base amount is $45,000
2. Verify final credit is $10,500

Test type: with-verification
Reviewer: your-email@company.com
Expected result: $10,500 tax credit
Duration: ~3-4 minutes
Checkpoints: 2 (base amount, final credit)
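
Both checkpoint values follow directly from the formula given in the task: base amount = 50% × 90,000 = 45,000, and credit = 14% × (120,000 - 45,000) = 10,500. The sketch below reproduces them using only the formula as stated in the task description. asc_credit is a hypothetical helper name.

```shell
# asc_credit CURRENT_QRE PRIOR_AVERAGE
# Applies the task's stated formula: 14% x (current_qre - base_amount),
# where base_amount = 50% of the prior 3-year average.
asc_credit() {
  awk -v qre="$1" -v prior="$2" 'BEGIN {
    base = 0.50 * prior
    credit = 0.14 * (qre - base)
    printf "base=%.2f credit=%.2f\n", base, credit
  }'
}

asc_credit 120000 90000   # prints base=45000.00 credit=10500.00
```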

10. Integration task

To: agent@mixus.com
Subject: Eval: Calendar Scheduling

Find 3 available 30-minute slots next week when both 
john@company.com and sarah@company.com are free. Check their 
Google calendars.

Available hours: 9am-5pm EST
Prefer afternoon slots

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
Expected result: 3 available time slots
Duration: ~4-6 minutes
Checkpoints: 1 (approve proposed times)

Batch testing example

Submit multiple tasks at once:
#!/bin/bash

API_KEY="mxs_eval_YOUR_KEY"
REVIEWER="your-email@company.com"

# Array of tasks
tasks=(
  '{"taskName":"Test 1","taskDescription":"Calculate 15% of $50,000"}'
  '{"taskName":"Test 2","taskDescription":"Calculate 23% of $75,000"}'
  '{"taskName":"Test 3","taskDescription":"Calculate 8.5% of $120,000"}'
)

# Submit all tasks
for task in "${tasks[@]}"; do
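  echo "Submitting: $task"

  # Merge the shared fields into the task object (requires jq).
  # jq emits one complete JSON object, so it is passed to curl as-is;
  # wrapping it in another pair of braces would produce invalid JSON.
  payload=$(jq -c --arg reviewer "$REVIEWER" \
    '. + {autoDetectCheckpoints: true, testMode: "with-verification", assignedReviewer: $reviewer}' <<< "$task")

  curl -X POST https://app.mixus.ai/api/eval/create-task-agent \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"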
  
  echo ""
done

Comparison testing

Run the same task with and without human verification:

With verification

curl -X POST https://app.mixus.ai/api/eval/create-task-agent \
  -H "Authorization: Bearer mxs_eval_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "taskName": "Commission Calc - With Human",
    "taskDescription": "Calculate 15% commission on $50,000 sale",
    "autoDetectCheckpoints": true,
    "testMode": "with-verification",
    "assignedReviewer": "your-email@company.com",
    "externalId": "test-with-human"
  }'

Without verification (baseline)

curl -X POST https://app.mixus.ai/api/eval/create-task-agent \
  -H "Authorization: Bearer mxs_eval_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "taskName": "Commission Calc - Baseline",
    "taskDescription": "Calculate 15% commission on $50,000 sale",
    "testMode": "without-verification",
    "assignedReviewer": "your-email@company.com",
    "externalId": "test-baseline"
  }'
Compare the results in the dashboard at app.mixus.ai/eval.

Next steps

I