
Quick start

Submit an evaluation task via email or API and start testing your AI agents.

Email submission (easiest)

Send an email like this one to submit your first evaluation:
To: agent@mixus.com
Subject: Eval: Calculate Sales Commission

Calculate 15% commission on a $50,000 sale.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com
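If you submit evaluations from a script rather than a mail client, you can assemble the message with Python's standard `email` package. This is a minimal sketch. It assumes the plain-text body layout shown above is all the service needs. `build_eval_email` is a hypothetical helper. Sending is left to your own SMTP server, and the host and credentials in the comment are placeholders.

```python
from email.message import EmailMessage
from typing import Optional


def build_eval_email(title: str, task: str, reviewer: str,
                     test_type: str = "with-verification",
                     auto_detect: bool = True,
                     sender: Optional[str] = None) -> EmailMessage:
    """Compose an evaluation submission in the plain-text format shown above."""
    msg = EmailMessage()
    msg["From"] = sender or reviewer
    msg["To"] = "agent@mixus.com"
    # Submissions are recognized by a subject that starts with "Eval:".
    msg["Subject"] = f"Eval: {title}"
    msg.set_content("\n".join([
        task,
        "",
        f"Test type: {test_type}",
        f"Auto-detect: {'true' if auto_detect else 'false'}",
        f"Reviewer: {reviewer}",
    ]))
    return msg


if __name__ == "__main__":
    message = build_eval_email(
        title="Calculate Sales Commission",
        task="Calculate 15% commission on a $50,000 sale.",
        reviewer="your-email@company.com",
    )
    print(message)
    # Sending depends on your own mail setup (host, port and credentials are placeholders):
    # import smtplib
    # with smtplib.SMTP("smtp.company.com", 587) as smtp:
    #     smtp.starttls()
    #     smtp.login("user", "app-password")
    #     smtp.send_message(message)
```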
What happens next:
  1. The system creates an agent to solve the task.
  2. The AI automatically detects verification checkpoints.
  3. The agent executes and pauses at each checkpoint.
  4. You receive an email notification.
  5. You reply “approve” to continue.
  6. You get a results email when the task is complete.
No API key needed: just send an email!
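At the checkpoint you will be asked whether the calculation is correct. It helps to know the expected answer in advance: 15% of $50,000 is $7,500. The following reviewer-side check uses `Decimal` to avoid floating-point rounding. It is your own sanity check, not part of mixus.

```python
from decimal import Decimal, ROUND_HALF_UP


def commission(sale_amount: str, rate_percent: str) -> Decimal:
    """Commission on a sale, rounded to whole cents."""
    rate = Decimal(rate_percent) / Decimal(100)
    return (Decimal(sale_amount) * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(commission("50000", "15"))  # 7500.00
```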

Test modes explained

Choose the right mode for your evaluation:
With verification (manual checkpoints): you specify the exact verification points.
Use when:
  • You know exactly where verification should happen
  • Testing specific decision points
  • Task has well-defined stages
Example:
{
  "testMode": "with-verification",
  "checkpoints": [
    {
      "stage": "calculation",
      "description": "Calculate result",
      "verificationQuestion": "Is the calculation correct?"
    }
  ]
}
Without verification (baseline): the agent runs autonomously without human oversight.
Use when:
  • Speed benchmarking
  • Comparing results with and without human oversight
  • Low-risk tasks
Example:
{
  "testMode": "without-verification"
}
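Before submitting a config through the API, you may want to catch typos locally. The checker below is a sketch that assumes only what the two examples above show. First, `testMode` must be one of the two mode strings. Second, each checkpoint must carry non-empty `stage`, `description`, and `verificationQuestion` strings. The rule that flags checkpoints in without-verification mode is an extra assumption, not documented behavior.

```python
import json
from typing import Any, Dict, List

VALID_MODES = ("with-verification", "without-verification")
CHECKPOINT_FIELDS = ("stage", "description", "verificationQuestion")


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an evaluation config (empty if none)."""
    problems: List[str] = []
    mode = config.get("testMode")
    if mode not in VALID_MODES:
        problems.append(f"unknown testMode: {mode!r}")
    checkpoints = config.get("checkpoints", [])
    if mode == "without-verification" and checkpoints:
        # Assumption: checkpoints make no sense when the agent runs unattended.
        problems.append("checkpoints given, but testMode is without-verification")
    for i, cp in enumerate(checkpoints):
        if not isinstance(cp, dict):
            problems.append(f"checkpoint {i}: expected an object")
            continue
        for field in CHECKPOINT_FIELDS:
            value = cp.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"checkpoint {i}: missing or empty {field!r}")
    return problems


with_verification = json.loads("""
{
  "testMode": "with-verification",
  "checkpoints": [
    {
      "stage": "calculation",
      "description": "Calculate result",
      "verificationQuestion": "Is the calculation correct?"
    }
  ]
}
""")
print(check_config(with_verification))                     # []
print(check_config({"testMode": "without-verification"}))  # []
print(check_config({"testMode": "verify"}))                # ["unknown testMode: 'verify'"]
```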

Verification workflow

When using verification mode, here’s what to expect:
  1. Checkpoint reached: the agent pauses execution and sends an email notification.
  2. Review the agent's work: open the chat or check your email to see what the agent plans to do.
  3. Respond:
     • Type “approve”: the agent continues.
     • Type “reject”: the agent stops.
     • Type “hint: [guidance]”: the agent adjusts its approach.
  4. Agent continues: after approval, the agent proceeds to the next step or completes the task.
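The three possible replies form a tiny command grammar. If you build tooling around the notification emails, you could interpret a reply as follows. This sketch makes two assumptions. Matching is case-insensitive, and only the first non-empty line counts, so quoted email history is ignored. How mixus itself parses replies is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str                     # "approve", "reject" or "hint"
    guidance: Optional[str] = None  # only set for hints


def parse_reply(text: str) -> Decision:
    """Interpret a checkpoint reply (hypothetical helper, see assumptions above)."""
    # Use the first non-empty line so quoted history below the reply is ignored.
    first = next((line.strip() for line in text.splitlines() if line.strip()), "")
    lowered = first.lower()
    if lowered == "approve":
        return Decision("approve")
    if lowered == "reject":
        return Decision("reject")
    if lowered.startswith("hint:"):
        guidance = first[len("hint:"):].strip()
        if guidance:
            return Decision("hint", guidance)
    raise ValueError(f"unrecognized reply: {first!r}")
```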

Example: Complete flow

Here’s a complete example from start to finish:

1. Submit via email

To: agent@mixus.com
Subject: Eval: Research Competitors

Research pricing for top 3 AI agent companies and email 
summary to team@company.com

Test type: with-verification
Auto-detect: true
Reviewer: manager@company.com

2. Confirmation received

From: agent@mixus.com
Subject: Re: Eval: Research Competitors

✅ Evaluation Started!

Task: Research Competitors
Type: with-verification
Checkpoints: 2 (AI-detected)

Review at: https://app.mixus.ai/chat/abc123

You'll receive email when verification is needed.

3. Checkpoint notification

From: agent@mixus.com
Subject: Checkpoint Verification Needed

🔔 Checkpoint 1: Research Review

I've researched pricing for OpenAI, Anthropic, and Google DeepMind.

Results:
- OpenAI: $20-$200/month
- Anthropic: Custom enterprise pricing
- Google DeepMind: Part of Google Cloud

Respond with: approve | reject | hint: [guidance]

View full details: https://app.mixus.ai/chat/abc123

4. You approve

Reply to the email: approve

5. Second checkpoint

From: agent@mixus.com
Subject: Checkpoint Verification Needed

🔔 Checkpoint 2: Email Review

Ready to send email summary to team@company.com

Subject: Competitor Pricing Analysis
Content: [Shows email draft]

Approve sending this email?

6. You approve again

Reply: approve

7. Completion notification

From: agent@mixus.com
Subject: Evaluation Complete

✅ Task Complete!

Results:
- Success: Yes
- Checkpoints: 2/2 approved
- Duration: 5 minutes
- Cost: $2.50

Email sent successfully to team@company.com

View full details: https://app.mixus.ai/chat/abc123
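If you want to record results outside the dashboard, the `- Key: value` lines in the completion email are easy to extract. This sketch assumes the field names and formats from the sample above: `Checkpoints` as `approved/total`, `Duration` in whole minutes, and `Cost` in dollars. `parse_results` and `summarize` are hypothetical helpers, and they will need adjusting if the email format differs.

```python
import re
from decimal import Decimal
from typing import Any, Dict, List

RESULT_LINE = re.compile(r"^-\s*([^:]+):\s*(.+)$")


def parse_results(body: str) -> Dict[str, str]:
    """Collect the '- Key: value' lines of a completion email into a dict."""
    results: Dict[str, str] = {}
    for line in body.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results[match.group(1).strip()] = match.group(2).strip()
    return results


def _ints(text: str, pattern: str) -> List[int]:
    """Return the integer groups of `pattern` matched at the start of `text`."""
    match = re.match(pattern, text)
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    return [int(g) for g in match.groups()]


def summarize(results: Dict[str, str]) -> Dict[str, Any]:
    """Convert the raw strings into typed values (formats assumed from the sample)."""
    approved, total = _ints(results["Checkpoints"], r"(\d+)/(\d+)")
    (minutes,) = _ints(results["Duration"], r"(\d+)")  # assumes whole minutes
    return {
        "success": results["Success"].lower() == "yes",
        "checkpoints_approved": approved,
        "checkpoints_total": total,
        "duration_minutes": minutes,
        "cost_usd": Decimal(results["Cost"].lstrip("$")),
    }


sample = """✅ Task Complete!

Results:
- Success: Yes
- Checkpoints: 2/2 approved
- Duration: 5 minutes
- Cost: $2.50

Email sent successfully to team@company.com
"""
print(summarize(parse_results(sample)))
```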

Where to track evaluations


Tips for success

  • Write clear task descriptions. Good: “Calculate 15% commission on a $50,000 sale and send result via email to manager@company.com”. Bad: “Do commission stuff”.
  • Start with auto-detect mode. Let the AI determine checkpoints until you understand the system better.
  • Test simple tasks first. Start with calculations or research before moving on to complex multi-step workflows.
  • Use baseline mode for comparisons. Run the same task with and without verification to measure the impact of human review.
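For the baseline comparison in the last tip, you can put the two runs side by side using the fields reported in the completion email (Success, Duration, Cost). This is a minimal sketch built on a hypothetical `RunResult` record. Note that the extra time of a verified run includes however long reviewers took to respond.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Dict


@dataclass
class RunResult:
    """The fields reported in a completion email for one evaluation run."""
    success: bool
    duration_minutes: int
    cost_usd: Decimal


def compare(verified: RunResult, baseline: RunResult) -> Dict[str, Any]:
    """Compare a with-verification run against a without-verification run of the same task."""
    return {
        "verified_succeeded": verified.success,
        "baseline_succeeded": baseline.success,
        # True when human review changed the outcome (for better or worse).
        "outcome_changed": verified.success != baseline.success,
        # Includes time spent waiting for reviewer responses.
        "extra_minutes": verified.duration_minutes - baseline.duration_minutes,
        "extra_cost_usd": verified.cost_usd - baseline.cost_usd,
    }
```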

Next steps


Common questions

Do I need an API key?
No! Email submissions don’t require an API key. Just send to agent@mixus.com with a subject starting with “Eval:”.

How long does an evaluation take?
It depends on task complexity and how long verification takes. Simple tasks: 2-5 minutes. Complex tasks: 10-30 minutes.

Can I run multiple evaluations at once?
Yes! Via the API you can submit multiple tasks, and they’ll run in parallel.

What happens if I don’t respond to a checkpoint?
The evaluation waits for your response. You can respond anytime via email or chat.

Can I stop an evaluation?
Yes. Reply “reject” at any checkpoint or stop it from the dashboard.

Need help?

I