Getting Started

Quick start

Submit an evaluation task via email or API and start testing your AI agents.

Email submission (easiest)

Send an email to test your first evaluation:

To: agent@mixus.com
Subject: Eval: Calculate Sales Commission

Calculate 15% commission on a $50,000 sale.

Test type: with-verification
Auto-detect: true
Reviewer: your-email@company.com

What happens next:

1. The system creates an agent to solve the task.
2. AI automatically detects verification checkpoints.
3. The agent executes and pauses at each checkpoint.
4. You receive an email notification.
5. You reply “approve” to continue.
6. You get a results email when the task is complete.

No API key is needed. Just send an email.
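
If you prefer to generate the submission email from a script, the message above can be assembled with Python's standard library. This is a minimal sketch, not an official client: the function name build_eval_email and its parameters are hypothetical, and the body layout simply mirrors the example email. The message is only built here, not sent; sending it would go through your own mail server (for example with smtplib).

```python
from email.message import EmailMessage


def build_eval_email(task_name, task_body, reviewer, sender,
                     test_type="with-verification", auto_detect=True):
    """Build (but do not send) an evaluation-request email.

    Hypothetical helper: field names and layout copy the example email.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "agent@mixus.com"
    # The subject line starts with "Eval:" as in the example above.
    msg["Subject"] = f"Eval: {task_name}"
    lines = [
        task_body,
        "",
        f"Test type: {test_type}",
        f"Auto-detect: {'true' if auto_detect else 'false'}",
        f"Reviewer: {reviewer}",
    ]
    msg.set_content("\n".join(lines))
    return msg


msg = build_eval_email(
    task_name="Calculate Sales Commission",
    task_body="Calculate 15% commission on a $50,000 sale.",
    reviewer="your-email@company.com",
    sender="your-email@company.com",
)
print(msg)
```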

API submission (for automation)

Step 1: Get API key

Generate an API key at app.mixus.ai/integrations/api-keys with eval:create and eval:read permissions.

Step 2: Submit task

curl -X POST https://app.mixus.ai/api/eval/create-task-agent \
  -H "Authorization: Bearer mxs_eval_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "taskName": "Calculate Sales Commission",
    "taskDescription": "Calculate 15% commission on a $50,000 sale",
    "autoDetectCheckpoints": true,
    "testMode": "with-verification",
    "assignedReviewer": "your-email@company.com"
  }'
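
The same request can be prepared from Python using only the standard library. The sketch below reuses the URL, headers, and JSON fields from the curl command; the helper name build_create_task_request is hypothetical. It only constructs the request object, and the actual network call is left commented out.

```python
import json
import urllib.request

API_URL = "https://app.mixus.ai/api/eval/create-task-agent"


def build_create_task_request(api_key, task_name, task_description, reviewer):
    """Build the POST request shown in the curl example (hypothetical helper)."""
    payload = {
        "taskName": task_name,
        "taskDescription": task_description,
        "autoDetectCheckpoints": True,
        "testMode": "with-verification",
        "assignedReviewer": reviewer,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_task_request(
    api_key="mxs_eval_YOUR_KEY",
    task_name="Calculate Sales Commission",
    task_description="Calculate 15% commission on a $50,000 sale",
    reviewer="your-email@company.com",
)

# To send it (network call, not executed in this sketch):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)  # Step 3 uses the executionId from this response
```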

Step 3: Track progress

# Use executionId from response
curl https://app.mixus.ai/api/eval/status/EXECUTION_ID \
  -H "Authorization: Bearer mxs_eval_YOUR_KEY"
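
For automation you will usually poll the status endpoint until the evaluation finishes. The sketch below shows one way to structure that loop. The status URL pattern comes from the curl command above, but the response format is not described on this page, so the "status" field and its values in the demo are assumptions; check the API Reference for the real schema. The helper names status_url and poll_until are hypothetical.

```python
import time
import urllib.parse


def status_url(execution_id):
    """Status endpoint for one execution, following the curl example."""
    safe_id = urllib.parse.quote(execution_id, safe="")
    return f"https://app.mixus.ai/api/eval/status/{safe_id}"


def poll_until(fetch_status, is_finished, interval_seconds=5.0, max_polls=60,
               sleep=time.sleep):
    """Call fetch_status() repeatedly until is_finished(result) is true."""
    for attempt in range(max_polls):
        result = fetch_status()
        if is_finished(result):
            return result
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"not finished after {max_polls} polls")


# Demo with a fake status source instead of a real HTTP call.
# ASSUMPTION: the "status" field and its values are invented for illustration.
fake_responses = iter([
    {"status": "running"},
    {"status": "waiting-for-review"},
    {"status": "completed"},
])
final = poll_until(
    fetch_status=lambda: next(fake_responses),
    is_finished=lambda r: r["status"] == "completed",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final)
```

In real use, fetch_status would issue a GET to status_url(execution_id) with the same Authorization header as in Step 2.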

Test modes explained

Choose the right mode for your evaluation:

With Verification + Auto-detect (Recommended)

AI automatically determines where verification is needed.

Use when:

You don’t know exact verification points
Task involves external actions (emails, purchases)
You want AI to identify critical steps

Example:

{
  "testMode": "with-verification",
  "autoDetectCheckpoints": true
}

With Verification + Manual Checkpoints

You specify exact verification points.

Use when:

You know exactly where verification should happen
Testing specific decision points
Task has well-defined stages

Example:

{
  "testMode": "with-verification",
  "checkpoints": [
    {
      "stage": "calculation",
      "description": "Calculate result",
      "verificationQuestion": "Is the calculation correct?"
    }
  ]
}

Without Verification (Baseline)

The agent runs autonomously without human oversight.

Use when:

You are benchmarking speed
You are comparing runs with and without human verification
The task is low-risk

Example:

{
  "testMode": "without-verification"
}
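
The three modes differ only in a few request fields, so a script can select them with a small helper. The sketch below is illustrative: the short mode names ("auto", "manual", "baseline") and the function mode_fields are hypothetical, and treating all three checkpoint keys as required is an assumption based on the manual-checkpoint example.

```python
def mode_fields(mode, checkpoints=None):
    """Return the mode-specific request fields for one of the three modes.

    Mode names here are hypothetical labels, not API values.
    """
    if mode == "auto":
        return {"testMode": "with-verification", "autoDetectCheckpoints": True}
    if mode == "manual":
        if not checkpoints:
            raise ValueError("manual mode needs at least one checkpoint")
        # ASSUMPTION: every checkpoint carries the three keys from the example.
        required = {"stage", "description", "verificationQuestion"}
        for checkpoint in checkpoints:
            missing = required - checkpoint.keys()
            if missing:
                raise ValueError(f"checkpoint missing fields: {sorted(missing)}")
        return {"testMode": "with-verification", "checkpoints": list(checkpoints)}
    if mode == "baseline":
        return {"testMode": "without-verification"}
    raise ValueError(f"unknown mode: {mode!r}")


payload = {
    "taskName": "Calculate Sales Commission",
    "taskDescription": "Calculate 15% commission on a $50,000 sale",
    **mode_fields("manual", [{
        "stage": "calculation",
        "description": "Calculate result",
        "verificationQuestion": "Is the calculation correct?",
    }]),
}
print(payload)
```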

Verification workflow

When using verification mode, here’s what to expect:

1. Checkpoint reached

The agent pauses execution and sends an email notification.

2. Review agent's work

Open the chat or check your email to see what the agent plans to do.

3. Respond

Type “approve”: the agent continues.
Type “reject”: the agent stops.
Type “hint: [guidance]”: the agent adjusts its approach.

4. Agent continues

After approval, the agent proceeds to the next step or completes the task.
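
The three reviewer responses form a small command vocabulary. The sketch below classifies a reply in the same way, which can be useful if you script reviewer responses or log them on your side. It is not the platform's own parser: the function parse_reply is hypothetical, and the case-insensitive matching and whitespace trimming are assumptions.

```python
def parse_reply(text):
    """Classify a reviewer reply as (action, guidance).

    action is "approve", "reject", "hint", or "unknown";
    guidance is only set for hints.
    """
    cleaned = text.strip()
    lowered = cleaned.lower()
    if lowered == "approve":
        return ("approve", None)
    if lowered == "reject":
        return ("reject", None)
    if lowered.startswith("hint:"):
        # Keep the reviewer's original casing in the guidance text.
        return ("hint", cleaned[len("hint:"):].strip())
    return ("unknown", None)


for reply in ["approve", "reject", "hint: use list prices only", "looks fine"]:
    print(reply, "->", parse_reply(reply))
```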

Example: Complete flow

Here’s a complete example from start to finish:

1. Submit via email

To: agent@mixus.com
Subject: Eval: Research Competitors

Research pricing for top 3 AI agent companies and email
summary to team@company.com

Test type: with-verification
Auto-detect: true
Reviewer: manager@company.com

2. Confirmation received

From: agent@mixus.com
Subject: Re: Eval: Research Competitors

✅ Evaluation Started!

Task: Research Competitors
Type: with-verification
Checkpoints: 2 (AI-detected)

Review at: https://app.mixus.ai/chat/abc123

You'll receive email when verification is needed.

3. Checkpoint notification

From: agent@mixus.com
Subject: Checkpoint Verification Needed

🔔 Checkpoint 1: Research Review

I've researched pricing for OpenAI, Anthropic, and Google DeepMind.

Results:
- OpenAI: $20-$200/month
- Anthropic: Custom enterprise pricing
- Google DeepMind: Part of Google Cloud

Respond with: approve | reject | hint: [guidance]

View full details: https://app.mixus.ai/chat/abc123

4. You approve

Reply to email: approve

5. Second checkpoint

From: agent@mixus.com
Subject: Checkpoint Verification Needed

🔔 Checkpoint 2: Email Review

Ready to send email summary to team@company.com

Subject: Competitor Pricing Analysis
Content: [Shows email draft]

Approve sending this email?

6. You approve again

Reply: approve

7. Completion notification

From: agent@mixus.com
Subject: Evaluation Complete

✅ Task Complete!

Results:
- Success: Yes
- Checkpoints: 2/2 approved
- Duration: 5 minutes
- Cost: $2.50

Email sent successfully to team@company.com

View full details: https://app.mixus.ai/chat/abc123

Where to track evaluations

Dashboard

Visual overview of all evaluations

Chat Interface

See full conversation and agent work

API Status

Programmatic status checks

Email Notifications

Receive updates via email

Tips for success

Write clear task descriptions

Good: “Calculate 15% commission on a $50,000 sale and send result via email to manager@company.com”
Bad: “Do commission stuff”

Start with auto-detect mode

Let AI determine checkpoints until you understand the system better.

Test simple tasks first

Start with calculations or research before complex multi-step workflows.

Use baseline mode for comparisons

Run the same task with and without verification to measure impact.

Next steps

Task Preparation

Learn how to write effective evaluation tasks

Examples

See 10+ ready-to-use example tasks

API Reference

Complete API documentation

Best Practices

Tips for getting the most from evaluations

Common questions

Do I need an API key for email submissions?

No! Email submissions don’t require an API key. Just send your task to agent@mixus.com with a subject that starts with “Eval:”.

How long does an evaluation take?

It depends on task complexity and verification time. Simple tasks take 2-5 minutes; complex tasks take 10-30 minutes.

Can I submit multiple tasks at once?

Yes! Via API you can submit multiple tasks. They’ll run in parallel.
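
When submitting many tasks from a script, the requests themselves can also be issued concurrently. The sketch below uses a thread pool for that. The helper submit_all is hypothetical, and the demo uses a fake submit function in place of a real HTTP call, so the returned IDs are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_all(tasks, submit_one, max_workers=4):
    """Submit every task concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit_one, tasks))


def fake_submit(task):
    # Stand-in for a real POST to the create-task-agent endpoint.
    return f"fake-id-for-{task['taskName']}"


tasks = [
    {"taskName": "Calculate Sales Commission", "testMode": "with-verification"},
    {"taskName": "Research Competitors", "testMode": "without-verification"},
]
execution_ids = submit_all(tasks, fake_submit)
print(execution_ids)
```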

What if I don't respond to a checkpoint?

The evaluation will wait for your response. You can respond anytime via email or chat.

Can I cancel a running evaluation?

Yes, reply “reject” at any checkpoint or stop it from the dashboard.

Need help?

Email: support@mixus.ai
Dashboard: app.mixus.ai/eval
API Keys: app.mixus.ai/integrations/api-keys
