Skip to main content

Writing effective tasks

Good task descriptions lead to better results. Follow these guidelines:

Be specific and clear

Calculate 15% commission on a $50,000 sale and send the result 
via email to [email protected] with subject "Q4 Commission Report"

Include expected outcomes

Research the current Bitcoin price from CoinMarketCap and calculate 
the value of 0.5 BTC. Expected result should be around $20,000-$25,000.

Provide necessary context

Calculate R&D tax credit using Alternative Simplified Credit method 
(IRS Form 6765 Section B). Current year QRE: $120,000. Prior 3-year 
average: $90,000. Formula: 14% × (current_qre - base_amount)

Email format templates

To: [email protected]
Subject: Eval: [Your Task Name]

[Clear description of what you want done]

Test type: with-verification
Auto-detect: true
Reviewer: [email protected]
Example:
To: [email protected]
Subject: Eval: Market Research

Research the top 3 AI agent platforms by market share.
For each, find: pricing model, key features, and target market.

Test type: with-verification
Auto-detect: true
Reviewer: [email protected]

Template 2: Manual checkpoints

To: [email protected]
Subject: Eval: [Your Task Name]

[Task description]

Checkpoints:
1. [First verification point]
2. [Second verification point]
3. [Third verification point]

Test type: with-verification
Reviewer: [email protected]
Example:
To: [email protected]
Subject: Eval: Financial Calculation

Calculate the R&D tax credit for $120,000 in qualified expenses.

Checkpoints:
1. Verify the base amount calculation is correct
2. Verify the final tax credit amount
3. Confirm the IRS form is properly filled

Test type: with-verification
Reviewer: [email protected]

Template 3: Baseline (no verification)

To: [email protected]
Subject: Eval: [Your Task Name]

[Task description]

Test type: without-verification
Reviewer: [email protected]
Example:
To: [email protected]
Subject: Eval: Simple Calculation

Calculate 15% of $50,000

Test type: without-verification
Reviewer: [email protected]

API format templates

{
  "taskName": "Market Research",
  "taskDescription": "Research top 3 AI agent platforms, find pricing, features, and target market",
  "autoDetectCheckpoints": true,
  "testMode": "with-verification",
  "assignedReviewer": "[email protected]"
}

Template 2: Manual checkpoints

{
  "taskName": "Financial Calculation",
  "taskDescription": "Calculate R&D tax credit for $120,000 QRE",
  "checkpoints": [
    {
      "stage": "base_calculation",
      "description": "Calculate base amount",
      "verificationQuestion": "Is the base amount $45,000?"
    },
    {
      "stage": "final_calculation",
      "description": "Calculate final tax credit",
      "verificationQuestion": "Is the tax credit $10,500?"
    }
  ],
  "testMode": "with-verification",
  "assignedReviewer": "[email protected]"
}

Template 3: With webhook callback

{
  "taskName": "Automated Test",
  "taskDescription": "Calculate 15% commission on $50,000 sale",
  "autoDetectCheckpoints": true,
  "testMode": "with-verification",
  "assignedReviewer": "[email protected]",
  "webhookUrl": "https://your-system.com/webhook/eval-complete",
  "externalId": "test-001"
}

Task categories and examples

Financial calculations

Subject: Eval: Sales Commission Q4

Calculate commission for Q4 sales:

- Base salary: $60,000/year
- Commission rate: 3% of sales over $100,000
- Q4 sales: $450,000

Calculate total Q4 compensation.

Test type: with-verification
Auto-detect: true
Reviewer: [email protected]


Subject: Eval: Marketing ROI

Calculate ROI for Q3 marketing campaign:

- Total spend: $75,000
- Revenue generated: $225,000
- Campaign duration: 3 months

Provide ROI percentage and monthly breakdown.

Test type: with-verification
Auto-detect: true
Reviewer: [email protected]

Research tasks


Subject: Eval: Competitor Research

Research OpenAI ChatGPT, Anthropic Claude, and Google Gemini.
For each, find:

1. Pricing (all tiers)
2. Key features
3. API availability
4. Target market

Create comparison table.

Test type: with-verification
Auto-detect: true
Reviewer: [email protected]

Communication tasks


Subject: Eval: Customer Email Draft

Draft follow-up email to customer who requested product demo.
Include:

- Thank them for interest
- Propose 3 time slots next week (9am, 2pm, 4pm EST)
- Link to product features page
- Professional, warm tone

Test type: with-verification
Auto-detect: true
Reviewer: [email protected]


Subject: Eval: Meeting Summary Email

Create meeting summary email for quarterly review:

- Key decisions made
- Action items with owners
- Next meeting date
- Send to: [email protected]

Test type: with-verification
Auto-detect: true
Reviewer: [email protected]


Best practices

Do’s ✅

Be specific about numbers and amounts

Good: "Calculate 15% commission on $50,000"
Bad: "Calculate some commission"

Include verification criteria

Good: "Expected result should be $7,500 (15% of $50,000)"
Bad: "Calculate it"

Specify data sources when relevant

Good: "Get Bitcoin price from CoinMarketCap"
Bad: "Get crypto price"

Break complex tasks into steps

Good: "1) Research competitors 2) Analyze pricing 3) Create table 4) Email summary"
Bad: "Do competitor analysis"

Don’ts ❌

Don’t use vague language

Bad: "Do some research about AI"
Good: "Research top 3 AI companies: OpenAI, Anthropic, Google DeepMind"

Don’t forget to specify reviewer

Bad: Missing "Reviewer:" field
Good: "Reviewer: [email protected]"

Don’t mix manual checkpoints with auto-detect

Bad: Both "Checkpoints: 1, 2, 3" AND "Auto-detect: true"
Good: Choose ONE approach


Checkpoint placement guide

AI auto-detects checkpoints before:

External Communications

  • Sending emails
  • Posting to social media
  • Sending SMS/messages

Financial Actions

  • Making purchases
  • Transferring money
  • Processing payments

Irreversible Operations

  • Deleting data
  • Canceling subscriptions
  • Submitting forms

High-Value Decisions

  • Strategic recommendations
  • Large calculations
  • Important approvals

AI does NOT checkpoint for:

  • Web searches and research
  • Reading files or data
  • Analysis and calculations (unless result triggers action)
  • Information gathering
  • Report generation (unless sending/publishing)

Field reference

Required fields

FieldDescriptionExample
taskName / SubjectTask identifier”Calculate Commission”
taskDescription / BodyWhat to do”Calculate 15% of $50,000”
testMode / Test typeVerification mode”with-verification”
assignedReviewer / ReviewerWho verifies[email protected]

Optional fields

FieldDescriptionExample
autoDetectCheckpoints / Auto-detectLet AI find checkpointstrue
checkpoints / CheckpointsManual verification pointsSee checkpoint format below
webhookUrlCallback URLhttps://your-app.com/webhook
externalIdYour tracking ID”task-001”

Common mistakes

Wrong: Subject: Calculate Commission Right: Subject: Eval: Calculate Commission
Wrong: Do some calculations Right: Calculate 15% commission on a $50,000 sale
Wrong: No reviewer specified Right: Reviewer: [email protected]
Wrong: 10 checkpoints for simple task Right: 2-4 checkpoints for most tasks, or use auto-detect

Next steps