A comprehensive guide to understanding token counting, consumption, and optimization in mixus.

## Token Fundamentals

### What Exactly is a Token?

Tokens are the basic units that AI models use to process and understand text. Think of them as puzzle pieces that models use to break down and analyze language:
  • Not exactly words: A token can be a word, part of a word, or even punctuation
  • Model-specific: Different AI models tokenize text differently
  • Language-dependent: English, Chinese, and code each tokenize uniquely

### Tokenization Examples

**English Text:**
```text
"Hello, world!" = ["Hello", ",", " world", "!"] = 4 tokens
"AI-powered" = ["AI", "-", "powered"] = 3 tokens
"The quick brown fox" = ["The", " quick", " brown", " fox"] = 4 tokens
```

**Code:**
```python
def hello():
    print("Hello")
```
```text
= ["def", " hello", "()", ":", "\n", "    ", "print", "(", "\"", "Hello", "\"", ")"] = 12 tokens
```

**Complex Text:**
```text
"GPT-4 is amazing!" = ["G", "PT", "-", "4", " is", " amazing", "!"] = 7 tokens
```
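
Exact counts vary by tokenizer. If you want to verify counts yourself, OpenAI's open-source `tiktoken` library tokenizes text the way GPT-4o does (Claude and other model families use different tokenizers, so treat its output as an approximation for them). A minimal sketch:

```python
# pip install tiktoken
import tiktoken

# Load the tokenizer GPT-4o uses (the o200k_base encoding).
enc = tiktoken.encoding_for_model("gpt-4o")

for text in ["Hello, world!", "AI-powered", "GPT-4 is amazing!"]:
    token_ids = enc.encode(text)
    # Decode each token id back to its text piece for inspection.
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r}: {len(token_ids)} tokens -> {pieces}")
```

The counts it prints may differ slightly from the illustrative splits above, since every tokenizer draws its boundaries differently.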

## Token Consumption in mixus

### Input Token Calculation

Every time you interact with an AI model, input tokens include:

1. **Your message**: The text you type
2. **Conversation history**: Previous messages in the chat
3. **System prompts**: Internal instructions to the model
4. **File content**: Uploaded documents and images
5. **Memory context**: Relevant memories from your knowledge base

**Example Calculation:**
```text
Your message: "Analyze this document" = 4 tokens
Previous conversation: 150 tokens
System prompt: 50 tokens
Document content: 2,000 tokens
Total input tokens: 2,204 tokens
```
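
In code, this is just a sum over the components. A minimal sketch using the example figures above (purely illustrative; mixus measures these counts for you):

```python
# Illustrative only: in practice these counts come from the
# tokenizer, not from hand-entered numbers.
input_components = {
    "your_message": 4,
    "conversation_history": 150,
    "system_prompt": 50,
    "document_content": 2000,
}

total_input_tokens = sum(input_components.values())
print(total_input_tokens)  # 2204
```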

### Output Token Calculation

Output tokens are generated by the AI model:

1. **Direct responses**: The text the model writes back
2. **Tool outputs**: Results from web search, code execution, etc.
3. **Reasoning tokens**: Internal thinking (for models like o1)

**Example:**
```text
Model response: 500 tokens
Web search results: 200 tokens
Total output tokens: 700 tokens
```

### Total Interaction Cost

```text
Total cost = (Input tokens × Input rate) + (Output tokens × Output rate)
```
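
Applied to the examples above (2,204 input tokens, 700 output tokens) at GPT-4o's rates from the next section, a sketch:

```python
def interaction_cost(input_tokens: int, output_tokens: int,
                     input_rate_per_1k: float,
                     output_rate_per_1k: float) -> float:
    """Total cost = (input tokens x input rate) + (output tokens x output rate)."""
    return ((input_tokens / 1000) * input_rate_per_1k
            + (output_tokens / 1000) * output_rate_per_1k)

# GPT-4o: $0.0025 per 1K input tokens, $0.01 per 1K output tokens
cost = interaction_cost(2204, 700, 0.0025, 0.01)
print(f"${cost:.4f}")  # $0.0125
```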

## Model-Specific Token Rates

### GPT-4o
- **Input**: $0.0025 per 1K tokens
- **Output**: $0.01 per 1K tokens
- **Context**: 128K token limit

### GPT-4o mini
- **Input**: $0.00015 per 1K tokens
- **Output**: $0.0006 per 1K tokens
- **Context**: 128K token limit

### Claude 4 Sonnet  
- **Input**: $0.003 per 1K tokens (≤200K), $0.006 per 1K tokens (>200K)
- **Output**: $0.015 per 1K tokens (≤200K), $0.0225 per 1K tokens (>200K)
- **Context**: 1M token limit

### Claude 3.5 Sonnet
- **Input**: $0.003 per 1K tokens
- **Output**: $0.015 per 1K tokens
- **Context**: 200K token limit

### o1-preview
- **Input**: $0.015 per 1K tokens
- **Output**: $0.06 per 1K tokens
- **Reasoning**: Hidden thinking tokens are billed at the output rate

## Context Window Management

### Understanding Context Limits

Each model has a maximum context window:

- **GPT-4o**: 128,000 tokens
- **Claude 3.5 Sonnet**: 200,000 tokens
- **o1-preview**: 128,000 tokens

When you exceed this limit, mixus automatically:
1. Removes oldest conversation history
2. Preserves system prompts and recent messages
3. Maintains uploaded file content when possible
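
A simplified sketch of this kind of trimming strategy (an illustration of the approach, not mixus's actual implementation):

```python
def trim_history(messages: list[tuple[str, int]],
                 system_prompt_tokens: int,
                 context_limit: int) -> list[tuple[str, int]]:
    """Keep the newest messages that fit; the system prompt is always kept.

    `messages` is a list of (text, token_count) pairs, oldest first.
    """
    budget = context_limit - system_prompt_tokens
    kept, total = [], 0
    # Walk newest-to-oldest, keeping messages while they fit the budget.
    for text, tokens in reversed(messages):
        if total + tokens > budget:
            break  # everything older than this point is dropped
        kept.append((text, tokens))
        total += tokens
    return list(reversed(kept))  # restore chronological order
```

A real implementation would also pin uploaded file content alongside the system prompt when possible, as described above.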

### Context Optimization Strategies

**1. Message Management**
```text
✅ Keep recent, relevant messages
❌ Long conversations with repetitive content
```

**2. File Upload Strategy**
```text
✅ Upload only necessary sections
❌ Upload entire large documents unnecessarily
```

**3. Memory Usage**
```text
✅ Save important information to memory
❌ Rely solely on conversation history
```

## Token Optimization Techniques

### Prompt Engineering

**Inefficient Prompt:**
```text
"Can you please help me understand how to create a function in Python that takes two numbers as input parameters and returns their sum? I would appreciate if you could provide a detailed explanation with comments."
```
*Token count: ~35 tokens*

**Optimized Prompt:**
```text
"Create a Python function that adds two numbers. Include comments."
```
*Token count: ~12 tokens*
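
You can measure the difference yourself before sending a prompt; for example, with `tiktoken` (GPT-family counts only, and exact numbers will vary by tokenizer):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

verbose = ("Can you please help me understand how to create a function in "
           "Python that takes two numbers as input parameters and returns "
           "their sum? I would appreciate if you could provide a detailed "
           "explanation with comments.")
concise = "Create a Python function that adds two numbers. Include comments."

# Compare token counts for the two phrasings.
print(len(enc.encode(verbose)), len(enc.encode(concise)))
```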

### Batch Processing

**Inefficient:**
```text
Three separate requests:
1. "Analyze document1.pdf"
2. "Analyze document2.pdf"  
3. "Analyze document3.pdf"
```

**Efficient:**
```text
One request:
"Analyze these three documents and compare their key findings."
```

### Smart Context Management

**Use Memory System:**
- Save important findings to memory
- Reference memories instead of re-uploading files
- Clear conversation history when starting new topics

**File Processing:**
- Extract only relevant sections
- Summarize large documents before analysis
- Use document chunking for very large files

## Real-Time Token Tracking

### Usage Monitoring

mixus provides real-time token tracking:

1. **Per-message costs**: See tokens used for each interaction
2. **Daily summaries**: Track your daily token consumption
3. **Monthly analytics**: Understand usage patterns
4. **Model breakdowns**: See which models consume the most tokens

### Cost Alerts

Set up automatic alerts:
- **Budget thresholds**: Get notified at 80% of monthly budget
- **Spike detection**: Alert on unusual usage patterns
- **Model-specific limits**: Set per-model usage caps
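
The check behind a budget-threshold alert is straightforward; a sketch (the notification itself is hypothetical, standing in for whatever channel you configure):

```python
def over_threshold(spent_usd: float, monthly_budget_usd: float,
                   threshold: float = 0.80) -> bool:
    """True once spend reaches the alert threshold (80% by default)."""
    return spent_usd >= monthly_budget_usd * threshold

if over_threshold(spent_usd=41.20, monthly_budget_usd=50.00):
    print("Alert: 80% of monthly budget reached")  # hypothetical notification
```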

## Advanced Token Concepts

### Reasoning Tokens (o1 Models)

The o1 family produces "reasoning tokens", internal thinking that is billed even though it is never shown to you:

```text
Your prompt: 10 tokens
Model reasoning: 2,000 tokens (internal)
Model response: 100 tokens
Total billable: 2,110 tokens
```
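
Because reasoning tokens are billed at the output rate, the hidden thinking dominates the cost of this example. At o1-preview's rates:

```python
# o1-preview: $0.015 per 1K input tokens, $0.06 per 1K output tokens.
# Reasoning tokens are billed as output tokens even though they are
# never shown to you.
prompt_tokens = 10
reasoning_tokens = 2000   # internal thinking, billed as output
response_tokens = 100

cost = ((prompt_tokens / 1000) * 0.015
        + ((reasoning_tokens + response_tokens) / 1000) * 0.06)
print(f"${cost:.3f}")  # roughly $0.126
```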

### Tool Execution Tokens

When using AI tools:

```text
Web search query: 5 tokens
Search results: 300 tokens
Tool processing: 50 tokens
Response integration: 100 tokens
Total tool tokens: 455 tokens
```

### Multi-Modal Token Costs

Images and documents have special token calculations:

**Images:**
- Small images: ~85 tokens
- Medium images: ~170 tokens
- Large images: ~255 tokens

**Documents:**
- Text extraction: 1 token per ~4 characters
- OCR processing: Additional 20% token overhead
- Analysis prompts: Model-specific system tokens
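
A rough estimator built from those heuristics (approximations only; real counts depend on the tokenizer and the file format):

```python
def estimate_document_tokens(text: str, used_ocr: bool = False) -> int:
    """Rough estimate: ~1 token per 4 characters, plus ~20% OCR overhead."""
    tokens = len(text) / 4
    if used_ocr:
        tokens *= 1.20  # OCR processing overhead from the heuristics above
    return round(tokens)

print(estimate_document_tokens("x" * 8000))                 # ~2000
print(estimate_document_tokens("x" * 8000, used_ocr=True))  # ~2400
```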

## Troubleshooting Token Issues

### Common Problems

**1. Unexpected High Usage**
- Check for large file uploads
- Review conversation history length
- Verify memory system usage

**2. Context Limit Errors**
- Reduce file sizes
- Clear conversation history
- Use document summarization

**3. Cost Optimization**
- Switch to more efficient models
- Improve prompt engineering
- Use batch processing

### Best Practices Checklist

- [ ] Use appropriate model for task complexity
- [ ] Keep prompts concise but clear
- [ ] Regularly clear conversation history
- [ ] Upload only necessary file content
- [ ] Monitor daily token usage
- [ ] Set budget alerts
- [ ] Use memory system effectively
- [ ] Batch similar requests

## Next Steps

- [Track your usage in real-time](/tokens/tracking)
- [Understand billing and pricing](/tokens/billing)
- [Learn about model limits](/ai-models/limits)

## Related Resources

- [AI Models Overview](/ai-models/overview)
- [Rate Limits](/limits/rate-limits)
- [Files & Memory](/files-memory/overview) 