# Token Fundamentals

## What Exactly is a Token?
Tokens are the basic units that AI models use to process and understand text. Think of them as puzzle pieces that models use to break down and analyze language:

- **Not exactly words**: A token can be a word, part of a word, or even punctuation
- **Model-specific**: Different AI models tokenize text differently
- **Language-dependent**: English, Chinese, and code each tokenize uniquely
### Tokenization Examples

**English Text:**

```text
"Hello, world!" = ["Hello", ",", " world", "!"] = 4 tokens
"AI-powered" = ["AI", "-", "powered"] = 3 tokens
"The quick brown fox" = ["The", " quick", " brown", " fox"] = 4 tokens
```
**Code:**

```python
def hello():
    print("Hello")
```

```text
= ["def", " hello", "()", ":", "\n", "    ", "print", "(", "\"", "Hello", "\"", ")"] = 12 tokens
```
**Complex Text:**

```text
"GPT-4 is amazing!" = ["G", "PT", "-", "4", " is", " amazing", "!"] = 7 tokens
```
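To see how a particular tokenizer splits your text, you can count tokens locally. Here is a minimal sketch using OpenAI's open-source `tiktoken` library; exact counts depend on the encoding you pick, and the models available in mixus may tokenize slightly differently:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-4-era models;
# other models use other encodings, so counts are approximate.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello, world!", "AI-powered", "The quick brown fox"]:
    tokens = enc.encode(text)
    print(f"{text!r} -> {len(tokens)} tokens: {[enc.decode([t]) for t in tokens]}")
```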
## Token Consumption in mixus
### Input Token Calculation
Every time you interact with an AI model, input tokens include:
1. **Your message**: The text you type
2. **Conversation history**: Previous messages in the chat
3. **System prompts**: Internal instructions to the model
4. **File content**: Uploaded documents and images
5. **Memory context**: Relevant memories from your knowledge base
**Example Calculation:**
```text
Your message: "Analyze this document" = 4 tokens
Previous conversation: 150 tokens
System prompt: 50 tokens
Document content: 2,000 tokens
Total input tokens: 2,204 tokens
```
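The same arithmetic as a quick sanity check in code. This is a sketch using the illustrative figures above, not values read from any mixus API:

```python
# Illustrative input-token components from the example above.
input_components = {
    "your_message": 4,
    "conversation_history": 150,
    "system_prompt": 50,
    "document_content": 2_000,
}

total_input_tokens = sum(input_components.values())
print(total_input_tokens)  # 2204
```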
### Output Token Calculation
Output tokens cover everything the AI model generates in response:
1. **Direct responses**: The text the model writes back
2. **Tool outputs**: Results from web search, code execution, etc.
3. **Reasoning tokens**: Internal thinking (for models like o1)
**Example:**
```text
Model response: 500 tokens
Web search results: 200 tokens
Total output tokens: 700 tokens
```
### Total Interaction Cost
```text
Total cost = (Input tokens × Input rate) + (Output tokens × Output rate)
```
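In code, with rates expressed per 1K tokens as in the tables below (a sketch; plug in the rates for whichever model you use):

```python
def interaction_cost(input_tokens: int, output_tokens: int,
                     input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Total cost = (input tokens x input rate) + (output tokens x output rate)."""
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# Example: 2,204 input + 700 output tokens at GPT-4o rates.
print(interaction_cost(2_204, 700, 0.0025, 0.01))  # ~0.0125 (about 1.3 cents)
```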
## Model-Specific Token Rates
### GPT-4o
- **Input**: $0.0025 per 1K tokens
- **Output**: $0.01 per 1K tokens
- **Context**: 128K token limit
### GPT-4o mini
- **Input**: $0.00015 per 1K tokens
- **Output**: $0.0006 per 1K tokens
- **Context**: 128K token limit
### Claude 4 Sonnet
- **Input**: $0.003 per 1K tokens (≤200K), $0.006 per 1K tokens (>200K)
- **Output**: $0.015 per 1K tokens (≤200K), $0.0225 per 1K tokens (>200K)
- **Context**: 1M token limit
### Claude 3.5 Sonnet
- **Input**: $0.003 per 1K tokens
- **Output**: $0.015 per 1K tokens
- **Context**: 200K token limit
### o1-preview
- **Input**: $0.015 per 1K tokens
- **Output**: $0.06 per 1K tokens
- **Reasoning**: Additional cost for thinking tokens
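The rates above can be collected into a lookup table to compare what the same workload costs on each model. A sketch using the standard-context rates listed above (the Claude 4 Sonnet long-context surcharge and o1 reasoning tokens are ignored here for simplicity):

```python
# (input $/1K tokens, output $/1K tokens), standard-context rates from above.
RATES = {
    "gpt-4o":            (0.0025,  0.01),
    "gpt-4o-mini":       (0.00015, 0.0006),
    "claude-4-sonnet":   (0.003,   0.015),
    "claude-3.5-sonnet": (0.003,   0.015),
    "o1-preview":        (0.015,   0.06),
}

input_tokens, output_tokens = 2_204, 700
for model, (in_rate, out_rate) in RATES.items():
    cost = (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate
    print(f"{model:18s} ${cost:.4f}")
```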
## Context Window Management
### Understanding Context Limits
Each model has a maximum context window:
- **GPT-4o**: 128,000 tokens
- **Claude 3.5 Sonnet**: 200,000 tokens
- **o1-preview**: 128,000 tokens
When you exceed this limit, mixus automatically:
1. Removes oldest conversation history
2. Preserves system prompts and recent messages
3. Maintains uploaded file content when possible
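The trimming behavior above can be pictured as dropping messages from the front of the history until the budget fits. This is a simplified sketch of the idea, not mixus's internal implementation; `count_tokens` is a hypothetical helper that maps a string to its token count:

```python
def trim_history(system_prompt, messages, files, limit, count_tokens):
    """Drop oldest messages until everything fits in the context window.

    System prompt and file content are kept; only conversation
    history is trimmed, oldest first.
    """
    fixed = count_tokens(system_prompt) + sum(count_tokens(f) for f in files)
    kept = list(messages)
    while kept and fixed + sum(count_tokens(m) for m in kept) > limit:
        kept.pop(0)  # remove the oldest message first
    return kept
```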
### Context Optimization Strategies
**1. Message Management**
```text
✅ Keep recent, relevant messages
❌ Long conversations with repetitive content
```
**2. File Upload Strategy**
```text
✅ Upload only necessary sections
❌ Upload entire large documents unnecessarily
```
**3. Memory Usage**
```text
✅ Save important information to memory
❌ Rely solely on conversation history
```
## Token Optimization Techniques
### Prompt Engineering
**Inefficient Prompt:**
```text
"Can you please help me understand how to create a function in Python that takes two numbers as input parameters and returns their sum? I would appreciate if you could provide a detailed explanation with comments."
```
*Token count: ~35 tokens*
**Optimized Prompt:**
```text
"Create a Python function that adds two numbers. Include comments."
```
*Token count: ~12 tokens*
### Batch Processing
**Inefficient:**
```text
Three separate requests:
1. "Analyze document1.pdf"
2. "Analyze document2.pdf"
3. "Analyze document3.pdf"
```
**Efficient:**
```text
One request:
"Analyze these three documents and compare their key findings."
```
### Smart Context Management
**Use Memory System:**
- Save important findings to memory
- Reference memories instead of re-uploading files
- Clear conversation history when starting new topics
**File Processing:**
- Extract only relevant sections
- Summarize large documents before analysis
- Use document chunking for very large files
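Document chunking, mentioned in the last bullet, can be as simple as splitting on a character budget derived from the ~4 characters-per-token rule of thumb used later in this page. A rough sketch:

```python
def chunk_document(text: str, max_tokens: int = 2_000, chars_per_token: int = 4):
    """Split text into pieces that each fit a token budget.

    Uses the rough heuristic of ~4 characters per token; real
    tokenizers vary, so leave some headroom.
    """
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Analyze a large report one chunk at a time:
# chunks = chunk_document(open("report.txt").read())
```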
## Real-Time Token Tracking
### Usage Monitoring
mixus provides real-time token tracking:
1. **Per-message costs**: See tokens used for each interaction
2. **Daily summaries**: Track your daily token consumption
3. **Monthly analytics**: Understand usage patterns
4. **Model breakdowns**: See which models consume most tokens
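If you want a local tally alongside the built-in dashboards, a per-message ledger is easy to keep. This is purely an illustration, not the mixus API:

```python
from collections import defaultdict

class TokenLedger:
    """Local running tally of tokens per model (illustrative only)."""

    def __init__(self):
        self.usage = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, model: str, input_tokens: int, output_tokens: int):
        self.usage[model]["input"] += input_tokens
        self.usage[model]["output"] += output_tokens

ledger = TokenLedger()
ledger.record("gpt-4o", 2_204, 700)
print(dict(ledger.usage))
```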
### Cost Alerts
Set up automatic alerts:
- **Budget thresholds**: Get notified at 80% of monthly budget
- **Spike detection**: Alert on unusual usage patterns
- **Model-specific limits**: Set per-model usage caps
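The 80% threshold rule translates directly into code. A sketch of the check behind such an alert (the figures are placeholders):

```python
def should_alert(spent: float, monthly_budget: float, threshold: float = 0.80) -> bool:
    """Fire a budget alert once spend crosses the threshold fraction."""
    return spent >= monthly_budget * threshold

print(should_alert(spent=82.50, monthly_budget=100.0))  # True: past 80%
```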
## Advanced Token Concepts
### Reasoning Tokens (o1 Models)
The o1 family generates hidden "reasoning tokens" as internal thinking. They never appear in the response, but they are billed:
```text
Your prompt: 10 tokens
Model reasoning: 2,000 tokens (internal)
Model response: 100 tokens
Total billable: 2,110 tokens
```
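Because reasoning tokens are billed even though you never see them, it helps to fold them into the output side of the cost formula. A sketch using the o1-preview rates listed earlier:

```python
def o1_cost(prompt_tokens, reasoning_tokens, response_tokens,
            in_rate=0.015, out_rate=0.06):
    """Reasoning tokens are invisible in the reply but billed as output."""
    billable_output = reasoning_tokens + response_tokens
    return (prompt_tokens / 1000) * in_rate + (billable_output / 1000) * out_rate

print(o1_cost(10, 2_000, 100))  # 0.00015 + 0.126 = ~0.126
```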
### Tool Execution Tokens
When the model calls AI tools, each step of the exchange consumes tokens:
```text
Web search query: 5 tokens
Search results: 300 tokens
Tool processing: 50 tokens
Response integration: 100 tokens
Total tool tokens: 455 tokens
```
### Multi-Modal Token Costs
Images and documents have special token calculations:
**Images:**
- Small images: ~85 tokens
- Medium images: ~170 tokens
- Large images: ~255 tokens
**Documents:**
- Text extraction: 1 token per ~4 characters
- OCR processing: Additional 20% token overhead
- Analysis prompts: Model-specific system tokens
## Troubleshooting Token Issues
### Common Problems
**1. Unexpected High Usage**
- Check for large file uploads
- Review conversation history length
- Verify memory system usage
**2. Context Limit Errors**
- Reduce file sizes
- Clear conversation history
- Use document summarization
**3. Cost Optimization**
- Switch to more efficient models
- Improve prompt engineering
- Use batch processing
### Best Practices Checklist
- [ ] Use appropriate model for task complexity
- [ ] Keep prompts concise but clear
- [ ] Regularly clear conversation history
- [ ] Upload only necessary file content
- [ ] Monitor daily token usage
- [ ] Set budget alerts
- [ ] Use memory system effectively
- [ ] Batch similar requests
## Next Steps
- [Track your usage in real-time](/tokens/tracking)
- [Understand billing and pricing](/tokens/billing)
- [Learn about model limits](/ai-models/limits)
## Related Resources
- [AI Models Overview](/ai-models/overview)
- [Rate Limits](/limits/rate-limits)
- [Files & Memory](/files-memory/overview)