Advertisement
Compare API response times of major AI models. View average latency and P99 latency side by side.
| Model Name | Provider | Avg Latency ↑ | P99 Latency |
|---|---|---|---|
Claude Haiku 4 | Anthropic | 120ms | 350ms |
Gemini 2.0 Flash | 150ms | 400ms | |
GPT-4o mini | OpenAI | 180ms | 500ms |
Llama 4 70B | Meta | 280ms | 800ms |
Claude Sonnet 4 | Anthropic | 380ms | 1100ms |
GPT-4o | OpenAI | 450ms | 1200ms |
Gemini 2.5 Pro | 520ms | 1500ms | |
DeepSeek Chat | DeepSeek | 600ms | 2000ms |
Latency values shown are representative estimates of typical API response times. Actual performance varies based on request content, region, and load conditions. Average shows typical request response time, while P99 shows the 99th percentile (worst-case) response time.