API Latency Ranking

Compare API response times of major AI models. View average latency and P99 latency side by side.

Model Name	Provider	Avg Latency ↑	P99 Latency
Claude Haiku 4	Anthropic	120ms	350ms
Gemini 2.0 Flash	Google	150ms	400ms
GPT-4o mini	OpenAI	180ms	500ms
Llama 4 70B	Meta	280ms	800ms
Claude Sonnet 4	Anthropic	380ms	1100ms
GPT-4o	OpenAI	450ms	1200ms
Gemini 2.5 Pro	Google	520ms	1500ms
DeepSeek Chat	DeepSeek	600ms	2000ms

About Latency Data

Latency values shown are representative estimates of typical API response times. Actual performance varies with request content, region, and load conditions. Avg Latency is the mean response time across requests, while P99 Latency is the 99th percentile: 99% of requests complete at or below this value, so it describes tail latency rather than the absolute worst case.
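
To make the two metrics concrete, here is a minimal sketch of how an average and a P99 could be computed from a list of response times. The sample values and the helper name `p99_nearest_rank` are hypothetical, and the nearest-rank percentile method is an assumption chosen for illustration; the page does not state how its estimates were derived.

```python
from statistics import mean


def p99_nearest_rank(samples_ms):
    """Return the 99th percentile of samples_ms using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # 1-based rank = ceil(0.99 * n), computed with integers to avoid float rounding.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


# Hypothetical data: 98 fast requests plus two slow outliers (milliseconds).
samples_ms = [100] * 98 + [900, 2000]

avg_ms = mean(samples_ms)               # pulled upward by the outliers
p99_ms = p99_nearest_rank(samples_ms)   # 99% of requests finish at or below this
worst_ms = max(samples_ms)              # the true worst case, above P99 here

print(f"avg={avg_ms}ms p99={p99_ms}ms max={worst_ms}ms")
```

In this made-up sample the slowest request is well above the P99 value, which is why P99 is read as a tail-latency indicator and not as a hard upper bound.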
