Testing Methodology

How we measure and monitor LLM latency across providers

Overview

Martian Status performs comprehensive latency testing on various LLM providers to help you understand real-world performance characteristics. Our testing methodology is designed to simulate different usage patterns and provide actionable insights.

Test frequency: Every 10 minutes
Coverage: 6+ providers
Scenarios: 4 test types
Modes: Stream + Non-stream

Test Types
Four different test scenarios to simulate various usage patterns
Short-Short: Quick interactions

Short input prompt (~100 chars) → Short output (256 chars). Simulates quick Q&A, simple commands, or brief interactions.

Short-Long: Generation tasks

Short input prompt (~100 chars) → Long output (6,400 chars). Simulates content generation, story writing, or detailed explanations.

Long-Short: Context processing

Long input prompt (~3,000 chars) → Short output (256 chars). Simulates summarization, extraction, or analysis tasks.

Long-Long: Complex tasks

Long input prompt (~3,000 chars) → Long output (6,400 chars). Simulates document processing, code generation, or comprehensive analysis.
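
Taken together, the four scenarios form a small test matrix. Below is a minimal sketch in TypeScript; the interface and field names are illustrative assumptions, and only the approximate input/output sizes come from the descriptions above.

// Illustrative scenario table; only the sizes come from the descriptions above.
interface TestScenario {
  name: string;
  inputChars: number;   // approximate prompt length
  outputChars: number;  // target completion length
}

const TEST_SCENARIOS: TestScenario[] = [
  { name: "short-short", inputChars: 100,  outputChars: 256 },
  { name: "short-long",  inputChars: 100,  outputChars: 6400 },
  { name: "long-short",  inputChars: 3000, outputChars: 256 },
  { name: "long-long",   inputChars: 3000, outputChars: 6400 },
];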

Test Implementation
How we execute and measure latency tests

Client Types

HTTP

Direct HTTP Requests

Raw HTTP POST requests sent directly to provider endpoints, exercising both the /chat/completions (OpenAI-compatible) and /messages (Anthropic) request formats.
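
As a rough sketch of what one direct HTTP test can look like (the exact payloads, headers, and models the harness sends are assumptions here, shown for illustration only):

// Illustrative raw POSTs; API keys come from the environment, models are placeholders.
const openaiResp = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "ping" }],
  }),
});

const anthropicResp = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": process.env.ANTHROPIC_API_KEY ?? "",
    "anthropic-version": "2023-06-01",
  },
  body: JSON.stringify({
    model: "claude-3-5-haiku-latest",
    max_tokens: 256,
    messages: [{ role: "user", content: "ping" }],
  }),
});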

SDK

Official SDKs

Uses official OpenAI and Anthropic SDKs to test through their native client libraries.
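
The SDK path exercises the same request through the official client libraries. A minimal sketch, assuming current versions of the openai and @anthropic-ai/sdk packages and placeholder model names:

import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

// Both clients read their API keys from the environment by default.
const openai = new OpenAI();
const anthropic = new Anthropic();

// One non-streaming request through each official SDK.
const chat = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "ping" }],
});

const msg = await anthropic.messages.create({
  model: "claude-3-5-haiku-latest",
  max_tokens: 256,
  messages: [{ role: "user", content: "ping" }],
});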

Streaming Modes

Stream

Streaming Response

Measures time to receive the complete streamed response. Simulates real-time applications where tokens are processed as they arrive.

Non-stream

Batch Response

Measures time to receive the complete response in a single payload. Simulates batch processing and other cases where streaming isn't needed.
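
In streaming mode the timer stops only once the stream has been fully drained, not at the first token. A sketch of the difference, using the OpenAI SDK with a placeholder model name:

import OpenAI from "openai";

const openai = new OpenAI();

// Streaming: request a stream and read it to completion before stopping the timer.
const stream = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "ping" }],
  stream: true,
});
for await (const chunk of stream) {
  // Tokens arrive incrementally; a real-time app would render chunk.choices[0]?.delta here.
}

// Non-streaming: a single awaited call returns the whole completion in one batch.
const batch = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "ping" }],
});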

Latency Measurement

Latency is measured from request initiation to response completion:

// Start timer
const startTime = Date.now();
// Make API call
const response = await makeRequest(...);
// For streaming: consume entire stream
await consumeStream(response);
// Calculate latency
const latency = Date.now() - startTime;

Providers & Routes
Testing across multiple providers and routing configurations

Direct Providers

OpenAI: api.openai.com
Anthropic: api.anthropic.com
OpenRouter: openrouter.ai

Martian Proxies

Martian Dev: dev.withmartian.com
Martian Prod: api.withmartian.com
Enterprise Routes: Dev + Prod
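
On the routing side, the matrix can be pictured as a simple map from route name to base URL. The structure below is an assumption; the hostnames are the ones listed above (enterprise route URLs omitted):

// Illustrative route table; only the hostnames come from the list above.
const ROUTES: Record<string, string> = {
  openai: "https://api.openai.com",
  anthropic: "https://api.anthropic.com",
  openrouter: "https://openrouter.ai",
  martianDev: "https://dev.withmartian.com",
  martianProd: "https://api.withmartian.com",
};
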
Test Execution & Monitoring

Automated Testing

Tests run automatically every 10 minutes via scheduled cron jobs. Each cycle executes all configured model/provider/client/stream/test-type combinations in parallel, which keeps execution time short and captures a near-simultaneous snapshot of performance across every configuration.
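
A sketch of that fan-out step, assuming a hypothetical runSingleTest helper that performs one latency measurement and illustrative values for each dimension:

// Illustrative dimensions; real values are configured elsewhere.
const models = ["gpt-4o-mini", "claude-3-5-haiku-latest"];
const routes = ["openai", "anthropic", "openrouter", "martianDev", "martianProd"];
const clients = ["http", "sdk"];
const streamModes = [true, false];
const testTypes = ["short-short", "short-long", "long-short", "long-long"];

// runSingleTest is a hypothetical stand-in for one latency measurement.
declare function runSingleTest(combo: object): Promise<void>;

// Build every combination and run them concurrently; allSettled keeps one
// failing provider from aborting the rest of the cycle.
const combinations = models.flatMap((model) =>
  routes.flatMap((route) =>
    clients.flatMap((client) =>
      streamModes.flatMap((stream) =>
        testTypes.map((testType) => ({ model, route, client, stream, testType }))
      )
    )
  )
);
const results = await Promise.allSettled(
  combinations.map((combo) => runSingleTest(combo))
);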

Error Handling

Tests capture and categorize different failure modes:

Down: Service errors, timeouts, or invalid responses
Info: Expected limitations (e.g., streaming requirements)
Up: Successful completion with valid response
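
A sketch of how a single result could be bucketed into those states; the status names match the list above, but the checks themselves are assumptions:

type Status = "up" | "down" | "info";

// Illustrative classification; the real harness may use different checks.
function classifyResult(err: unknown, responseValid: boolean, expectedLimitation: boolean): Status {
  if (expectedLimitation) return "info"; // e.g. an endpoint that only supports streaming
  if (err !== undefined || !responseValid) return "down"; // errors, timeouts, invalid responses
  return "up"; // completed with a valid response
}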

Data Storage

Results are stored in a PostgreSQL database with automatic cleanup of data older than 30 days. This ensures we maintain relevant performance history while managing storage efficiently.
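
A sketch of the 30-day retention rule using node-postgres; the table and column names are assumptions, and only the 30-day window comes from the paragraph above:

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the standard PG* environment variables

// Hypothetical table/column names; deletes results older than the 30-day retention window.
await pool.query(
  "DELETE FROM latency_results WHERE created_at < NOW() - INTERVAL '30 days'"
);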