MULTI-MODEL INFERENCE API

Same responses. Your choice of model.

Unified API across Claude, GPT-4, Llama, and Mistral. Switch models in one line of code. Compress with theta symbols for 70% cost reduction.

Model Selector (4 models)

1M = 1 million input tokens

Claude Opus: Highest quality
  • Monthly Cost: $300
  • Latency (p95): 280ms
  • Quality Rating: ⭐⭐⭐⭐⭐
4 Models Available
1 API: Switch in One Line
60x Compression Ratio
Claude + GPT-4 + Llama + Mistral in one API
Switch models with one parameter
Theta compression reduces tokens by 60x
99.5% uptime SLA with multi-region redundancy

Every AI provider is different.
Your code shouldn't depend on one.

Build once. Switch models freely. Claude got too expensive? Use Llama. Need raw speed? Mistral. Theta compression makes any model 60x more efficient. Read Docs
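One way to act on "Claude got too expensive? Use Llama" is a simple fallback wrapper. This is a minimal sketch, not documented API: `StubClient` stands in for `emergent.Inference` so the example is self-contained, the `.chat(prompt, model=...)` signature follows the Getting Started snippets below, and the error-handling shape is an assumption.

```python
# StubClient stands in for emergent.Inference; it simulates the primary
# model failing so the fallback path is exercised.
class StubClient:
    def chat(self, prompt, model="claude-opus"):
        if model == "claude-opus":
            raise RuntimeError("budget exceeded")  # simulated failure
        return f"[{model}] {prompt}"

def chat_with_fallback(client, prompt, primary="claude-opus", fallback="llama-70b"):
    """Return (model_used, response), preferring the primary model."""
    try:
        return primary, client.chat(prompt, model=primary)
    except Exception:
        return fallback, client.chat(prompt, model=fallback)

model, reply = chat_with_fallback(StubClient(), "Hello")
# the stub's primary call fails, so the fallback model answers
```

Because switching is just a parameter, the fallback is one `except` branch rather than a second SDK integration.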

Direct Provider APIs
Vendor lock-in
  • Different API formats per provider
  • Switching = rewrite all code
  • Price increases? Stuck paying it
  • Can't A/B test different models
  • No cost transparency across models
Emergent Language
Provider flexibility
  • Unified API across all models
  • Switch with one parameter change
  • Compare costs side-by-side
  • A/B test models in production
  • Built-in compression = 60x efficiency
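The "A/B test models in production" bullet can be sketched as a traffic splitter over the single `model` parameter. This is an illustrative sketch, assuming the `.chat(prompt, model=...)` call shown in the docs below; `StubClient` is a stand-in for `emergent.Inference` so the example runs on its own.

```python
import random

# Stand-in for emergent.Inference with the documented .chat signature.
class StubClient:
    def chat(self, prompt, model="claude-opus"):
        return f"[{model}] {prompt}"

def ab_chat(client, prompt, model_a="claude-opus", model_b="llama-70b", split=0.1):
    """Route `split` fraction of traffic to model_b; return (model_used, response)."""
    model = model_b if random.random() < split else model_a
    return model, client.chat(prompt, model=model)

# Force the B arm for a deterministic demo:
model, reply = ab_chat(StubClient(), "Hello", split=1.0)
```

In production you would log `model` alongside quality and latency metrics to compare the two arms side by side.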

Try all 4 models risk-free

Free tier: 30 minutes per 12-hour window. Run Claude, GPT-4, Llama, and Mistral back-to-back. No credit card. No commitment.

Get API Key

One API. Four models. Instant switching.

Write once, choose your model. Unified interface across Claude, GPT-4, Llama, and Mistral. Switch providers in one line. Getting Started

1

Initialize Client

Pick your default model. Switch anytime.

from emergent import Inference

# Set a default model; override it per call.
client = Inference(
  api_key="...",
  model="claude-opus"
)
2

Call Any Model

Same prompt, different models. Compare quality and speed.

# Use Claude by default
response = client.chat("Hello")

# Switch to Llama for this call
response = client.chat("Hello",
  model="llama-70b")
3

Automatic Compression

Theta symbols compress tokens by 60x. Works on every model.

client = Inference(
  api_key="...",
  compression="theta"
)

# 60x fewer tokens on the wire
response = client.chat(...)

Blazing Fast. Proven at Scale.

Batch processing unlocks massive throughput. 10,646 messages/sec across 3 regions. 77% bandwidth savings on every request.

3,549 msg/sec
Single Region (50-msg batches)
✓ 100% success rate
✓ <50ms p95 latency
✓ Rust-optimized
10,646 msg/sec
3 Regions (3× 3,549)
✓ Zero-copy batching
✓ Geo-distributed
✓ Scales linearly
77% Bandwidth Saved
Every Request
✓ Theta compression
✓ No codec overhead
✓ Persistent
Pro Tip: Use 50-100 message batches for the optimal latency/throughput tradeoff. See the batching guide for examples.
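The batch sizing above can be sketched as a client-side chunker. Only the 50-100 message guidance comes from the tip; `chat_batch` is a hypothetical batch call, not a documented endpoint.

```python
# Split a message list into fixed-size chunks at the recommended batch size.
def make_batches(messages, batch_size=50):
    """Return consecutive chunks of `batch_size`; the last chunk may be shorter."""
    return [messages[i:i + batch_size] for i in range(0, len(messages), batch_size)]

batches = make_batches([f"msg {i}" for i in range(120)])
# → 3 batches of 50, 50, and 20 messages

# for batch in batches:
#     client.chat_batch(batch)  # hypothetical batch endpoint
```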

Test it yourself. Compete globally.

Join thousands of developers stress testing our infrastructure. Real-time leaderboard. Zero credentials required. See who's pushing the limits.


Ready to Test?

Fire unlimited requests. No auth, no rate limit. Pure performance. Join the competition and see your name on the leaderboard.


Token-based pricing. Your choice of model.

Pass-through pricing from each provider. No markup. Pay what the model costs. Theta compression makes everything cheaper.

Free Tier
$0

30 minutes per 12-hour window. Test all models risk-free.

  • All 4 models included
  • 30 min per 12h window
  • No credit card
  • Full API access
Get Free Key
Enterprise
Custom

Volume discounts, dedicated support, custom routing logic.

  • Volume pricing negotiation
  • Multi-model A/B testing
  • Priority support
  • Custom integrations
Contact Sales