AI Inference API — 140+ Models | VoltageGPU

Popular Models and Pricing

  • DeepSeek-R1 — $0.46/M input tokens, $1.85/M output tokens, reasoning model
  • Qwen3-32B — $0.15/M input tokens, $0.44/M output tokens, multilingual LLM
  • Llama 3.3 70B — $0.35/M input tokens, $0.40/M output tokens, Meta open-source LLM
  • GLM-4-9B — free, lightweight chat model
  • FLUX Schnell — $0.003/image, fast image generation
  • Mistral, Gemma, and 130+ more models available
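The per-million-token prices above translate into request costs as follows. A minimal sketch (prices are copied from the list above and may change; check the live pricing page):

```python
# Per-million-token prices in USD, copied from the pricing list above.
# These may drift over time; treat them as illustrative.
PRICES = {
    "deepseek-ai/DeepSeek-R1": {"input": 0.46, "output": 1.85},
    "Qwen/Qwen3-32B": {"input": 0.15, "output": 0.44},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 2,000-token prompt with a 500-token answer on DeepSeek-R1:
cost = estimate_cost("deepseek-ai/DeepSeek-R1", 2000, 500)
# about $0.0018 per request
```

Per-token billing means cost scales linearly with usage, so a quick calculation like this is enough to budget a workload.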

Features

  • OpenAI-compatible API — drop-in replacement, change one line of code
  • Streaming responses — real-time token streaming for chat applications
  • Function calling — tool use and structured output support
  • Embeddings — text embedding models for RAG and search
  • Image generation — FLUX, SDXL, and other diffusion models
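The streaming feature above works through the OpenAI SDK's standard `stream=True` flag. A minimal sketch (the model name and key in the usage comment are placeholders):

```python
def stream_chat(client, model: str, prompt: str):
    """Yield text chunks as the model produces them.

    `client` is an openai.OpenAI instance pointed at the VoltageGPU base_url.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # the server sends tokens as they are generated
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry only role/metadata, no text
            yield delta

# Usage (placeholder key):
# client = OpenAI(base_url="https://api.voltagegpu.com/v1", api_key="vgpu_sk_xxxxxxxx")
# for token in stream_chat(client, "deepseek-ai/DeepSeek-R1", "Hello"):
#     print(token, end="", flush=True)
```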

Code Example

curl https://api.voltagegpu.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-R1", "messages": [{"role": "user", "content": "Hello"}]}'
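The function-calling support listed under Features uses the standard OpenAI `tools` schema. A minimal sketch (the `get_weather` tool is a made-up example, not part of the VoltageGPU docs):

```python
import json

# OpenAI-style tool definition; get_weather is an illustrative example.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def extract_tool_call(message):
    """Return (name, args) from the first tool call on a chat message, or None."""
    calls = getattr(message, "tool_calls", None)
    if not calls:
        return None
    call = calls[0]
    return call.function.name, json.loads(call.function.arguments)

# Usage with the OpenAI SDK (placeholder key):
# response = client.chat.completions.create(
#     model="Qwen/Qwen3-32B",
#     messages=[{"role": "user", "content": "Weather in Paris?"}],
#     tools=TOOLS,
# )
# print(extract_tool_call(response.choices[0].message))
```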

Cost Comparison

VoltageGPU inference costs 2-10x less than OpenAI while using the same API format: DeepSeek R1 is $0.46/M input tokens versus $10/M for OpenAI GPT-4. Same OpenAI SDK, same endpoints, a fraction of the cost.

AI Inference API

Run AI Models
Serverless & OpenAI-Compatible

144+ models including DeepSeek R1, Qwen3, Llama 3, FLUX. Drop-in OpenAI replacement. Pay per token, no GPU management.

144+ models
From $0.02/M tokens
OpenAI compatible
$5 free credit

Available Models

Prices update in real-time. Click any model to start using it.

OpenAI-Compatible API

Switch from OpenAI in one line. Change the base_url, keep your code.

Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.voltagegpu.com/v1",
    api_key="vgpu_sk_xxxxxxxx",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
print(response.choices[0].message.content)
cURL
curl https://api.voltagegpu.com/v1/chat/completions \
  -H "Authorization: Bearer vgpu_sk_xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-32B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 1024
  }'

Why VoltageGPU Inference

Up to 95% Cheaper
DeepSeek R1 at $0.46/M vs OpenAI GPT-4 at $10/M. Same quality, fraction of the cost.
Zero Infrastructure
Fully serverless. No GPU management, no cold starts, auto-scaling to any demand.
OpenAI Drop-in
Change one line of code. All /v1/ endpoints supported: chat, completions, embeddings, images.
144+ Models
LLMs, image generators, video, embeddings. DeepSeek, Qwen, Llama, Mistral, FLUX and more.
Pay Per Token
No subscriptions, no minimums. Pay exactly for what you use, billed per token or per image.
Enterprise Ready
99.9% uptime SLA. SOC 2 compliance. Rate limiting, usage analytics, team management.
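The embeddings endpoint mentioned above returns vectors you can compare with cosine similarity for RAG and search. A minimal sketch (the embedding model name in the usage comment is a placeholder; pick a real one from the model list):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Fetching embeddings via the OpenAI SDK (model name is a placeholder):
# vecs = client.embeddings.create(
#     model="some-embedding-model",
#     input=["query text", "document text"],
# )
# query, doc = vecs.data[0].embedding, vecs.data[1].embedding
# print(cosine_similarity(query, doc))  # closer to 1.0 = more similar
```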

Start in 60 Seconds

From sign-up to first API call in under a minute.

1
Create Account
Sign up free and get $5 credit instantly. No credit card required.
2
Get API Key
Generate your API key from the dashboard. One key for all models.
3
Call the API
Use the OpenAI SDK or any HTTP client. Point to our endpoint and go.

FAQ

How much does it cost?

From $0.02/M tokens for lightweight models. DeepSeek R1 at $0.46/M input. Images from $0.003 each. Pay per use, no minimums.

Is it compatible with OpenAI?

Yes. Change base_url to https://api.voltagegpu.com/v1 and use your VoltageGPU API key. All /v1/ endpoints are supported.

Do I need to manage GPUs?

No. Fully serverless — we handle all infrastructure, scaling, and GPU allocation. You just call the API.

What models are available?

144+ models: DeepSeek R1, Qwen3 (32B/235B), Llama 3, Mistral, FLUX Schnell, Gemma, and many specialized models.

How fast is the response?

Sub-second first token for most models. No cold starts. Models are always warm and ready to serve.
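You can verify first-token latency yourself from a streaming response. A sketch that works with any iterator of text pieces, such as the deltas from an OpenAI-SDK stream (the model name in the usage comment is a placeholder):

```python
import time

def time_to_first_token(chunks):
    """Consume a streaming iterator; return (ttft_seconds, full_text).

    `chunks` is any iterator of text pieces, e.g. the deltas of an
    OpenAI-SDK response created with stream=True.
    """
    start = time.monotonic()
    ttft = None
    parts = []
    for piece in chunks:
        if ttft is None:
            ttft = time.monotonic() - start  # latency until the first piece
        parts.append(piece)
    return ttft, "".join(parts)

# Usage against the API (placeholder model and key):
# stream = client.chat.completions.create(
#     model="Qwen/Qwen3-32B",
#     messages=[{"role": "user", "content": "Hi"}],
#     stream=True,
# )
# deltas = (c.choices[0].delta.content or "" for c in stream)
# ttft, text = time_to_first_token(deltas)
```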

Can I use it for production?

Yes. 99.9% uptime SLA, rate limiting, usage analytics. Used by startups and enterprises in production.

Start Using AI Inference

$5 free credit. No credit card required. 144+ models ready.

OpenAI compatible · Per-token billing · Bitcoin accepted · 99.9% uptime