Use the official OpenAI Python and Node/TypeScript SDKs unchanged. Set base_url to https://app.voltagegpu.com/v1, pass a VoltageGPU API key (prefix vg-), and every client.chat.completions.create(), streaming iterator, embeddings call and tool-calling flow runs against confidential -TEE models hosted by VOLTAGE EI in France. No fork, no patched SDK, no proprietary client.
Sending production prompts to api.openai.com is fine for many workloads, but European teams handling privileged client documents, personal data under RGPD, financial transactions under DORA, or work product covered by lawyer-client privilege, typically cannot accept the FISA 702 / CLOUD Act exposure of US-controlled inference. VoltageGPU is operated by VOLTAGE EI (SIREN 943 808 824, Solaize, France), seals each completion in Intel TDX hardware enclaves, signs an attestation report per request, never retains prompts, and ships a native Article 28 DPA.
Step one: pip install openai — the standard package, no fork. Step two: instantiate the client with base_url="https://app.voltagegpu.com/v1" and api_key from VoltageGPU. Step three: change the model parameter to a -TEE name such as Qwen3-32B-TEE, Qwen3-235B-A22B-Instruct-2507-TEE, DeepSeek-V3-TEE, DeepSeek-R1-0528-TEE or Llama-3.3-70B-Instruct-TEE. Streaming, tool calling and JSON mode keep working.
Plans: Plus at $20/month for individual developers; Starter at $349/month for small teams; Pro at $1,199/month for production agent fleets; Enterprise from $5,000/month with SSO, SCIM, dedicated capacity and a custom DPA. Per-token rates match the standard inference catalog — there is no surcharge for routing through the confidential endpoint.
Same from openai import OpenAI. Same client.chat.completions.create(). One line changes — base_url now points at https://app.voltagegpu.com/v1, and every token is sealed inside Intel TDX hardware enclaves we operate in the EU.
No fork. No proprietary client. No code rewrite. Migrate a production agent this afternoon and keep your existing tests, retries and middleware.
Calling api.openai.com is fine for many workloads. It stops being fine the moment a prompt carries privileged client documents, regulated personal data, or work product covered by an NDA. Three forces typically push engineering teams to a sovereign endpoint.
Schrems II + FISA 702 exposure
EU controllers cannot cleanly justify routing privileged or personal data through US-controlled inference. VoltageGPU is operated by VOLTAGE EI in France — RGPD Art. 28, no transfer mechanism needed.
Training-data leakage risk
Even with retention disabled, prompts traverse provider memory in cleartext. Intel TDX seals decryption inside the enclave — the hypervisor cannot read your prompts, and CPU-fused keys protect RAM.
AI Act Article 12 audit logging
Per-request attestation reports are ECDSA-signed by the TDX module. Pair them with completions for an audit trail that maps cleanly to Article 12 logging obligations.
Start from a working OpenAI integration. Three changes — no patched SDK, no custom client.
Install the official OpenAI SDK
pip install openai (>=1.40.0) for Python or npm install openai (^4.60.0) for Node. No VoltageGPU fork — the exact same packages OpenAI publishes on PyPI and npm.
Set base_url and api_key
base_url = "https://app.voltagegpu.com/v1" and api_key = "vg-..." from app.voltagegpu.com/settings/api-keys. The SDK constructor accepts both, no monkeypatching.
Pick a -TEE model
Replace your model parameter (e.g. "gpt-4o") with a confidential one: Qwen3-32B-TEE, Qwen3-235B-A22B-Instruct-2507-TEE, DeepSeek-V3-TEE, DeepSeek-R1-0528-TEE, or Llama-3.3-70B-Instruct-TEE.
# Standard OpenAI Python SDK — no fork, no patch
pip install "openai>=1.40.0"

# Standard OpenAI Node / TypeScript SDK
npm install openai@^4.60.0
# or: pnpm add openai@^4.60.0
# or: yarn add openai@^4.60.0

# .env — the same names the OpenAI SDK auto-detects
OPENAI_BASE_URL=https://app.voltagegpu.com/v1
OPENAI_API_KEY=vg-...
# If you are migrating an app that already uses OPENAI_API_KEY for openai.com,
# keep both keys distinct; many SDK helpers (LangChain, LlamaIndex) read these.

The five examples below cover what most production workloads actually use: chat, streaming, tool calling, structured outputs, and the async client.
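Because these are the names the SDK auto-detects, a zero-argument client also works. A minimal sketch, assuming the .env above is loaded into the process environment:

# Zero-argument client — openai-python resolves OPENAI_BASE_URL and OPENAI_API_KEY
from openai import OpenAI

client = OpenAI()  # base_url and api_key read from the environment, no constructor args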
# Drop-in: same OpenAI Python client, sovereign endpoint
from openai import OpenAI
client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",  # https://app.voltagegpu.com/settings/api-keys
)

response = client.chat.completions.create(
    model="Qwen3-32B-TEE",
    messages=[
        {"role": "system", "content": "You are a sovereign AI assistant."},
        {"role": "user", "content": "Summarize this MSA section..."},
    ],
    temperature=0.2,
    max_tokens=1024,
)

print(response.choices[0].message.content)

# Streaming — identical iterator, sealed in TDX
from openai import OpenAI
client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

stream = client.chat.completions.create(
    model="DeepSeek-V3-TEE",
    messages=[{"role": "user", "content": "Draft an Article 28 DPA clause..."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

# Tool / function calling — any -TEE model with tool support
from openai import OpenAI
import json
client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_contract",
            "description": "Fetch a contract by id from the in-house DMS.",
            "parameters": {
                "type": "object",
                "properties": {
                    "contract_id": {"type": "string"},
                },
                "required": ["contract_id"],
            },
        },
    },
]

resp = client.chat.completions.create(
    model="Qwen3-235B-A22B-Instruct-2507-TEE",
    messages=[{"role": "user", "content": "Pull contract MSA-2026-0142 and find auto-renewal."}],
    tools=tools,
    tool_choice="auto",
)

tool_call = resp.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
print("Model wants to call:", tool_call.function.name, "with", args)

# Structured outputs — JSON mode + response_format
from openai import OpenAI
from pydantic import BaseModel
client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

class Finding(BaseModel):
    clause: str
    severity: str  # "low" | "medium" | "high"
    rationale: str

class Review(BaseModel):
    findings: list[Finding]

resp = client.chat.completions.create(
    model="DeepSeek-R1-0528-TEE",
    messages=[
        {"role": "system", "content": "Return strict JSON conforming to the schema."},
        {"role": "user", "content": "Review this NDA for Article 28 RGPD gaps..."},
    ],
    response_format={"type": "json_object"},
)

review = Review.model_validate_json(resp.choices[0].message.content)
for f in review.findings:
    print(f.severity, "-", f.clause)

# Async client — same import path, full asyncio support
import asyncio
from openai import AsyncOpenAI
client = AsyncOpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

async def summarize(text: str) -> str:
    resp = await client.chat.completions.create(
        model="Qwen3-32B-TEE",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return resp.choices[0].message.content

async def main():
    docs = ["...doc1...", "...doc2...", "...doc3..."]
    summaries = await asyncio.gather(*(summarize(d) for d in docs))
    for s in summaries:
        print(s)

asyncio.run(main())

The official openai package on npm works in Node, Edge runtimes (Vercel, Cloudflare Workers) and Bun.
// Drop-in: same OpenAI Node SDK, sovereign endpoint
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://app.voltagegpu.com/v1",
  apiKey: process.env.VOLTAGEGPU_API_KEY!, // vg-...
});

const response = await client.chat.completions.create({
  model: "Qwen3-32B-TEE",
  messages: [
    { role: "system", content: "You are a sovereign AI assistant." },
    { role: "user", content: "Summarize this MSA section..." },
  ],
  temperature: 0.2,
});

console.log(response.choices[0].message.content);

// Streaming with the official Node SDK iterator
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://app.voltagegpu.com/v1",
  apiKey: process.env.VOLTAGEGPU_API_KEY!,
});

const stream = await client.chat.completions.create({
  model: "DeepSeek-V3-TEE",
  messages: [{ role: "user", content: "Draft an Article 28 clause..." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

// Tool / function calling with the Node SDK
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://app.voltagegpu.com/v1",
  apiKey: process.env.VOLTAGEGPU_API_KEY!,
});

const resp = await client.chat.completions.create({
  model: "Qwen3-235B-A22B-Instruct-2507-TEE",
  messages: [
    { role: "user", content: "Pull contract MSA-2026-0142 and find auto-renewal." },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "lookup_contract",
        description: "Fetch a contract by id from the in-house DMS.",
        parameters: {
          type: "object",
          properties: { contract_id: { type: "string" } },
          required: ["contract_id"],
        },
      },
    },
  ],
  tool_choice: "auto",
});

const call = resp.choices[0].message.tool_calls?.[0];
console.log("Tool call:", call?.function.name, call?.function.arguments);

Everything that lives under /v1/chat/completions and /v1/embeddings is supported on -TEE models. Higher-level surfaces (Assistants, Batch, fine-tuning) are on the roadmap or live elsewhere.
Chat completions
Every -TEE model. Identical request/response shape.
Streaming (SSE)
Standard for-await iterator on Python and Node SDKs.
Tool / function calling
Qwen3-32B-TEE, Qwen3-235B-TEE, DeepSeek-V3/R1-TEE.
Structured outputs (JSON mode)
response_format={"type": "json_object"} works on every -TEE model.
Vision (image input)
Multimodal -TEE models only (Qwen3-VL-TEE, roadmap).
Embeddings
text-embedding-3-large-TEE, dimension 1536 (a minimal call is sketched after this list).
Fine-tuning
Confidential fine-tuning lives at /fine-tuning-tdx.
Batch API
Async batch endpoint — Q3 2026.
Assistants API
Threads + runs are not yet on the confidential endpoint.
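Since no embeddings example appears above, here is a minimal sketch against the confidential endpoint, using the embeddings model from the catalog and assuming the endpoint mirrors the standard client.embeddings.create() shape as stated:

# Sovereign embeddings — same client.embeddings.create() shape as openai.com
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

emb = client.embeddings.create(
    model="text-embedding-3-large-TEE",
    input=["Clause 7.2: auto-renewal terms...", "Clause 9.1: liability cap..."],
)

vectors = [d.embedding for d in emb.data]
print(len(vectors), "vectors, dimension", len(vectors[0]))  # 1536 per the catalog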
Only models with the -TEE suffix are exposed on the confidential endpoint — every other model name is rejected with a clear 400. The model name is what you pass to the model parameter on the OpenAI client; a sketch of catching the rejection follows the model list below.
Qwen3-32B-TEE
Default worker model. Multilingual, fast, low-cost agent loops.
Qwen3-235B-A22B-Instruct-2507-TEE
Long-context drafting, contract review, RAG over large corpora.
DeepSeek-V3-TEE
Strong general-purpose, code-heavy and English-leaning workloads.
DeepSeek-R1-0528-TEE
Reasoning chains, IC memos, deep analysis with chain-of-thought.
Llama-3.3-70B-Instruct-TEE
English-heavy production tasks, drop-in for Llama-based pipelines.
text-embedding-3-large-TEE
1536-dim sovereign embeddings for RAG. Same shape as OpenAI embed-3.
Volume contracts above 100M tokens / month — talk to sales.
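As noted above the list, model names without the -TEE suffix are rejected with a 400. A hedged sketch of catching that with the official SDK's exception types (openai-python raises BadRequestError on HTTP 400; the endpoint's exact error body is not documented here):

# Names without the -TEE suffix are rejected — catch the 400 explicitly
import openai
from openai import OpenAI

client = OpenAI(base_url="https://app.voltagegpu.com/v1", api_key="vg-...")

try:
    client.chat.completions.create(
        model="gpt-4o",  # not exposed on the confidential endpoint
        messages=[{"role": "user", "content": "ping"}],
    )
except openai.BadRequestError as exc:
    print("Rejected with", exc.status_code, "- use a -TEE model name")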
TTFT
Typical time-to-first-token: 350-650 ms on Qwen3-32B-TEE / Llama-3.3-70B-TEE from EU callers. Long-context flagships add 150-250 ms.
EU egress by default
Inference clusters in France and Germany. US-region pinning is available on Enterprise. No data leaves the trust domain in cleartext.
Edge runtime support
The OpenAI Node SDK works on Vercel Edge, Cloudflare Workers, Bun and Deno. Streaming uses standard SSE — no special transport.
API keys are project-scoped, prefixed vg-
Issue and rotate at app.voltagegpu.com/settings/api-keys. Keys can be limited to a model class, a max RPM, or a token budget. Revocation is immediate.
Default limits
Plus: 600 RPM / 1.2M TPM. Starter: 1,200 RPM / 3M TPM. Pro: 2,400 RPM / 6M TPM. Enterprise: bespoke. 429 responses include Retry-After — the OpenAI SDK retry helpers handle it natively.
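Because the SDK's retry helpers honor Retry-After natively, tuning is just a constructor argument. A minimal sketch using openai-python's standard max_retries and timeout options:

# Built-in retries — the client backs off and honors Retry-After on 429s
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    max_retries=5,   # SDK default is 2; raise for bursty agent fleets
    timeout=60.0,    # seconds, applied per attempt
)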
TLS terminates inside the trust domain
TLS unwrap happens inside the TDX enclave, not on a generic proxy. The hypervisor and infrastructure operators never see prompts in cleartext.
Per-request attestation (opt-in)
Set the X-VGPU-Attest: required header. Each completion is paired with an ECDSA-signed report identifying the TDX module, base model digest and runtime version.
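One way to attach that header with the unmodified SDK, via the standard default_headers (client-wide) or extra_headers (per-request) options; how the signed report is returned is described above, not shown here:

# Opt-in attestation — send the X-VGPU-Attest header on every request
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    default_headers={"X-VGPU-Attest": "required"},
)

resp = client.chat.completions.create(
    model="Qwen3-32B-TEE",
    messages=[{"role": "user", "content": "..."}],
    # or per request only: extra_headers={"X-VGPU-Attest": "required"}
)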
Plans bundle included tokens, RPM/TPM, and platform features. Per-token rates match the standard inference catalog above — there is no surcharge for routing through the confidential endpoint.
PLUS
$20 / mo
Individual developer — first production calls.
STARTER
$349 / mo
Small engineering team shipping their first agent.
PRO
$1,199 / mo
Production agent fleets up to 10 seats.
ENTERPRISE
from $5,000 / mo
Regulated industries, dedicated capacity, custom DPA.
Prompts decrypt only inside Intel TDX
TLS unwrap happens inside the trust domain. Operators, the hypervisor, and adjacent tenants cannot read prompts at any point.
AES-256 memory encryption
CPU-fused keys protect runtime RAM. A live memory dump from the host yields ciphertext, not your contracts or PHI.
ECDSA-signed attestation per request
Each completion can be paired with an attestation report identifying the TDX module, base model digest, runtime version. Verifiable against Intel root keys.
Zero retention, zero training reuse
Prompts and completions are not persisted. No prompt is reused for training. Native RGPD Article 28 DPA, EU jurisdiction (VOLTAGE EI, France).
What changes in my existing OpenAI code?
You change three lines: the base_url, the api_key, and the model name (any -TEE model). client.chat.completions.create(), client.embeddings.create(), the streaming iterator, the async client, tool calling, response_format — all keep working because the wire format is OpenAI-compatible.
Does the official, unmodified SDK really work?
Yes. We test against pip install openai (>=1.40.0) and npm install openai (^4.60.0) — the official packages from openai/openai-python and openai/openai-node. There is no VoltageGPU fork. If a future SDK version breaks compatibility, we treat it as a P1 bug on our side.
Why is the base URL app.voltagegpu.com?
app.voltagegpu.com hosts the developer console, settings, and the OpenAI-compatible inference path under /v1. Both hostnames terminate inside the same TDX trust domain — this is purely a routing convenience for self-serve developers.
Can I keep using openai.com alongside VoltageGPU?
Instantiate two clients: one with base_url defaulted (api.openai.com) for non-sensitive workloads, and one with base_url=https://app.voltagegpu.com/v1 for confidential workloads. Pick the client at the call site, or wrap them behind a small router keyed on data classification.
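A minimal sketch of that router. The classification labels and helper are illustrative, not part of any VoltageGPU API, and the default client assumes OPENAI_API_KEY is set:

# Route by data classification — hypothetical labels, two standard clients
from openai import OpenAI

public_client = OpenAI()  # defaults to api.openai.com
sovereign_client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

def route(classification: str):
    # Returns (client, model) — the model name must match the endpoint.
    if classification in {"confidential", "restricted"}:
        return sovereign_client, "Qwen3-32B-TEE"
    return public_client, "gpt-4o"

client, model = route("confidential")
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Privileged memo..."}],
)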
Do you store my prompts?
No. Zero retention by default. Prompts decrypt only inside the TDX enclave, are processed for your completion, and are not persisted to disk. Aggregate metering counters (token counts, model name) are kept for billing but are not the prompt content. Pro and Enterprise can opt into encrypted attestation-bundled audit logs for AI Act Article 12.
What latency should callers outside the EU expect?
EU egress is the default region. From US callers, expect ~120-180 ms of cross-Atlantic transit on top of TTFT. Enterprise contracts can pin a US TDX region — talk to sales.
Does it work with LangChain, CrewAI, MCP, or the Vercel AI SDK?
Yes — every framework that lets you set base_url and api_key on its OpenAI provider works. See /langchain-tee-deployment, /crewai-private-deployment, and /mcp-server-confidential for full setups. Vercel AI SDK works via createOpenAI({ baseURL, apiKey }).
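For example, LangChain's OpenAI chat wrapper takes the same two parameters. A hedged sketch; see /langchain-tee-deployment for the full setup:

# LangChain — point the OpenAI provider at the confidential endpoint
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="Qwen3-32B-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

print(llm.invoke("Summarize this MSA section...").content)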
Can I verify that a completion really ran inside a TDX enclave?
Yes. Per-request attestation can be enabled per project. Each completion is paired with an ECDSA-signed report identifying the TDX module, the trust-domain measurement, and the base model digest. Reports are verifiable against Intel root keys.
EXPLORE FURTHER
Swap base_url, ship before lunch
Generate a vg- API key, change one line, keep your existing OpenAI SDK code.