Use the official OpenAI Python and Node/TypeScript SDKs unchanged. Set base_url to https://app.voltagegpu.com/v1, pass a VoltageGPU API key (prefix vg-), and every client.chat.completions.create(), streaming iterator, embeddings call and tool-calling flow runs against confidential -TEE models hosted by VOLTAGE EI in France. No fork, no patched SDK, no proprietary client.
Sending production prompts to api.openai.com is fine for many workloads, but European teams handling privileged client documents, personal data under RGPD, financial transactions under DORA, or work product covered by lawyer-client privilege, typically cannot accept the FISA 702 / CLOUD Act exposure of US-controlled inference. VoltageGPU is operated by VOLTAGE EI (SIREN 943 808 824, Solaize, France), seals each completion in Intel TDX hardware enclaves, signs an attestation report per request, never retains prompts, and ships a native Article 28 DPA.
Step one: pip install openai — the standard package, no fork. Step two: instantiate the client with base_url="https://app.voltagegpu.com/v1" and api_key from VoltageGPU. Step three: change the model parameter to a -TEE name such as Qwen3-32B-TEE, Qwen3-235B-A22B-Instruct-2507-TEE, DeepSeek-V3-TEE, DeepSeek-R1-0528-TEE or Llama-3.3-70B-Instruct-TEE. Streaming, tool calling and JSON mode keep working.
Plans: Plus at $20/month for individual developers; Starter at $349/month for small teams; Pro at $1,199/month for production agent fleets; Enterprise from $5,000/month with SSO, SCIM, dedicated capacity and a custom DPA. Per-token rates match the standard inference catalog — there is no surcharge for routing through the confidential endpoint.
Same from openai import OpenAI. Same client.chat.completions.create(). One line changes — base_url now points at https://app.voltagegpu.com/v1, and every token is sealed inside Intel TDX hardware enclaves we operate in the EU.
No fork. No proprietary client. No code rewrite. Migrate a production agent this afternoon and keep your existing tests, retries and middleware.
Calling api.openai.com is fine for many workloads. It stops being fine the moment a prompt carries privileged client documents, regulated personal data, or work product covered by an NDA. Three forces typically push engineering teams to a sovereign endpoint.
Schrems II + FISA 702 exposure
EU controllers cannot cleanly justify routing privileged or personal data through US-controlled inference. VoltageGPU is operated by VOLTAGE EI in France — RGPD Art. 28, no transfer mechanism needed.
Training-data leakage risk
Even with retention disabled, prompts traverse provider memory in cleartext. Intel TDX seals decryption inside the enclave — the hypervisor cannot read your prompts, and CPU-fused keys protect RAM.
AI Act Article 12 audit logging
Per-request attestation reports are ECDSA-signed by the TDX module. Pair them with completions for an audit trail that maps cleanly to Article 12 logging obligations.
Start from a working OpenAI integration. Three changes — no patched SDK, no custom client.
Install the official OpenAI SDK
pip install openai (>=1.40.0) for Python or npm install openai (^4.60.0) for Node. No VoltageGPU fork — the exact same packages OpenAI publishes on PyPI and npm.
Set base_url and api_key
base_url = "https://app.voltagegpu.com/v1" and api_key = "vg-..." from app.voltagegpu.com/settings/api-keys. The SDK constructor accepts both, no monkeypatching.
Pick a -TEE model
Replace your model parameter (e.g. "gpt-4o") with a confidential one: Qwen3-32B-TEE, Qwen3-235B-A22B-Instruct-2507-TEE, DeepSeek-V3-TEE, DeepSeek-R1-0528-TEE, or Llama-3.3-70B-Instruct-TEE.
# Standard OpenAI Python SDK — no fork, no patch
pip install "openai>=1.40.0"

# Standard OpenAI Node / TypeScript SDK
npm install openai@^4.60.0
# or: pnpm add openai@^4.60.0
# or: yarn add openai@^4.60.0

# .env — the same names the OpenAI SDK auto-detects
OPENAI_BASE_URL=https://app.voltagegpu.com/v1
OPENAI_API_KEY=vg-...
# If you are migrating an app that already uses OPENAI_API_KEY for openai.com,
# keep both keys distinct; many SDK helpers (LangChain, LlamaIndex) read these.

The five examples below cover what most production workloads actually use: chat, streaming, tool calling, structured outputs, and the async client.
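Because these are the names the SDK auto-detects, a zero-argument client also works. A minimal sketch, assuming the .env above is loaded into the process environment:

# Zero-argument client — openai-python resolves OPENAI_BASE_URL and OPENAI_API_KEY
from openai import OpenAI

client = OpenAI()  # base_url and api_key read from the environment, no constructor args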
# Drop-in: same OpenAI Python client, sovereign endpoint
from openai import OpenAI
client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",  # https://app.voltagegpu.com/settings/api-keys
)

response = client.chat.completions.create(
    model="Qwen3-32B-TEE",
    messages=[
        {"role": "system", "content": "You are a sovereign AI assistant."},
        {"role": "user", "content": "Summarize this MSA section..."},
    ],
    temperature=0.2,
    max_tokens=1024,
)

print(response.choices[0].message.content)

# Streaming — identical iterator, sealed in TDX
from openai import OpenAI
client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

stream = client.chat.completions.create(
    model="DeepSeek-V3-TEE",
    messages=[{"role": "user", "content": "Draft an Article 28 DPA clause..."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)

# Tool / function calling — any -TEE model with tool support
from openai import OpenAI
import json
client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_contract",
            "description": "Fetch a contract by id from the in-house DMS.",
            "parameters": {
                "type": "object",
                "properties": {
                    "contract_id": {"type": "string"},
                },
                "required": ["contract_id"],
            },
        },
    },
]

resp = client.chat.completions.create(
    model="Qwen3-235B-A22B-Instruct-2507-TEE",
    messages=[{"role": "user", "content": "Pull contract MSA-2026-0142 and find auto-renewal."}],
    tools=tools,
    tool_choice="auto",
)

tool_call = resp.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
print("Model wants to call:", tool_call.function.name, "with", args)

# Structured outputs — JSON mode + response_format
from openai import OpenAI
from pydantic import BaseModel
client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

class Finding(BaseModel):
    clause: str
    severity: str  # "low" | "medium" | "high"
    rationale: str

class Review(BaseModel):
    findings: list[Finding]

resp = client.chat.completions.create(
    model="DeepSeek-R1-0528-TEE",
    messages=[
        {"role": "system", "content": "Return strict JSON conforming to the schema."},
        {"role": "user", "content": "Review this NDA for Article 28 RGPD gaps..."},
    ],
    response_format={"type": "json_object"},
)

review = Review.model_validate_json(resp.choices[0].message.content)
for f in review.findings:
    print(f.severity, "-", f.clause)

# Async client — same import path, full asyncio support
import asyncio
from openai import AsyncOpenAI
client = AsyncOpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

async def summarize(text: str) -> str:
    resp = await client.chat.completions.create(
        model="Qwen3-32B-TEE",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return resp.choices[0].message.content

async def main():
    docs = ["...doc1...", "...doc2...", "...doc3..."]
    summaries = await asyncio.gather(*(summarize(d) for d in docs))
    for s in summaries:
        print(s)

asyncio.run(main())

The official openai package on npm works in Node, Edge runtimes (Vercel, Cloudflare Workers) and Bun.
// Drop-in: same OpenAI Node SDK, sovereign endpoint
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://app.voltagegpu.com/v1",
  apiKey: process.env.VOLTAGEGPU_API_KEY!, // vg-...
});

const response = await client.chat.completions.create({
  model: "Qwen3-32B-TEE",
  messages: [
    { role: "system", content: "You are a sovereign AI assistant." },
    { role: "user", content: "Summarize this MSA section..." },
  ],
  temperature: 0.2,
});

console.log(response.choices[0].message.content);

// Streaming with the official Node SDK iterator
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://app.voltagegpu.com/v1",
  apiKey: process.env.VOLTAGEGPU_API_KEY!,
});

const stream = await client.chat.completions.create({
  model: "DeepSeek-V3-TEE",
  messages: [{ role: "user", content: "Draft an Article 28 clause..." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

// Tool / function calling with the Node SDK
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://app.voltagegpu.com/v1",
  apiKey: process.env.VOLTAGEGPU_API_KEY!,
});

const resp = await client.chat.completions.create({
  model: "Qwen3-235B-A22B-Instruct-2507-TEE",
  messages: [
    { role: "user", content: "Pull contract MSA-2026-0142 and find auto-renewal." },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "lookup_contract",
        description: "Fetch a contract by id from the in-house DMS.",
        parameters: {
          type: "object",
          properties: { contract_id: { type: "string" } },
          required: ["contract_id"],
        },
      },
    },
  ],
  tool_choice: "auto",
});

const call = resp.choices[0].message.tool_calls?.[0];
console.log("Tool call:", call?.function.name, call?.function.arguments);

Everything that lives under /v1/chat/completions and /v1/embeddings is supported on -TEE models. Higher-level surfaces (Assistants, Batch, fine-tuning) are on the roadmap or live elsewhere.
Chat completions
Every -TEE model. Identical request/response shape.
Streaming (SSE)
Standard for-await iterator on Python and Node SDKs.
Tool / function calling
Qwen3-32B-TEE, Qwen3-235B-TEE, DeepSeek-V3/R1-TEE.
Structured outputs (JSON mode)
response_format={"type": "json_object"} works on every -TEE model.
Vision (image input)
Multimodal -TEE models only (Qwen3-VL-TEE, roadmap).
Embeddings
text-embedding-3-large-TEE, dimension 1536 (a minimal call is sketched after this list).
Fine-tuning
Confidential fine-tuning lives at /fine-tuning-tdx.
Batch API
Async batch endpoint — Q3 2026.
Assistants API
Threads + runs are not yet on the confidential endpoint.
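Since no embeddings example appears above, here is a minimal sketch against the confidential endpoint, using the embeddings model from the catalog and assuming the endpoint mirrors the standard client.embeddings.create() shape as stated:

# Sovereign embeddings — same client.embeddings.create() shape as openai.com
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

emb = client.embeddings.create(
    model="text-embedding-3-large-TEE",
    input=["Clause 7.2: auto-renewal terms...", "Clause 9.1: liability cap..."],
)

vectors = [d.embedding for d in emb.data]
print(len(vectors), "vectors, dimension", len(vectors[0]))  # 1536 per the catalog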
Only models with the -TEE suffix are exposed on the confidential endpoint — every other model name is rejected with a clear 400. The model name is what you pass to the model parameter on the OpenAI client; a sketch of catching the rejection follows the model list below.
Qwen3-32B-TEE
Default worker model. Multilingual, fast, low-cost agent loops.
Qwen3-235B-A22B-Instruct-2507-TEE
Long-context drafting, contract review, RAG over large corpora.
DeepSeek-V3-TEE
Strong general-purpose, code-heavy and English-leaning workloads.
DeepSeek-R1-0528-TEE
Reasoning chains, IC memos, deep analysis with chain-of-thought.
Llama-3.3-70B-Instruct-TEE
English-heavy production tasks, drop-in for Llama-based pipelines.
text-embedding-3-large-TEE
1536-dim sovereign embeddings for RAG. Same shape as OpenAI embed-3.
Volume contracts above 100M tokens / month — talk to sales.
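As noted above the list, model names without the -TEE suffix are rejected with a 400. A hedged sketch of catching that with the official SDK's exception types (openai-python raises BadRequestError on HTTP 400; the endpoint's exact error body is not documented here):

# Names without the -TEE suffix are rejected — catch the 400 explicitly
import openai
from openai import OpenAI

client = OpenAI(base_url="https://app.voltagegpu.com/v1", api_key="vg-...")

try:
    client.chat.completions.create(
        model="gpt-4o",  # not exposed on the confidential endpoint
        messages=[{"role": "user", "content": "ping"}],
    )
except openai.BadRequestError as exc:
    print("Rejected with", exc.status_code, "- use a -TEE model name")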
TTFT
Typical time-to-first-token: 350-650 ms on Qwen3-32B-TEE / Llama-3.3-70B-TEE from EU callers. Long-context flagships add 150-250 ms.
EU egress by default
Inference clusters in France and Germany. US-region pinning is available on Enterprise. No data leaves the trust domain in cleartext.
Edge runtime support
The OpenAI Node SDK works on Vercel Edge, Cloudflare Workers, Bun and Deno. Streaming uses standard SSE — no special transport.
API keys are project-scoped, prefixed vg-
Issue and rotate at app.voltagegpu.com/settings/api-keys. Keys can be limited to a model class, a max RPM, or a token budget. Revocation is immediate.
Default limits
Plus: 600 RPM / 1.2M TPM. Starter: 1,200 RPM / 3M TPM. Pro: 2,400 RPM / 6M TPM. Enterprise: bespoke. 429 responses include Retry-After — the OpenAI SDK retry helpers handle it natively.
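Because the SDK's retry helpers honor Retry-After natively, tuning is just a constructor argument. A minimal sketch using openai-python's standard max_retries and timeout options:

# Built-in retries — the client backs off and honors Retry-After on 429s
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    max_retries=5,   # SDK default is 2; raise for bursty agent fleets
    timeout=60.0,    # seconds, applied per attempt
)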
TLS terminates inside the trust domain
TLS unwrap happens inside the TDX enclave, not on a generic proxy. The hypervisor and infrastructure operators never see prompts in cleartext.
Per-request attestation (opt-in)
Set the X-VGPU-Attest: required header. Each completion is paired with an ECDSA-signed report identifying the TDX module, base model digest and runtime version.
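One way to attach that header with the unmodified SDK, via the standard default_headers (client-wide) or extra_headers (per-request) options; how the signed report is returned is described above, not shown here:

# Opt-in attestation — send the X-VGPU-Attest header on every request
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    default_headers={"X-VGPU-Attest": "required"},
)

resp = client.chat.completions.create(
    model="Qwen3-32B-TEE",
    messages=[{"role": "user", "content": "..."}],
    # or per request only: extra_headers={"X-VGPU-Attest": "required"}
)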
Plans bundle included tokens, RPM/TPM, and platform features. Per-token rates match the standard inference catalog above — there is no surcharge for routing through the confidential endpoint.
PLUS
$20 / mo
Individual developer — first production calls.
STARTER
$349 / mo
Small engineering team shipping their first agent.
PRO
$1,199 / mo
Production agent fleets up to 10 seats.
ENTERPRISE
from $5,000 / mo
Regulated industries, dedicated capacity, custom DPA.
Prompts decrypt only inside Intel TDX
TLS unwrap happens inside the trust domain. Operators, the hypervisor, and adjacent tenants cannot read prompts at any point.
AES-256 memory encryption
CPU-fused keys protect runtime RAM. A live memory dump from the host yields ciphertext, not your contracts or PHI.
ECDSA-signed attestation per request
Each completion can be paired with an attestation report identifying the TDX module, base model digest, runtime version. Verifiable against Intel root keys.
Zero retention, zero training reuse
Prompts and completions are not persisted. No prompt is reused for training. Native RGPD Article 28 DPA, EU jurisdiction (VOLTAGE EI, France).
What changes in my existing OpenAI code?
You change three lines: the base_url, the api_key, and the model name (any -TEE model). client.chat.completions.create(), client.embeddings.create(), the streaming iterator, the async client, tool calling, response_format — all keep working because the wire format is OpenAI-compatible.
Does the official, unmodified SDK really work?
Yes. We test against pip install openai (>=1.40.0) and npm install openai (^4.60.0) — the official packages from openai/openai-python and openai/openai-node. There is no VoltageGPU fork. If a future SDK version breaks compatibility, we treat it as a P1 bug on our side.
Why is the base URL app.voltagegpu.com?
app.voltagegpu.com hosts the developer console, settings, and the OpenAI-compatible inference path under /v1. Both hostnames terminate inside the same TDX trust domain — this is purely a routing convenience for self-serve developers.
Can I keep using openai.com alongside VoltageGPU?
Instantiate two clients: one with base_url defaulted (api.openai.com) for non-sensitive workloads, and one with base_url=https://app.voltagegpu.com/v1 for confidential workloads. Pick the client at the call site, or wrap them behind a small router keyed on data classification.
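A minimal sketch of that router. The classification labels and helper are illustrative, not part of any VoltageGPU API, and the default client assumes OPENAI_API_KEY is set:

# Route by data classification — hypothetical labels, two standard clients
from openai import OpenAI

public_client = OpenAI()  # defaults to api.openai.com
sovereign_client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

def route(classification: str):
    # Returns (client, model) — the model name must match the endpoint.
    if classification in {"confidential", "restricted"}:
        return sovereign_client, "Qwen3-32B-TEE"
    return public_client, "gpt-4o"

client, model = route("confidential")
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Privileged memo..."}],
)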
Do you store my prompts?
No. Zero retention by default. Prompts decrypt only inside the TDX enclave, are processed for your completion, and are not persisted to disk. Aggregate metering counters (token counts, model name) are kept for billing but are not the prompt content. Pro and Enterprise can opt into encrypted attestation-bundled audit logs for AI Act Article 12.
What latency should callers outside the EU expect?
EU egress is the default region. From US callers, expect ~120-180 ms of cross-Atlantic transit on top of TTFT. Enterprise contracts can pin a US TDX region — talk to sales.
Does it work with LangChain, CrewAI, MCP, or the Vercel AI SDK?
Yes — every framework that lets you set base_url and api_key on its OpenAI provider works. See /langchain-tee-deployment, /crewai-private-deployment, and /mcp-server-confidential for full setups. Vercel AI SDK works via createOpenAI({ baseURL, apiKey }).
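For example, LangChain's OpenAI chat wrapper takes the same two parameters. A hedged sketch; see /langchain-tee-deployment for the full setup:

# LangChain — point the OpenAI provider at the confidential endpoint
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="Qwen3-32B-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

print(llm.invoke("Summarize this MSA section...").content)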
Can I verify that a completion really ran inside a TDX enclave?
Yes. Per-request attestation can be enabled per project. Each completion is paired with an ECDSA-signed report identifying the TDX module, the trust-domain measurement, and the base model digest. Reports are verifiable against Intel root keys.
EXPLORE FURTHER
Swap base_url, ship before lunch
Generate a vg- API key, change one line, keep your existing OpenAI SDK code.