OpenAI SDK · Private Endpoint
Sealed in Intel TDX
OPENAI-COMPATIBLE · DROP-IN · CONFIDENTIAL

OpenAI SDK on a confidential,
EU-controlled endpoint.

Same from openai import OpenAI. Same client.chat.completions.create(). One line changes — base_url now points at https://app.voltagegpu.com/v1, and every token is sealed inside Intel TDX hardware enclaves we operate in the EU.

No fork. No proprietary client. No code rewrite. Migrate a production agent this afternoon and keep your existing tests, retries and middleware.

Get an API key · Read the BYOA pillar

Why teams swap base_url

Calling api.openai.com is fine for many workloads. It stops being fine the moment a prompt carries privileged client documents, regulated personal data, or work product covered by an NDA. Three forces typically push engineering teams to a sovereign endpoint.

Schrems II + FISA 702 exposure

EU controllers cannot cleanly justify routing privileged or personal data through US-controlled inference. VoltageGPU is operated by VOLTAGE EI in France — RGPD Art. 28, no transfer mechanism needed.

Training-data leakage risk

Even with retention disabled, prompts traverse provider memory in cleartext. Intel TDX seals decryption inside the enclave — the hypervisor cannot read your prompts, and CPU-fused keys protect RAM.

AI Act Article 12 audit logging

Per-request attestation reports are ECDSA-signed by the TDX module. Pair them with completions for an audit trail that maps cleanly to Article 12 logging obligations.

Three-step migration

Start from a working OpenAI integration. Three changes — no patched SDK, no custom client.

1

Install the official OpenAI SDK

pip install openai (>=1.40.0) for Python or npm install openai (^4.60.0) for Node. VoltageGPU ships no fork; this is the exact same package you already use against openai.com.

2

Set base_url and api_key

Set base_url = "https://app.voltagegpu.com/v1" and api_key = "vg-..." (issued at app.voltagegpu.com/settings/api-keys). The SDK constructor accepts both directly; no monkey-patching required.

3

Pick a -TEE model

Replace your model parameter (e.g. "gpt-4o") with a confidential one: Qwen3-32B-TEE, Qwen3-235B-A22B-Instruct-2507-TEE, DeepSeek-V3-TEE, DeepSeek-R1-0528-TEE, or Llama-3.3-70B-Instruct-TEE.

Shell · Python install
BASH
# Standard OpenAI Python SDK — no fork, no patch
# (quote the spec so the shell does not treat ">" as a redirect)
pip install "openai>=1.40.0"
Shell · Node install
BASH
# Standard OpenAI Node / TypeScript SDK
npm install openai@^4.60.0
# or: pnpm add openai@^4.60.0
# or: yarn add openai@^4.60.0
Shell · environment variables
ENV
# .env — the same variable names the OpenAI SDK auto-detects
OPENAI_BASE_URL=https://app.voltagegpu.com/v1
OPENAI_API_KEY=vg-...

# If you are migrating an app that already uses OPENAI_API_KEY for openai.com,
# keep both keys distinct; many SDK helpers (LangChain, LlamaIndex) read these.
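
With those variables exported, the client can be constructed with no arguments at all; the official Python SDK reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment on its own. A minimal sketch:

Python · Zero-config client
PYTHON
# No constructor args: the SDK picks up OPENAI_BASE_URL / OPENAI_API_KEY
from openai import OpenAI

client = OpenAI()  # uses the .env values above

resp = client.chat.completions.create(
    model="Qwen3-32B-TEE",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)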

Python — full working examples

The five examples below cover what most production workloads actually use: chat, streaming, tool calling, structured outputs, and the async client.

Python · Chat completions
PYTHON
# Drop-in: same OpenAI Python client, sovereign endpoint
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",  # https://app.voltagegpu.com/settings/api-keys
)

response = client.chat.completions.create(
    model="Qwen3-32B-TEE",
    messages=[
        {"role": "system", "content": "You are a sovereign AI assistant."},
        {"role": "user",   "content": "Summarize this MSA section..."},
    ],
    temperature=0.2,
    max_tokens=1024,
)

print(response.choices[0].message.content)
Python · Streaming
PYTHON
# Streaming — identical iterator, sealed in TDX
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

stream = client.chat.completions.create(
    model="DeepSeek-V3-TEE",
    messages=[{"role": "user", "content": "Draft an Article 28 DPA clause..."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
Python · Tool / function calling
PYTHON
# Tool / function calling — supported on the tool-capable -TEE models
from openai import OpenAI
import json

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_contract",
            "description": "Fetch a contract by id from the in-house DMS.",
            "parameters": {
                "type": "object",
                "properties": {
                    "contract_id": {"type": "string"},
                },
                "required": ["contract_id"],
            },
        },
    },
]

resp = client.chat.completions.create(
    model="Qwen3-235B-A22B-Instruct-2507-TEE",
    messages=[{"role": "user", "content": "Pull contract MSA-2026-0142 and find auto-renewal."}],
    tools=tools,
    tool_choice="auto",
)

tool_call = resp.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
print("Model wants to call:", tool_call.function.name, "with", args)
Python · Structured outputs (Pydantic)
PYTHON
# Structured outputs — JSON mode + response_format
import json

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

class Finding(BaseModel):
    clause: str
    severity: str       # "low" | "medium" | "high"
    rationale: str

class Review(BaseModel):
    findings: list[Finding]

# json_object mode guarantees JSON, not your schema, so include the
# schema in the prompt and validate with Pydantic on the way out.
resp = client.chat.completions.create(
    model="DeepSeek-R1-0528-TEE",
    messages=[
        {
            "role": "system",
            "content": "Return strict JSON conforming to this schema:\n"
            + json.dumps(Review.model_json_schema()),
        },
        {"role": "user", "content": "Review this NDA for Article 28 RGPD gaps..."},
    ],
    response_format={"type": "json_object"},
)

review = Review.model_validate_json(resp.choices[0].message.content)
for f in review.findings:
    print(f.severity, "-", f.clause)
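json_object mode guarantees well-formed JSON, not schema conformance. If validation still fails, one cheap repair pass is to echo Pydantic's error back to the model (sketch, continuing the example above):

Python · Validation repair pass
PYTHON
# One-shot repair: feed the validation error back and re-validate.
from pydantic import ValidationError

try:
    review = Review.model_validate_json(resp.choices[0].message.content)
except ValidationError as err:
    retry = client.chat.completions.create(
        model="DeepSeek-R1-0528-TEE",
        messages=[
            {"role": "system", "content": "Fix the JSON so it conforms to the schema."},
            {"role": "user", "content": f"JSON:\n{resp.choices[0].message.content}\n\nError:\n{err}"},
        ],
        response_format={"type": "json_object"},
    )
    review = Review.model_validate_json(retry.choices[0].message.content)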
Python · Async client
PYTHON
# Async client — same import path, full asyncio support
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

async def summarize(text: str) -> str:
    resp = await client.chat.completions.create(
        model="Qwen3-32B-TEE",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return resp.choices[0].message.content

async def main():
    docs = ["...doc1...", "...doc2...", "...doc3..."]
    summaries = await asyncio.gather(*(summarize(d) for d in docs))
    for s in summaries:
        print(s)

asyncio.run(main())
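With many documents in flight, an unbounded gather can trip the RPM/TPM limits described below; an asyncio.Semaphore caps concurrency (sketch, continuing the example above):

Python · Bounded concurrency
PYTHON
# Cap in-flight requests so bursts stay under your plan's RPM limit.
sem = asyncio.Semaphore(8)

async def summarize_bounded(text: str) -> str:
    async with sem:
        return await summarize(text)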

Node / TypeScript — full working examples

The official openai package on npm. Works in Node, Edge runtimes (Vercel, Cloudflare Workers) and Bun.

TypeScript · Chat completions
TYPESCRIPT
// Drop-in: same OpenAI Node SDK, sovereign endpoint
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://app.voltagegpu.com/v1",
  apiKey:  process.env.VOLTAGEGPU_API_KEY!, // vg-...
});

const response = await client.chat.completions.create({
  model: "Qwen3-32B-TEE",
  messages: [
    { role: "system", content: "You are a sovereign AI assistant." },
    { role: "user",   content: "Summarize this MSA section..." },
  ],
  temperature: 0.2,
});

console.log(response.choices[0].message.content);
TypeScript · Streaming
TYPESCRIPT
// Streaming with the official Node SDK iterator
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://app.voltagegpu.com/v1",
  apiKey:  process.env.VOLTAGEGPU_API_KEY!,
});

const stream = await client.chat.completions.create({
  model: "DeepSeek-V3-TEE",
  messages: [{ role: "user", content: "Draft an Article 28 clause..." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
TypeScript · Tool calling
TYPESCRIPT
// Tool / function calling with the Node SDK
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://app.voltagegpu.com/v1",
  apiKey:  process.env.VOLTAGEGPU_API_KEY!,
});

const resp = await client.chat.completions.create({
  model: "Qwen3-235B-A22B-Instruct-2507-TEE",
  messages: [
    { role: "user", content: "Pull contract MSA-2026-0142 and find auto-renewal." },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "lookup_contract",
        description: "Fetch a contract by id from the in-house DMS.",
        parameters: {
          type: "object",
          properties: { contract_id: { type: "string" } },
          required: ["contract_id"],
        },
      },
    },
  ],
  tool_choice: "auto",
});

const call = resp.choices[0].message.tool_calls?.[0];
console.log("Tool call:", call?.function.name, call?.function.arguments);

OpenAI feature compatibility matrix

Everything that lives under /v1/chat/completions and /v1/embeddings is supported on -TEE models. Higher-level surfaces (Assistants, Batch, fine-tuning) are on the roadmap or live elsewhere.

Chat completions · Supported
Every -TEE model. Identical request/response shape.

Streaming (SSE) · Supported
Standard streaming iterator on the Python and Node SDKs.

Tool / function calling · Supported
Qwen3-32B-TEE, Qwen3-235B-TEE, DeepSeek-V3/R1-TEE.

Structured outputs (JSON mode) · Supported
response_format={"type": "json_object"} works on every -TEE model.

Vision (image input) · Partial
Multimodal -TEE models only (Qwen3-VL-TEE, roadmap).

Embeddings · Supported
text-embedding-3-large-TEE, dimension 1536.

Fine-tuning · Roadmap
Confidential fine-tuning lives at /fine-tuning-tdx.

Batch API · Roadmap
Async batch endpoint, targeted Q3 2026.

Assistants API · Roadmap
Threads + runs are not yet on the confidential endpoint.
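
Embeddings go through the same client via client.embeddings.create(); a minimal sketch against text-embedding-3-large-TEE:

Python · Embeddings
PYTHON
# Sovereign embeddings: same SDK, /v1/embeddings path
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

emb = client.embeddings.create(
    model="text-embedding-3-large-TEE",
    input=["auto-renewal clause", "termination for convenience"],
)

print(len(emb.data), "vectors, dim", len(emb.data[0].embedding))  # 2 vectors, 1536 dims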

Models on the OpenAI-compatible endpoint

Only models with the -TEE suffix are exposed on the confidential endpoint — every other model name is rejected with a clear 400. The model name is what you pass to the model parameter on the OpenAI client.

Qwen3-32B-TEE · BALANCED · 128K context

Default worker model. Multilingual, fast, low-cost agent loops.

in $0.50 / 1M · out $1.50 / 1M

Qwen3-235B-A22B-Instruct-2507-TEE · FLAGSHIP · 262K context

Long-context drafting, contract review, RAG over large corpora.

in $1.20 / 1M · out $3.50 / 1M

DeepSeek-V3-TEE · GENERAL · 128K context

Strong general-purpose, code-heavy and English-leaning workloads.

in $0.90 / 1M · out $2.40 / 1M

DeepSeek-R1-0528-TEE · REASONING · 128K context

Reasoning chains, IC memos, deep analysis with chain-of-thought.

in $1.80 / 1M · out $5.40 / 1M

Llama-3.3-70B-Instruct-TEE · OPEN · 128K context

English-heavy production tasks, drop-in for Llama-based pipelines.

in $0.80 / 1M · out $2.40 / 1M

text-embedding-3-large-TEE · EMBEDDINGS · 8K input

1536-dim sovereign embeddings for RAG. Same shape as OpenAI embed-3.

in $0.13 / 1M (no output tokens)

Volume contracts above 100M tokens / month — talk to sales.

Latency, regions and edge runtimes

TTFT

Typical time-to-first-token: 350-650 ms on Qwen3-32B-TEE / Llama-3.3-70B-TEE from EU callers. Long-context flagships add 150-250 ms.

EU egress by default

Inference clusters in France and Germany. US-region pinning is available on Enterprise. No data leaves the trust domain in cleartext.

Edge runtime support

The OpenAI Node SDK works on Vercel Edge, Cloudflare Workers, Bun and Deno. Streaming uses standard SSE — no special transport.

Authentication and rate limits

API keys are project-scoped, prefixed vg-

Issue and rotate at app.voltagegpu.com/settings/api-keys. Keys can be limited to a model class, a max RPM, or a token budget. Revocation is immediate.

Default limits

Plus: 600 RPM / 1.2M TPM. Starter: 1,200 RPM / 3M TPM. Pro: 2,400 RPM / 6M TPM. Enterprise: bespoke. 429 responses include Retry-After — the OpenAI SDK retry helpers handle it natively.
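
Because the endpoint returns standard 429s with Retry-After, the SDK's built-in backoff needs no custom code; you only tune the knobs (sketch):

Python · Retry configuration
PYTHON
# The official SDK retries 429/5xx with backoff and honors Retry-After.
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    max_retries=5,   # SDK default is 2
    timeout=60.0,    # per-request timeout, seconds
)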

TLS terminates inside the trust domain

TLS unwrap happens inside the TDX enclave, not on a generic proxy. The hypervisor and infrastructure operators never see prompts in cleartext.

Per-request attestation (opt-in)

Set the X-VGPU-Attest: required header. Each completion is paired with an ECDSA-signed report identifying the TDX module, base model digest and runtime version.
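
With the official SDK, that header rides along via default_headers (client-wide) or extra_headers (per call); a sketch:

Python · Attestation header
PYTHON
# Option A: request attestation on every call from this client
from openai import OpenAI

client = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    default_headers={"X-VGPU-Attest": "required"},
)

# Option B: request attestation for a single call
resp = client.chat.completions.create(
    model="Qwen3-32B-TEE",
    messages=[{"role": "user", "content": "ping"}],
    extra_headers={"X-VGPU-Attest": "required"},
)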

Pricing

Plans bundle included tokens, RPM/TPM, and platform features. Per-token rates match the standard inference catalog above — there is no surcharge for routing through the confidential endpoint.

PLUS

$20 / mo

Individual developer — first production calls.

  • $5 included tokens
  • 600 RPM / 1.2M TPM
  • All -TEE chat models
  • Email support
STARTER · POPULAR

$349 / mo

Small engineering team shipping their first agent.

  • $80 included tokens
  • 1,200 RPM / 3M TPM
  • Streaming + tools + JSON mode
  • Slack support

PRO

$1,199 / mo

Production agent fleets up to 10 seats.

  • $280 included tokens
  • 2,400 RPM / 6M TPM
  • Per-request attestation
  • Audit log export

ENTERPRISE

from $5,000 / mo

Regulated industries, dedicated capacity, custom DPA.

  • SSO / SAML / SCIM
  • Bespoke RPM/TPM
  • Dedicated TDX capacity
  • Named CSM + 99.9% SLA

What confidential actually means here

Prompts decrypt only inside Intel TDX

TLS unwrap happens inside the trust domain. Operators, the hypervisor, and adjacent tenants cannot read prompts at any point.

AES-256 memory encryption

CPU-fused keys protect runtime RAM. A live memory dump from the host yields ciphertext, not your contracts or PHI.

ECDSA-signed attestation per request

Each completion can be paired with an attestation report identifying the TDX module, base model digest, runtime version. Verifiable against Intel root keys.

Zero retention, zero training reuse

Prompts and completions are not persisted. No prompt is reused for training. Native RGPD Article 28 DPA, EU jurisdiction (VOLTAGE EI, France).

Frequently asked questions

Do I really not have to change my code?

You change three lines: the base_url, the api_key, and the model name (any -TEE model). client.chat.completions.create(), client.embeddings.create(), the streaming iterator, the async client, tool calling, response_format — all keep working because the wire format is OpenAI-compatible.

Is the OpenAI SDK actually unmodified?

Yes. We test against pip install openai (>=1.40.0) and npm install openai (^4.60.0) — the official packages from openai/openai-python and openai/openai-node. There is no VoltageGPU fork. If a future SDK version breaks compatibility, we treat it as a P1 bug on our side.

Why is the endpoint at app.voltagegpu.com/v1 and not api.voltagegpu.com?

app.voltagegpu.com hosts the developer console, settings, and the OpenAI-compatible inference path under /v1. Both hostnames terminate inside the same TDX trust domain — this is purely a routing convenience for self-serve developers.

How do I keep using the OpenAI SDK against api.openai.com for some calls?

Instantiate two clients. One with base_url defaulted (api.openai.com) for non-sensitive workloads, and one with base_url=https://app.voltagegpu.com/v1 for confidential workloads. Pick the client at the call site, or wrap them behind a small router by data-classification.
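
A minimal sketch of that router; the classification labels are illustrative:

Python · Dual-client router
PYTHON
# Route by data classification: public traffic to openai.com,
# confidential traffic to the sovereign endpoint.
import os
from openai import OpenAI

public = OpenAI()  # defaults to api.openai.com + OPENAI_API_KEY
sovereign = OpenAI(
    base_url="https://app.voltagegpu.com/v1",
    api_key=os.environ["VOLTAGEGPU_API_KEY"],
)

def client_for(classification: str) -> OpenAI:
    return sovereign if classification in {"confidential", "restricted"} else public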

Are my prompts logged?

No. Zero retention by default. Prompts decrypt only inside the TDX enclave, are processed for your completion, and are not persisted to disk. Aggregate metering counters (token counts, model name) are kept for billing but are not the prompt content. Pro and Enterprise can opt into encrypted attestation-bundled audit logs for AI Act Article 12.

What about latency from the US?

EU egress is the default region. From US callers, expect ~120-180 ms of cross-Atlantic transit on top of TTFT. Enterprise contracts can pin a US TDX region — talk to sales.

Does it work with LangChain, LlamaIndex, CrewAI, AutoGen, Vercel AI SDK?

Yes — every framework that lets you set base_url and api_key on its OpenAI provider works. See /langchain-tee-deployment, /crewai-private-deployment, and /mcp-server-confidential for full setups. Vercel AI SDK works via createOpenAI({ baseURL, apiKey }).
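
For example, LangChain's OpenAI chat model takes the same two overrides; a sketch with the langchain-openai package (see /langchain-tee-deployment for the full setup):

Python · LangChain override
PYTHON
# LangChain against the confidential endpoint: same two overrides.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="Qwen3-32B-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

print(llm.invoke("Summarize this MSA section...").content)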

Can I prove a request actually ran inside Intel TDX?

Yes. Per-request attestation can be enabled per project. Each completion is paired with an ECDSA-signed report identifying the TDX module, the trust-domain measurement, and the base model digest. Reports are verifiable against Intel root keys.

EXPLORE FURTHER

Bring Your Own Agent · Parent pillar
LangChain TEE deployment · Same SDK, LangChain layer
CrewAI private deployment · Multi-agent crews on TDX
MCP server confidential · Tool calls in TDX
Sovereign agentic AI · Architectural overview
Private ChatGPT · Hosted UI on the same endpoint

Swap base_url, ship before lunch

Generate a vg- API key, change one line, keep your existing OpenAI SDK code.

Create account