VoltageGPU runs LangChain agents, chains, and LangGraph state graphs against confidential inference sealed in Intel TDX hardware enclaves. Existing LangChain code (LCEL chains, agents, retrievers, tool-calling, streaming) works unchanged. Construct a ChatOpenAI instance with base_url https://app.voltagegpu.com/v1 and a TEE model (Qwen/Qwen3-32B-TEE, Qwen3-235B-A22B-Instruct-2507-TEE, DeepSeek-R1-0528-TEE), and every reasoning step routes through a sovereign enclave operated in the EU by VOLTAGE EI (France, SIREN 943 808 824).
Install langchain-openai, set OPENAI_API_KEY to your VoltageGPU key, set the base_url to https://app.voltagegpu.com/v1, pick a -TEE model, and your agents are confidential. LangGraph nodes, LCEL chains, ChatPromptTemplate, output parsers, retrievers, bind_tools, astream, and async batch all work without modification.
Pay-per-token. Qwen3-32B-TEE at $0.50 / 1M input and $1.50 / 1M output. Qwen3-235B-TEE at $1.20 / 1M input and $3.50 / 1M output. DeepSeek-R1-0528-TEE at $1.80 / 1M input and $5.40 / 1M output. No per-seat licence, no platform fee. Volume contracts available beyond 100M tokens per month.
Drop-in ChatOpenAI swap. Every reasoning call routes through Intel TDX enclaves we operate in the EU. Existing LCEL chains, LangGraph state graphs, agents, retrievers, and tool calls run unchanged.
Built for legal RAG, financial analysis, healthcare summarization, and any pipeline that needs EU AI Act Article 12 logging without ever shipping privileged context to a US controller.
Standard LangChain pointed at OpenAI is great for prototyping. For production workloads in regulated industries, three risks make it unshippable.
Privileged data leaving the firm
Default LangChain pointed at OpenAI sends every retrieved chunk, every tool argument, and every generated output to a US controller. For privileged matter notes that is enough to break legal-professional privilege.
No Article 12 audit trail
OpenAI logs are not written to your audit sink. Article 12 of the EU AI Act requires automatic logs, retained for the system lifetime, that you control. With LangChain pointed at a foreign API, the record of inputs and outputs you are required to keep never lands in a sink you control.
No hardware sealing
Standard cloud inference is "encrypted in transit, encrypted at rest", but data sits in plaintext RAM during compute. A privileged provider, a malicious admin, or a compromised hypervisor can read your prompts. Intel TDX removes that whole category of exposure.
VoltageGPU is an EU controller (VOLTAGE EI, France, SIREN 943 808 824) operating Intel TDX hardware enclaves with attestation evidence available per session. Pointing your existing LangChain stack at our endpoint replaces all three risks with a single hardware-sealed boundary.
Standard langchain, langchain-openai, and langgraph from PyPI. No fork, no patch, no proxy.
# Standard LangChain stack — no fork, no patch
pip install langchain langchain-openai langgraph langchain-community
pip install langchain-postgres pgvector "psycopg[binary]" # optional, for confidential RAG

Construct one ChatOpenAI instance pointed at https://app.voltagegpu.com/v1. Use it anywhere a ChatModel is expected — chains, agents, LangGraph nodes, retrieval QA, structured output (sketched after the quickstart below). Streaming, tool calling, and async batch all work identically.
# 5-minute confidential LangChain quickstart
# Swap two lines: base_url + api_key. Everything else is stock LangChain.
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
# 1. Point the OpenAI-compatible client at the confidential endpoint.
llm = ChatOpenAI(
model="Qwen/Qwen3-32B-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key=os.environ["VOLTAGE_API_KEY"], # vg-... from /settings/api-keys
temperature=0.2,
max_tokens=2048,
timeout=60,
)
# 2. Use the chat model exactly like any other ChatOpenAI instance.
messages = [
SystemMessage(content="You are a privacy-first legal assistant for a French law firm."),
HumanMessage(content="Summarize the auto-renewal terms in this NDA: ..."),
]
reply = llm.invoke(messages)
print(reply.content)
# 3. Streaming works with .stream() and .astream() exactly like OpenAI.
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
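Structured output goes through the same endpoint. Here is a minimal sketch using with_structured_output; the NDAClause schema and the clause text are illustrative, and the feature relies on the endpoint's OpenAI-style tool-calling support.
# Structured output: same confidential ChatOpenAI instance (illustrative schema)
from typing import Optional
from pydantic import BaseModel, Field

class NDAClause(BaseModel):
    clause_type: str = Field(description="e.g. auto-renewal, termination, liability cap")
    renews_automatically: bool
    notice_period_days: Optional[int] = None

structured_llm = llm.with_structured_output(NDAClause)
clause = structured_llm.invoke(
    "The agreement renews for successive 12-month terms unless either party "
    "gives 60 days' written notice."
)
print(clause.model_dump())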
@langchain/openai takes the same configuration. Works in Node, Bun, Deno, and Edge runtimes.
// LangChain.js / @langchain/openai — drop-in confidential endpoint
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";
const llm = new ChatOpenAI({
model: "Qwen/Qwen3-32B-TEE",
apiKey: process.env.VOLTAGE_API_KEY, // vg-...
configuration: {
baseURL: "https://app.voltagegpu.com/v1", // sovereign endpoint
},
temperature: 0.2,
maxTokens: 2048,
});
const reply = await llm.invoke([
new SystemMessage("You are a sovereign AI assistant."),
new HumanMessage("Draft a redline for this auto-renewal clause..."),
]);
console.log(reply.content);
// Streaming
const stream = await llm.stream([
new HumanMessage("Outline the GDPR Article 28 obligations for processors."),
]);
for await (const chunk of stream) process.stdout.write(chunk.content as string);

The endpoint is OpenAI-compatible, so every LangChain primitive that talks to ChatOpenAI works without modification.
LCEL chains
Prompt | Model | OutputParser pipelines unchanged; an LCEL chain and an async batch run are sketched after this list.
LangGraph
StateGraph nodes bind a confidential ChatOpenAI as their LLM.
Tool calling
bind_tools, create_react_agent, OpenAI tool schema all work.
Streaming
.stream() and .astream() over Server-Sent Events.
Retrievers
PGVector, Qdrant, Chroma, Weaviate keep your embeddings on-prem.
Async batch
abatch, abatch_as_completed for high-throughput pipelines.
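To make the first and last items concrete, here is a minimal sketch of an LCEL pipeline and an async batch run against the confidential endpoint; the prompt wording and clause strings are placeholders.
# LCEL chain + async batch against the confidential endpoint (minimal sketch)
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="Qwen/Qwen3-32B-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    temperature=0,
)

# Prompt | Model | OutputParser: a stock LCEL pipeline, unchanged.
chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Summarize the clause in one sentence."),
        ("user", "{clause}"),
    ])
    | llm
    | StrOutputParser()
)
print(chain.invoke({"clause": "This agreement renews automatically for successive one-year terms ..."}))

# Async batch: fan the same chain out over many inputs concurrently.
clauses = [{"clause": c} for c in ("Clause A ...", "Clause B ...", "Clause C ...")]
summaries = asyncio.run(chain.abatch(clauses))
for s in summaries:
    print(s)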
Bind one confidential ChatOpenAI to every node. The whole graph (planner, retriever, writer, critic, finalizer) reasons against TDX-sealed inference. Conditional edges, checkpointing, and human-in-the-loop interrupts behave identically.
# LangGraph state graph — every node uses the confidential endpoint
from typing import TypedDict, List
from langchain_openai import ChatOpenAI
from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
llm = ChatOpenAI(
model="Qwen/Qwen3-32B-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
temperature=0.1,
)
class GraphState(TypedDict):
    question: str
    plan: str
    context: List[str]
    draft: str
    critique: str
    final: str
# --- Nodes ---
def planner(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Decompose the user question into a 3-step research plan."),
        HumanMessage(content=state["question"]),
    ])
    return {"plan": msg.content}

def researcher(state: GraphState) -> GraphState:
    # Plug in a private retriever here (PGVector / Qdrant on-prem).
    # For brevity we stub the retrieval step.
    return {"context": ["...private firm context retrieved on-prem..."]}

def writer(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Write a draft answer using only the provided context."),
        HumanMessage(content=f"Plan:\n{state['plan']}\n\nContext:\n{state['context']}"
                             f"\n\nQuestion: {state['question']}"),
    ])
    return {"draft": msg.content}

def critic(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Critique the draft for hallucination and missing citations."),
        HumanMessage(content=state["draft"]),
    ])
    return {"critique": msg.content}

def finalizer(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Produce the final answer, addressing the critique."),
        HumanMessage(content=f"Draft:\n{state['draft']}\n\nCritique:\n{state['critique']}"),
    ])
    return {"final": msg.content}
# --- Graph wiring ---
g = StateGraph(GraphState)
g.add_node("plan", planner)
g.add_node("research", researcher)
g.add_node("write", writer)
g.add_node("critique", critic)
g.add_node("finalize", finalizer)
g.add_edge(START, "plan")
g.add_edge("plan", "research")
g.add_edge("research", "write")
g.add_edge("write", "critique")
g.add_edge("critique", "finalize")
g.add_edge("finalize", END)
graph = g.compile(checkpointer=MemorySaver())
result = graph.invoke(
{"question": "Draft a memo on cross-border data transfers under SCCs 2021/914."},
config={"configurable": {"thread_id": "matter-2026-0419"}},
)
print(result["final"])
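Conditional edges and human-in-the-loop interrupts use the same wiring. The following is a hedged variation on the graph above, not part of it: route_after_critique and its keyword heuristic are illustrative, and the conditional edge would replace the static edge from critique to finalize.
# Optional variation: loop write → critique until the critic is satisfied
def route_after_critique(state: GraphState) -> str:
    # Illustrative heuristic; in practice, ask the critic node for a structured verdict.
    return "revise" if "missing citation" in state["critique"].lower() else "done"

# Use this in place of g.add_edge("critique", "finalize") in the wiring above.
g.add_conditional_edges("critique", route_after_critique, {"revise": "write", "done": "finalize"})

# Human-in-the-loop: pause before the final answer so a reviewer can approve or edit state.
graph = g.compile(checkpointer=MemorySaver(), interrupt_before=["finalize"])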
create_react_agent and bind_tools work without changes. Tools run in your process, so they can hit internal APIs that are never exposed to the confidential endpoint.

# Tool calling — agent reasons in TDX, tools execute on your machine
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
@tool
def search_clauses(matter_id: str, query: str) -> str:
    """Search the firm's internal clause library for a given matter."""
    # Hits an internal API on your network — never leaves your VPC.
    return internal_clause_api(matter_id, query)

@tool
def jurisdiction_check(country: str, regulation: str) -> str:
    """Check whether a regulation applies in a given EU jurisdiction."""
    return internal_jurisdiction_db(country, regulation)
llm = ChatOpenAI(
model="Qwen/Qwen3-32B-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
temperature=0,
)
agent = create_react_agent(llm, [search_clauses, jurisdiction_check])
response = agent.invoke({
"messages": [
("user",
"For matter M-2026-0419, find all auto-renewal clauses and check whether "
"the German implementation of EU Directive 2019/770 caps them.")
]
})
print(response["messages"][-1].content)

Server-Sent Events streaming is identical to OpenAI. Use it for chat UIs, FastAPI StreamingResponse handlers, or Next.js Edge functions.
# Server-Sent Events streaming — works the same as OpenAI
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(
model="Qwen/Qwen3-32B-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
streaming=True,
)
# Sync streaming
for chunk in llm.stream([HumanMessage(content="Explain the NIS2 incident-reporting timelines.")]):
    print(chunk.content, end="", flush=True)
# Async streaming for FastAPI / Starlette / Next.js Edge handlers
async def stream_response(question: str):
    async for chunk in llm.astream([HumanMessage(content=question)]):
        yield chunk.content
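To serve that generator over HTTP, a minimal FastAPI wiring could look like the sketch below; the /ask route and its query parameter are illustrative, not part of VoltageGPU's API.
# Minimal FastAPI wiring for the async generator above (illustrative)
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/ask")
async def ask(question: str):
    # Streams tokens to the client as they arrive from the confidential endpoint.
    return StreamingResponse(stream_response(question), media_type="text/plain")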
For private RAG you have three production-ready paths. All three keep raw documents and embeddings out of any third-party SaaS vector database.
On-prem vector DB
Run PGVector, Qdrant, Chroma, or Weaviate inside your own VPC. Only the matched chunks transit the confidential endpoint for reasoning.
Confidential embeddings
Use VoltageGPU -TEE embedding models so the embedding step itself runs inside Intel TDX. Useful when raw documents must never reach a SaaS embedder.
Hybrid retrieval
BM25 + vector hybrid retrieval is supported via standard LangChain Retrievers, and the reranker can also run on the confidential endpoint. A minimal hybrid setup is sketched below.
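For the third path, here is a hedged sketch of hybrid retrieval with an EnsembleRetriever; the placeholder documents and the 0.4/0.6 weights are illustrative, and vectorstore is assumed to be the PGVector store configured as in the end-to-end example that follows.
# Hybrid retrieval: BM25 keyword search blended with dense vectors (illustrative)
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # needs: pip install rank_bm25
from langchain_core.documents import Document

# Placeholder corpus; in production, load the same on-prem documents you embed.
docs = [Document(page_content=t) for t in ("...matter note 1...", "...matter note 2...")]
bm25 = BM25Retriever.from_documents(docs)
bm25.k = 6

# `vectorstore` is the PGVector store from the example below.
hybrid = EnsembleRetriever(
    retrievers=[bm25, vectorstore.as_retriever(search_kwargs={"k": 6})],
    weights=[0.4, 0.6],  # tune for your corpus
)
matches = hybrid.invoke("auto-renewal caps in EU SaaS MSAs")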
# LCEL RAG over private documents — confidential end-to-end
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_postgres import PGVector
# Confidential LLM (reasoning sealed in Intel TDX)
llm = ChatOpenAI(
model="Qwen/Qwen3-235B-A22B-Instruct-2507-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
temperature=0,
)
# Confidential embeddings (vector math sealed in Intel TDX)
embeddings = OpenAIEmbeddings(
model="bge-m3-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
)
# Self-hosted vector store inside your VPC — never touches a US SaaS.
vectorstore = PGVector(
embeddings=embeddings,
collection_name="firm_matter_notes",
connection="postgresql+psycopg://app:secret@db.internal:5432/rag",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})
prompt = ChatPromptTemplate.from_messages([
("system", "Answer strictly from the retrieved firm matter notes. "
"If the answer is not in the context, say so."),
("user", "Context:\n{context}\n\nQuestion: {question}"),
])
def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
answer = rag_chain.invoke(
"What is the latest position our firm took on auto-renewal caps in EU SaaS MSAs?"
)
print(answer)

Article 12 of the EU AI Act requires automatic event logs throughout the lifetime of high-risk AI systems. Combine VoltageGPU's structured per-request events with a LangChain callback handler to capture chain-level events, and you have a complete audit trail without sending raw prompts to a US cloud.
# EU AI Act Article 12 — structured logging hook for LangChain
from langchain_core.callbacks import BaseCallbackHandler
from datetime import datetime, timezone
import hashlib
import json
class ArticleTwelveLogger(BaseCallbackHandler):
    """Writes Article 12 compatible event logs to your audit pipeline."""

    def __init__(self, sink):
        self.sink = sink  # e.g. a write-only log bucket, OpenSearch, or Loki

    def _hash(self, text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def on_llm_start(self, serialized, prompts, **kwargs):
        for p in prompts:
            self.sink.write(json.dumps({
                "ts": datetime.now(timezone.utc).isoformat(),
                "event": "llm_start",
                "model": serialized.get("kwargs", {}).get("model"),
                "prompt_sha256": self._hash(p),
                "prompt_chars": len(p),
                "run_id": str(kwargs.get("run_id")),
            }) + "\n")

    def on_llm_end(self, response, **kwargs):
        for gen in response.generations:
            for g in gen:
                self.sink.write(json.dumps({
                    "ts": datetime.now(timezone.utc).isoformat(),
                    "event": "llm_end",
                    "output_sha256": self._hash(g.text),
                    "output_chars": len(g.text),
                    "run_id": str(kwargs.get("run_id")),
                }) + "\n")

    def on_llm_error(self, error, **kwargs):
        self.sink.write(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": "llm_error",
            "error": str(error),
            "run_id": str(kwargs.get("run_id")),
        }) + "\n")
# Attach the logger to any LangChain invocation.
reply = llm.invoke(messages, config={"callbacks": [ArticleTwelveLogger(audit_sink)]})

See the full guide: Article 12 AI Act logging — retention windows, tamper-evident sinks, and the structured event schema we emit.
Pay-per-token via the same /v1 API. No per-seat licence, no platform fee, no minimum. Mix models freely across nodes and chains.
Volume contracts available beyond 100M tokens / month. Annual prepay 15% off.
Every LangChain prompt sealed in TDX
The /v1/chat/completions endpoint terminates TLS inside the trust domain. Prompts decrypt only inside the enclave. The hypervisor cannot read them.
AES-256 memory encryption
CPU-fused keys protect RAM at runtime. Contracts, patient notes, deal models, embeddings — none of it readable by a privileged provider.
Per-request attestation
Each completion can be paired with an ECDSA-signed report identifying the TDX module and base model version. Verifiable proof of confidentiality on demand.
Zero retention, zero training
Prompts and completions are never logged or reused. Native GDPR Article 28 DPA, EU jurisdiction (VOLTAGE EI, France, SIREN 943 808 824).
Do I need to fork LangChain?
No. LangChain ChatOpenAI is the supported integration point. Pass base_url=https://app.voltagegpu.com/v1 and api_key=vg-... and you are done. No SDK fork, no monkey patching, no proxy required.
Does LangGraph work?
Yes. Bind a confidential ChatOpenAI to each node. StateGraph, conditional edges, checkpointing, async streaming, and human-in-the-loop interrupts all behave identically.
Can my tools call internal APIs?
Yes. Tools execute on your machine; only the LLM reasoning step crosses the enclave boundary. If you also want tool calls to remain provider-blind, expose them through a confidential MCP server hosted on VoltageGPU.
Streaming and tool calling?
Both supported. The /v1/chat/completions endpoint mirrors OpenAI for SSE streaming and tool / function calling. bind_tools and create_react_agent work without modification.
How does this satisfy EU AI Act Article 12?
VoltageGPU emits structured per-request events (timestamp, model, input hash, output hash, attestation reference). Combined with LangChain LCEL callbacks, you get a complete Article 12 audit trail without sending raw prompts to a US cloud.
Latency overhead vs OpenAI direct?
Within ~5% of bare-metal inference on the same model. TDX memory encryption adds nanosecond-scale overhead per memory access, which is negligible next to token generation time.
LangChain.js / TypeScript supported?
Yes. @langchain/openai accepts the same baseURL and apiKey configuration. Drop the URL into a ChatOpenAI constructor in Node, Bun, or Deno.
How do I get an API key?
Register at app.voltagegpu.com/register, top up any amount (Stripe, BTC, ETH, USDC), and generate a key from the API Keys page. The key is prefixed with vg- and acts as a drop-in OPENAI_API_KEY.
EXPLORE FURTHER
Ship a sovereign LangChain stack this afternoon
Generate an API key, swap two lines, run your existing chains against Intel TDX.