VoltageGPU runs LangChain agents, chains, and LangGraph state graphs against confidential inference sealed in Intel TDX hardware enclaves. Existing LangChain code (LCEL chains, agents, retrievers, tool-calling, streaming) works unchanged. Construct a ChatOpenAI instance with base_url https://app.voltagegpu.com/v1 and a TEE model (Qwen/Qwen3-32B-TEE, Qwen3-235B-A22B-Instruct-2507-TEE, DeepSeek-R1-0528-TEE), and every reasoning step routes through a sovereign enclave operated in the EU by VOLTAGE EI (France, SIREN 943 808 824).
Install langchain-openai, set OPENAI_API_KEY to your VoltageGPU key, set the base_url to https://app.voltagegpu.com/v1, pick a -TEE model, and your agents are confidential. LangGraph nodes, LCEL chains, ChatPromptTemplate, output parsers, retrievers, bind_tools, astream, and async batch all work without modification.
Pay-per-token. Qwen3-32B-TEE at $0.50 / 1M input and $1.50 / 1M output. Qwen3-235B-TEE at $1.20 / 1M input and $3.50 / 1M output. DeepSeek-R1-0528-TEE at $1.80 / 1M input and $5.40 / 1M output. No per-seat licence, no platform fee. Volume contracts available beyond 100M tokens per month.
Drop-in ChatOpenAI swap. Every reasoning call routes through Intel TDX enclaves we operate in the EU. Existing LCEL chains, LangGraph state graphs, agents, retrievers, and tool calls run unchanged.
Built for legal RAG, financial analysis, healthcare summarization, and any pipeline that needs EU AI Act Article 12 logging without ever shipping privileged context to a US controller.
Standard LangChain pointed at OpenAI is great for prototyping. For production workloads in regulated industries, three risks make it unshippable.
Privileged data leaving the firm
Default LangChain pointed at OpenAI sends every retrieved chunk, every tool argument, and every generated output to a US controller. For privileged matter notes that is enough to break legal-professional privilege.
No Article 12 audit trail
OpenAI logs are not written to your audit sink. Article 12 of the EU AI Act requires automatic logs, retained for the system lifetime, that you control. With LangChain pointed at a foreign API, the record of inputs and outputs you are required to keep never lands in a sink you control.
No hardware sealing
Standard cloud inference is "encrypted in transit, encrypted at rest", but data sits in plaintext RAM during compute. A privileged provider, a malicious admin, or a compromised hypervisor can read your prompts. Intel TDX removes that whole category of exposure.
VoltageGPU is an EU controller (VOLTAGE EI, France, SIREN 943 808 824) operating Intel TDX hardware enclaves with attestation evidence available per session. Pointing your existing LangChain stack at our endpoint replaces all three risks with a single hardware-sealed boundary.
Standard langchain, langchain-openai, and langgraph from PyPI. No fork, no patch, no proxy.
# Standard LangChain stack — no fork, no patch
pip install langchain langchain-openai langgraph langchain-community
pip install langchain-postgres pgvector "psycopg[binary]" # optional, for confidential RAG

Construct one ChatOpenAI instance pointed at https://app.voltagegpu.com/v1. Use it anywhere a ChatModel is expected — chains, agents, LangGraph nodes, retrieval QA, structured output (sketched after the quickstart below). Streaming, tool calling, and async batch all work identically.
# 5-minute confidential LangChain quickstart
# Swap two lines: base_url + api_key. Everything else is stock LangChain.
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
# 1. Point the OpenAI-compatible client at the confidential endpoint.
llm = ChatOpenAI(
model="Qwen/Qwen3-32B-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key=os.environ["VOLTAGE_API_KEY"], # vg-... from /settings/api-keys
temperature=0.2,
max_tokens=2048,
timeout=60,
)
# 2. Use the chat model exactly like any other ChatOpenAI instance.
messages = [
SystemMessage(content="You are a privacy-first legal assistant for a French law firm."),
HumanMessage(content="Summarize the auto-renewal terms in this NDA: ..."),
]
reply = llm.invoke(messages)
print(reply.content)
# 3. Streaming works with .stream() and .astream() exactly like OpenAI.
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
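Structured output goes through the same endpoint. Here is a minimal sketch using with_structured_output; the NDAClause schema and the clause text are illustrative, and the feature relies on the endpoint's OpenAI-style tool-calling support.
# Structured output: same confidential ChatOpenAI instance (illustrative schema)
from typing import Optional
from pydantic import BaseModel, Field

class NDAClause(BaseModel):
    clause_type: str = Field(description="e.g. auto-renewal, termination, liability cap")
    renews_automatically: bool
    notice_period_days: Optional[int] = None

structured_llm = llm.with_structured_output(NDAClause)
clause = structured_llm.invoke(
    "The agreement renews for successive 12-month terms unless either party "
    "gives 60 days' written notice."
)
print(clause.model_dump())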
@langchain/openai takes the same configuration. Works in Node, Bun, Deno, and Edge runtimes.
// LangChain.js / @langchain/openai — drop-in confidential endpoint
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";
const llm = new ChatOpenAI({
model: "Qwen/Qwen3-32B-TEE",
apiKey: process.env.VOLTAGE_API_KEY, // vg-...
configuration: {
baseURL: "https://app.voltagegpu.com/v1", // sovereign endpoint
},
temperature: 0.2,
maxTokens: 2048,
});
const reply = await llm.invoke([
new SystemMessage("You are a sovereign AI assistant."),
new HumanMessage("Draft a redline for this auto-renewal clause..."),
]);
console.log(reply.content);
// Streaming
const stream = await llm.stream([
new HumanMessage("Outline the GDPR Article 28 obligations for processors."),
]);
for await (const chunk of stream) process.stdout.write(chunk.content as string);

The endpoint is OpenAI-compatible, so every LangChain primitive that talks to ChatOpenAI works without modification.
LCEL chains
Prompt | Model | OutputParser pipelines unchanged; an LCEL chain and an async batch run are sketched after this list.
LangGraph
StateGraph nodes bind a confidential ChatOpenAI as their LLM.
Tool calling
bind_tools, create_react_agent, OpenAI tool schema all work.
Streaming
.stream() and .astream() over Server-Sent Events.
Retrievers
PGVector, Qdrant, Chroma, Weaviate keep your embeddings on-prem.
Async batch
abatch, abatch_as_completed for high-throughput pipelines.
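To make the first and last items concrete, here is a minimal sketch of an LCEL pipeline and an async batch run against the confidential endpoint; the prompt wording and clause strings are placeholders.
# LCEL chain + async batch against the confidential endpoint (minimal sketch)
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="Qwen/Qwen3-32B-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    temperature=0,
)

# Prompt | Model | OutputParser: a stock LCEL pipeline, unchanged.
chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Summarize the clause in one sentence."),
        ("user", "{clause}"),
    ])
    | llm
    | StrOutputParser()
)
print(chain.invoke({"clause": "This agreement renews automatically for successive one-year terms ..."}))

# Async batch: fan the same chain out over many inputs concurrently.
clauses = [{"clause": c} for c in ("Clause A ...", "Clause B ...", "Clause C ...")]
summaries = asyncio.run(chain.abatch(clauses))
for s in summaries:
    print(s)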
Bind one confidential ChatOpenAI to every node. The whole graph (planner, retriever, writer, critic, finalizer) reasons against TDX-sealed inference. Conditional edges, checkpointing, and human-in-the-loop interrupts behave identically.
# LangGraph state graph — every node uses the confidential endpoint
from typing import TypedDict, List
from langchain_openai import ChatOpenAI
from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
llm = ChatOpenAI(
model="Qwen/Qwen3-32B-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
temperature=0.1,
)
class GraphState(TypedDict):
    question: str
    plan: str
    context: List[str]
    draft: str
    critique: str
    final: str
# --- Nodes ---
def planner(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Decompose the user question into a 3-step research plan."),
        HumanMessage(content=state["question"]),
    ])
    return {"plan": msg.content}

def researcher(state: GraphState) -> GraphState:
    # Plug in a private retriever here (PGVector / Qdrant on-prem).
    # For brevity we stub the retrieval step.
    return {"context": ["...private firm context retrieved on-prem..."]}

def writer(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Write a draft answer using only the provided context."),
        HumanMessage(content=f"Plan:\n{state['plan']}\n\nContext:\n{state['context']}"
                             f"\n\nQuestion: {state['question']}"),
    ])
    return {"draft": msg.content}

def critic(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Critique the draft for hallucination and missing citations."),
        HumanMessage(content=state["draft"]),
    ])
    return {"critique": msg.content}

def finalizer(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Produce the final answer, addressing the critique."),
        HumanMessage(content=f"Draft:\n{state['draft']}\n\nCritique:\n{state['critique']}"),
    ])
    return {"final": msg.content}
# --- Graph wiring ---
g = StateGraph(GraphState)
g.add_node("plan", planner)
g.add_node("research", researcher)
g.add_node("write", writer)
g.add_node("critique", critic)
g.add_node("finalize", finalizer)
g.add_edge(START, "plan")
g.add_edge("plan", "research")
g.add_edge("research", "write")
g.add_edge("write", "critique")
g.add_edge("critique", "finalize")
g.add_edge("finalize", END)
graph = g.compile(checkpointer=MemorySaver())
result = graph.invoke(
{"question": "Draft a memo on cross-border data transfers under SCCs 2021/914."},
config={"configurable": {"thread_id": "matter-2026-0419"}},
)
print(result["final"])
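Conditional edges and human-in-the-loop interrupts use the same wiring. The following is a hedged variation on the graph above, not part of it: route_after_critique and its keyword heuristic are illustrative, and the conditional edge would replace the static edge from critique to finalize.
# Optional variation: loop write → critique until the critic is satisfied
def route_after_critique(state: GraphState) -> str:
    # Illustrative heuristic; in practice, ask the critic node for a structured verdict.
    return "revise" if "missing citation" in state["critique"].lower() else "done"

# Use this in place of g.add_edge("critique", "finalize") in the wiring above.
g.add_conditional_edges("critique", route_after_critique, {"revise": "write", "done": "finalize"})

# Human-in-the-loop: pause before the final answer so a reviewer can approve or edit state.
graph = g.compile(checkpointer=MemorySaver(), interrupt_before=["finalize"])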
create_react_agent and bind_tools work without changes. Tools run in your process, so they can hit internal APIs that are never exposed to the confidential endpoint.

# Tool calling — agent reasons in TDX, tools execute on your machine
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
@tool
def search_clauses(matter_id: str, query: str) -> str:
    """Search the firm's internal clause library for a given matter."""
    # Hits an internal API on your network — never leaves your VPC.
    return internal_clause_api(matter_id, query)

@tool
def jurisdiction_check(country: str, regulation: str) -> str:
    """Check whether a regulation applies in a given EU jurisdiction."""
    return internal_jurisdiction_db(country, regulation)
llm = ChatOpenAI(
model="Qwen/Qwen3-32B-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
temperature=0,
)
agent = create_react_agent(llm, [search_clauses, jurisdiction_check])
response = agent.invoke({
"messages": [
("user",
"For matter M-2026-0419, find all auto-renewal clauses and check whether "
"the German implementation of EU Directive 2019/770 caps them.")
]
})
print(response["messages"][-1].content)

Server-Sent Events streaming is identical to OpenAI. Use it for chat UIs, FastAPI StreamingResponse handlers, or Next.js Edge functions.
# Server-Sent Events streaming — works the same as OpenAI
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(
model="Qwen/Qwen3-32B-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
streaming=True,
)
# Sync streaming
for chunk in llm.stream([HumanMessage(content="Explain the NIS2 incident-reporting timelines.")]):
    print(chunk.content, end="", flush=True)
# Async streaming for FastAPI / Starlette / Next.js Edge handlers
async def stream_response(question: str):
    async for chunk in llm.astream([HumanMessage(content=question)]):
        yield chunk.content
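To serve that generator over HTTP, a minimal FastAPI wiring could look like the sketch below; the /ask route and its query parameter are illustrative, not part of VoltageGPU's API.
# Minimal FastAPI wiring for the async generator above (illustrative)
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/ask")
async def ask(question: str):
    # Streams tokens to the client as they arrive from the confidential endpoint.
    return StreamingResponse(stream_response(question), media_type="text/plain")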
For private RAG you have three production-ready paths. All three keep raw documents and embeddings out of any third-party SaaS vector database.
On-prem vector DB
Run PGVector, Qdrant, Chroma, or Weaviate inside your own VPC. Only the matched chunks transit the confidential endpoint for reasoning.
Confidential embeddings
Use VoltageGPU -TEE embedding models so the embedding step itself runs inside Intel TDX. Useful when raw documents must never reach a SaaS embedder.
Hybrid retrieval
BM25 + vector hybrid retrieval is supported via standard LangChain Retrievers, and the reranker can also run on the confidential endpoint. A minimal hybrid setup is sketched below.
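For the third path, here is a hedged sketch of hybrid retrieval with an EnsembleRetriever; the placeholder documents and the 0.4/0.6 weights are illustrative, and vectorstore is assumed to be the PGVector store configured as in the end-to-end example that follows.
# Hybrid retrieval: BM25 keyword search blended with dense vectors (illustrative)
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # needs: pip install rank_bm25
from langchain_core.documents import Document

# Placeholder corpus; in production, load the same on-prem documents you embed.
docs = [Document(page_content=t) for t in ("...matter note 1...", "...matter note 2...")]
bm25 = BM25Retriever.from_documents(docs)
bm25.k = 6

# `vectorstore` is the PGVector store from the example below.
hybrid = EnsembleRetriever(
    retrievers=[bm25, vectorstore.as_retriever(search_kwargs={"k": 6})],
    weights=[0.4, 0.6],  # tune for your corpus
)
matches = hybrid.invoke("auto-renewal caps in EU SaaS MSAs")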
# LCEL RAG over private documents — confidential end-to-end
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_postgres import PGVector
# Confidential LLM (reasoning sealed in Intel TDX)
llm = ChatOpenAI(
model="Qwen/Qwen3-235B-A22B-Instruct-2507-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
temperature=0,
)
# Confidential embeddings (vector math sealed in Intel TDX)
embeddings = OpenAIEmbeddings(
model="bge-m3-TEE",
base_url="https://app.voltagegpu.com/v1",
api_key="vg-...",
)
# Self-hosted vector store inside your VPC — never touches a US SaaS.
vectorstore = PGVector(
embeddings=embeddings,
collection_name="firm_matter_notes",
connection="postgresql+psycopg://app:secret@db.internal:5432/rag",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})
prompt = ChatPromptTemplate.from_messages([
("system", "Answer strictly from the retrieved firm matter notes. "
"If the answer is not in the context, say so."),
("user", "Context:\n{context}\n\nQuestion: {question}"),
])
def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
answer = rag_chain.invoke(
"What is the latest position our firm took on auto-renewal caps in EU SaaS MSAs?"
)
print(answer)

Article 12 of the EU AI Act requires automatic event logs throughout the lifetime of high-risk AI systems. Combine VoltageGPU's structured per-request events with a LangChain callback handler to capture chain-level events, and you have a complete audit trail without sending raw prompts to a US cloud.
# EU AI Act Article 12 — structured logging hook for LangChain
from langchain_core.callbacks import BaseCallbackHandler
from datetime import datetime, timezone
import hashlib
import json
class ArticleTwelveLogger(BaseCallbackHandler):
    """Writes Article 12 compatible event logs to your audit pipeline."""

    def __init__(self, sink):
        self.sink = sink  # e.g. a write-only log bucket, OpenSearch, or Loki

    def _hash(self, text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def on_llm_start(self, serialized, prompts, **kwargs):
        for p in prompts:
            self.sink.write(json.dumps({
                "ts": datetime.now(timezone.utc).isoformat(),
                "event": "llm_start",
                "model": serialized.get("kwargs", {}).get("model"),
                "prompt_sha256": self._hash(p),
                "prompt_chars": len(p),
                "run_id": str(kwargs.get("run_id")),
            }) + "\n")

    def on_llm_end(self, response, **kwargs):
        for gen in response.generations:
            for g in gen:
                self.sink.write(json.dumps({
                    "ts": datetime.now(timezone.utc).isoformat(),
                    "event": "llm_end",
                    "output_sha256": self._hash(g.text),
                    "output_chars": len(g.text),
                    "run_id": str(kwargs.get("run_id")),
                }) + "\n")

    def on_llm_error(self, error, **kwargs):
        self.sink.write(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": "llm_error",
            "error": str(error),
            "run_id": str(kwargs.get("run_id")),
        }) + "\n")
# Attach the logger to any LangChain invocation.
reply = llm.invoke(messages, config={"callbacks": [ArticleTwelveLogger(audit_sink)]})

See the full guide: Article 12 AI Act logging — retention windows, tamper-evident sinks, and the structured event schema we emit.
Pay-per-token via the same /v1 API. No per-seat licence, no platform fee, no minimum. Mix models freely across nodes and chains.
Volume contracts available beyond 100M tokens / month. Annual prepay 15% off.
Every LangChain prompt sealed in TDX
The /v1/chat/completions endpoint terminates TLS inside the trust domain. Prompts decrypt only inside the enclave. The hypervisor cannot read them.
AES-256 memory encryption
CPU-fused keys protect RAM at runtime. Contracts, patient notes, deal models, embeddings — none of it readable by a privileged provider.
Per-request attestation
Each completion can be paired with an ECDSA-signed report identifying the TDX module and base model version. Verifiable proof of confidentiality on demand.
Zero retention, zero training
Prompts and completions are never logged or reused. Native GDPR Article 28 DPA, EU jurisdiction (VOLTAGE EI, France, SIREN 943 808 824).
Do I need to fork LangChain?
No. LangChain ChatOpenAI is the supported integration point. Pass base_url=https://app.voltagegpu.com/v1 and api_key=vg-... and you are done. No SDK fork, no monkey patching, no proxy required.
Does LangGraph work?
Yes. Bind a confidential ChatOpenAI to each node. StateGraph, conditional edges, checkpointing, async streaming, and human-in-the-loop interrupts all behave identically.
Can my tools call internal APIs?
Yes. Tools execute on your machine; only the LLM reasoning step crosses the enclave boundary. If you also want tool calls to remain provider-blind, expose them through a confidential MCP server hosted on VoltageGPU.
Streaming and tool calling?
Both supported. The /v1/chat/completions endpoint mirrors OpenAI for SSE streaming and tool / function calling. bind_tools and create_react_agent work without modification.
How does this satisfy EU AI Act Article 12?
VoltageGPU emits structured per-request events (timestamp, model, input hash, output hash, attestation reference). Combined with LangChain LCEL callbacks, you get a complete Article 12 audit trail without sending raw prompts to a US cloud.
Latency overhead vs OpenAI direct?
Within ~5% of bare-metal inference on the same model. TDX memory encryption adds nanosecond-scale overhead per memory access, which is negligible next to token generation time.
LangChain.js / TypeScript supported?
Yes. @langchain/openai accepts the same baseURL and apiKey configuration. Drop the URL into a ChatOpenAI constructor in Node, Bun, or Deno.
How do I get an API key?
Register at app.voltagegpu.com/register, top up any amount (Stripe, BTC, ETH, USDC), and generate a key from the API Keys page. The key is prefixed with vg- and acts as a drop-in OPENAI_API_KEY.
EXPLORE FURTHER
Ship a sovereign LangChain stack this afternoon
Generate an API key, swap two lines, run your existing chains against Intel TDX.