LangChain TEE Deployment
Sealed in Intel TDX
LANGCHAIN · LANGGRAPH · CONFIDENTIAL

Deploy LangChain agents inside
confidential VMs.

Drop-in ChatOpenAI swap. Every reasoning call routes through Intel TDX enclaves we operate in the EU. Existing LCEL chains, LangGraph state graphs, agents, retrievers, and tool calls run unchanged.

Built for legal RAG, financial analysis, healthcare summarization, and any pipeline that needs EU AI Act Article 12 logging without ever shipping privileged context to a US controller.

5 min
Quickstart
2 lines
To swap
< 5%
TDX overhead
EU
Sovereign jurisdiction

Why run LangChain inside Intel TDX

Standard LangChain pointed at OpenAI is great for prototyping. For production workloads in regulated industries, three risks make it unshippable.

Privileged data leaving the firm

Default LangChain pointed at OpenAI sends every retrieved chunk, every tool argument, and every generated output to a US controller. For privileged matter notes that is enough to break legal-professional privilege.

No Article 12 audit trail

OpenAI logs are not written to your audit sink. Article 12 of the EU AI Act requires automatic logs, retained for the system lifetime, that you control. With LangChain on a foreign API you do not get the inputs or outputs you need.

No hardware sealing

Standard cloud inference is "encrypted in transit, encrypted at rest" — but plaintext in RAM during compute. A privileged provider, malicious admin, or compromised hypervisor can read prompts. TDX removes that whole category.

VoltageGPU is an EU controller (VOLTAGE EI, France, SIREN 943 808 824) operating Intel TDX hardware enclaves with attestation evidence available per session. Pointing your existing LangChain stack at our endpoint collapses all three risks into a single hardware-sealed boundary.

Install

Standard langchain, langchain-openai, and langgraph from PyPI. No fork, no patch, no proxy.

Shell · pip install
BASH
# Standard LangChain stack — no fork, no patch
pip install langchain langchain-openai langgraph langchain-community
pip install langchain-postgres pgvector "psycopg[binary]"   # optional, for confidential RAG

5-minute quickstart

Construct one ChatOpenAI instance pointed at https://app.voltagegpu.com/v1. Use it anywhere a ChatModel is expected — chains, agents, LangGraph nodes, retrieval QA, structured output. Streaming, tool calling, async batch all work identically.

Python · ChatOpenAI · drop-in confidential
PYTHON
# 5-minute confidential LangChain quickstart
# Swap two lines: base_url + api_key. Everything else is stock LangChain.
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# 1. Point the OpenAI-compatible client at the confidential endpoint.
llm = ChatOpenAI(
    model="Qwen/Qwen3-32B-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key=os.environ["VOLTAGE_API_KEY"],   # vg-... from /settings/api-keys
    temperature=0.2,
    max_tokens=2048,
    timeout=60,
)

# 2. Use the chat model exactly like any other ChatOpenAI instance.
messages = [
    SystemMessage(content="You are a privacy-first legal assistant for a French law firm."),
    HumanMessage(content="Summarize the auto-renewal terms in this NDA: ..."),
]

reply = llm.invoke(messages)
print(reply.content)

# 3. Streaming works with .stream() and .astream() exactly like OpenAI.
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)

LangChain.js / TypeScript

@langchain/openai takes the same configuration. Works in Node, Bun, Deno, and Edge runtimes.

TypeScript · @langchain/openai · drop-in confidential
TYPESCRIPT
// LangChain.js / @langchain/openai — drop-in confidential endpoint
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

const llm = new ChatOpenAI({
  model: "Qwen/Qwen3-32B-TEE",
  apiKey: process.env.VOLTAGE_API_KEY,        // vg-...
  configuration: {
    baseURL: "https://app.voltagegpu.com/v1", // sovereign endpoint
  },
  temperature: 0.2,
  maxTokens: 2048,
});

const reply = await llm.invoke([
  new SystemMessage("You are a sovereign AI assistant."),
  new HumanMessage("Draft a redline for this auto-renewal clause..."),
]);

console.log(reply.content);

// Streaming
const stream = await llm.stream([
  new HumanMessage("Outline the GDPR Article 28 obligations for processors."),
]);
for await (const chunk of stream) process.stdout.write(chunk.content as string);

What works out of the box

The endpoint is OpenAI-compatible, so every LangChain primitive that talks to ChatOpenAI works without modification.

LCEL chains

Prompt | Model | OutputParser pipelines unchanged.

LangGraph

StateGraph nodes bind a confidential ChatOpenAI as their LLM.

Tool calling

bind_tools, create_react_agent, OpenAI tool schema all work.

Streaming

.stream() and .astream() over Server-Sent Events.

Retrievers

PGVector, Qdrant, Chroma, Weaviate keep your embeddings on-prem.

Async batch

abatch, abatch_as_completed for high-throughput pipelines.
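When you need to cap in-flight requests against a rate limit while keeping `abatch`'s ordered contract, a semaphore wrapper is enough. A minimal sketch — `fake_ainvoke` is a stand-in you would replace with the confidential `ChatOpenAI`'s `.ainvoke`:

```python
# Bounded-concurrency fan-out — same ordered contract as llm.abatch.
# fake_ainvoke is a stand-in; swap in the confidential ChatOpenAI's .ainvoke.
import asyncio

async def bounded_batch(ainvoke, prompts, max_concurrency=8):
    """Run ainvoke over prompts with at most max_concurrency in flight,
    preserving input order (asyncio.gather returns results in call order)."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with sem:
            return await ainvoke(prompt)

    return await asyncio.gather(*(one(p) for p in prompts))

async def fake_ainvoke(prompt: str) -> str:   # stand-in for llm.ainvoke
    await asyncio.sleep(0.01)
    return prompt.upper()

results = asyncio.run(bounded_batch(fake_ainvoke, ["nda", "msa", "dpa"], max_concurrency=2))
print(results)  # ['NDA', 'MSA', 'DPA']
```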

LangGraph state graph — RAG with planner, writer, critic

Bind one confidential ChatOpenAI to every node. The whole graph (planner, retriever, writer, critic, finalizer) reasons against TDX-sealed inference. Conditional edges, checkpointing, and human-in-the-loop interrupts behave identically.

Python · LangGraph · 5-node sovereign graph
PYTHON
# LangGraph state graph — every node uses the confidential endpoint
from typing import TypedDict, List
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

llm = ChatOpenAI(
    model="Qwen/Qwen3-32B-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    temperature=0.1,
)

class GraphState(TypedDict):
    question: str
    plan: str
    context: List[str]
    draft: str
    critique: str
    final: str

# --- Nodes ---

def planner(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Decompose the user question into a 3-step research plan."),
        HumanMessage(content=state["question"]),
    ])
    return {"plan": msg.content}

def researcher(state: GraphState) -> GraphState:
    # Plug in a private retriever here (PGVector / Qdrant on-prem).
    # For brevity we stub the retrieval step.
    return {"context": ["...private firm context retrieved on-prem..."]}

def writer(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Write a draft answer using only the provided context."),
        HumanMessage(content=f"Plan:\n{state['plan']}\n\nContext:\n{state['context']}"
                             f"\n\nQuestion: {state['question']}"),
    ])
    return {"draft": msg.content}

def critic(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Critique the draft for hallucination and missing citations."),
        HumanMessage(content=state["draft"]),
    ])
    return {"critique": msg.content}

def finalizer(state: GraphState) -> GraphState:
    msg = llm.invoke([
        SystemMessage(content="Produce the final answer, addressing the critique."),
        HumanMessage(content=f"Draft:\n{state['draft']}\n\nCritique:\n{state['critique']}"),
    ])
    return {"final": msg.content}

# --- Graph wiring ---
g = StateGraph(GraphState)
g.add_node("plan", planner)
g.add_node("research", researcher)
g.add_node("write", writer)
g.add_node("critique", critic)
g.add_node("finalize", finalizer)
g.add_edge(START, "plan")
g.add_edge("plan", "research")
g.add_edge("research", "write")
g.add_edge("write", "critique")
g.add_edge("critique", "finalize")
g.add_edge("finalize", END)

graph = g.compile(checkpointer=MemorySaver())

result = graph.invoke(
    {"question": "Draft a memo on cross-border data transfers under SCCs 2021/914."},
    config={"configurable": {"thread_id": "matter-2026-0419"}},
)
print(result["final"])

Tool calling — agent reasons in TDX, tools execute on your machine

create_react_agent and bind_tools work without changes. Tools run in your process, so internal APIs stay inside your network — only the model's tool-call arguments and tool results cross the enclave boundary.

Python · LangGraph prebuilt · ReAct agent with private tools
PYTHON
# Tool calling — agent reasons in TDX, tools execute on your machine
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def search_clauses(matter_id: str, query: str) -> str:
    """Search the firm's internal clause library for a given matter."""
    # Hits an internal API on your network — never leaves your VPC.
    return internal_clause_api(matter_id, query)

@tool
def jurisdiction_check(country: str, regulation: str) -> str:
    """Check whether a regulation applies in a given EU jurisdiction."""
    return internal_jurisdiction_db(country, regulation)

llm = ChatOpenAI(
    model="Qwen/Qwen3-32B-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    temperature=0,
)

agent = create_react_agent(llm, [search_clauses, jurisdiction_check])

response = agent.invoke({
    "messages": [
        ("user",
         "For matter M-2026-0419, find all auto-renewal clauses and check whether "
         "the German implementation of EU Directive 2019/770 caps them.")
    ]
})
print(response["messages"][-1].content)

Streaming with .stream() / .astream()

Server-Sent Events streaming is identical to OpenAI. Use it for chat UIs, FastAPI StreamingResponse handlers, or Next.js Edge functions.

Python · streaming with SSE
PYTHON
# Server-Sent Events streaming — works the same as OpenAI
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(
    model="Qwen/Qwen3-32B-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    streaming=True,
)

# Sync streaming
for chunk in llm.stream([HumanMessage(content="Explain the NIS2 incident-reporting timelines.")]):
    print(chunk.content, end="", flush=True)

# Async streaming for FastAPI / Starlette / Next.js Edge handlers
async def stream_response(question: str):
    async for chunk in llm.astream([HumanMessage(content=question)]):
        yield chunk.content

Vector store inside TDX — confidential RAG

For private RAG you have three production-ready paths. All three keep raw documents and embeddings out of any third-party SaaS vector database.

On-prem vector DB

Run PGVector, Qdrant, Chroma, or Weaviate inside your own VPC. Only the matched chunks transit the confidential endpoint for reasoning.

PGVector · Qdrant · Chroma

Confidential embeddings

Use VoltageGPU's -TEE embedding models so the embedding step itself runs inside Intel TDX. Useful when raw documents must never reach a SaaS embedder.

bge-m3-TEE · Sealed embeddings

Hybrid retrieval

BM25 + vector hybrid retrieval is supported via standard LangChain Retrievers. The reranker can also run on the confidential endpoint.

Hybrid · Reranker
Python · LCEL · Confidential RAG with PGVector
PYTHON
# LCEL RAG over private documents — confidential end-to-end
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_postgres import PGVector

# Confidential LLM (reasoning sealed in Intel TDX)
llm = ChatOpenAI(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
    temperature=0,
)

# Confidential embeddings (vector math sealed in Intel TDX)
embeddings = OpenAIEmbeddings(
    model="bge-m3-TEE",
    base_url="https://app.voltagegpu.com/v1",
    api_key="vg-...",
)

# Self-hosted vector store inside your VPC — never touches a US SaaS.
vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="firm_matter_notes",
    connection="postgresql+psycopg://app:secret@db.internal:5432/rag",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer strictly from the retrieved firm matter notes. "
               "If the answer is not in the context, say so."),
    ("user", "Context:\n{context}\n\nQuestion: {question}"),
])

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke(
    "What is the latest position our firm took on auto-renewal caps in EU SaaS MSAs?"
)
print(answer)

Observability + logs — EU AI Act Article 12

Article 12 of the EU AI Act requires automatic event logs throughout the lifetime of high-risk AI systems. Combine VoltageGPU's structured per-request events with a LangChain callback handler to capture chain-level events, and you have a complete audit trail without sending raw prompts to a US cloud.

Python · LangChain BaseCallbackHandler
PYTHON
# EU AI Act Article 12 — structured logging hook for LangChain
from langchain_core.callbacks import BaseCallbackHandler
from datetime import datetime, timezone
import hashlib
import json

class ArticleTwelveLogger(BaseCallbackHandler):
    """Writes Article 12 compatible event logs to your audit pipeline."""

    def __init__(self, sink):
        self.sink = sink   # e.g. a write-only log bucket, OpenSearch, or Loki

    def _hash(self, text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def on_llm_start(self, serialized, prompts, **kwargs):
        for p in prompts:
            self.sink.write(json.dumps({
                "ts": datetime.now(timezone.utc).isoformat(),
                "event": "llm_start",
                "model": serialized.get("kwargs", {}).get("model"),
                "prompt_sha256": self._hash(p),
                "prompt_chars": len(p),
                "run_id": str(kwargs.get("run_id")),
            }) + "\n")

    def on_llm_end(self, response, **kwargs):
        for gen in response.generations:
            for g in gen:
                self.sink.write(json.dumps({
                    "ts": datetime.now(timezone.utc).isoformat(),
                    "event": "llm_end",
                    "output_sha256": self._hash(g.text),
                    "output_chars": len(g.text),
                    "run_id": str(kwargs.get("run_id")),
                }) + "\n")

    def on_llm_error(self, error, **kwargs):
        self.sink.write(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": "llm_error",
            "error": str(error),
            "run_id": str(kwargs.get("run_id")),
        }) + "\n")

# Attach the logger to any LangChain invocation.
reply = llm.invoke(messages, config={"callbacks": [ArticleTwelveLogger(audit_sink)]})

See the full guide: Article 12 AI Act logging — retention windows, tamper-evident sinks, and the structured event schema we emit.
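A sink only satisfies auditors if it is tamper-evident. One sketch of that property, compatible with the `sink.write(line)` interface the callback handler expects: chain each record's hash into the next, so any retroactive edit or deletion breaks verification. The `HashChainSink` class and record format here are illustrative, not the schema the platform emits.

```python
# Tamper-evident hash-chain sink — illustrative sketch, not a platform schema.
import hashlib
import json

class HashChainSink:
    """Append-only JSONL-style sink where each record commits to its predecessor."""

    def __init__(self):
        self.records = []
        self.prev = "0" * 64   # genesis hash

    def write(self, line: str):
        record = {"prev": self.prev, "event": line.rstrip("\n")}
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        record["hash"] = digest
        self.records.append(record)
        self.prev = digest

    def verify(self) -> bool:
        """Recompute every hash; any edited, reordered, or dropped record fails."""
        prev = "0" * 64
        for r in self.records:
            body = {"prev": r["prev"], "event": r["event"]}
            if r["prev"] != prev:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != r["hash"]:
                return False
            prev = r["hash"]
        return True

sink = HashChainSink()
sink.write('{"event": "llm_start"}\n')
sink.write('{"event": "llm_end"}\n')
assert sink.verify()

sink.records[0]["event"] = '{"event": "tampered"}'   # retroactive edit
print(sink.verify())  # False
```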

Pricing

Pay-per-token via the same /v1 API. No per-seat licence, no platform fee, no minimum. Mix models freely across nodes and chains.

Qwen/Qwen3-32B-TEE — Default. Worker nodes, ReAct agents, tool calls.
in $0.50 / 1M · out $1.50 / 1M
Qwen3-235B-A22B-Instruct-2507-TEE — Long-context (262K). Contract review, RAG synthesis.
in $1.20 / 1M · out $3.50 / 1M
DeepSeek-R1-0528-TEE — Reasoning. Critic nodes, IC memos, plan-critique loops.
in $1.80 / 1M · out $5.40 / 1M
bge-m3-TEE — Confidential embeddings for sealed RAG.
in $0.10 / 1M · out —

Volume contracts available beyond 100M tokens / month. Annual prepay 15% off.
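The per-1M rates make run-level cost estimation a one-liner. A sketch with the prices hard-coded from the table above (check current pricing before relying on the numbers):

```python
# Per-run cost estimate from the published per-1M-token rates above.
PRICES = {  # USD per 1M tokens: (input, output)
    "Qwen/Qwen3-32B-TEE": (0.50, 1.50),
    "Qwen3-235B-A22B-Instruct-2507-TEE": (1.20, 3.50),
    "DeepSeek-R1-0528-TEE": (1.80, 5.40),
    "bge-m3-TEE": (0.10, 0.0),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# Example: a 5-node graph run with ~40K prompt tokens and ~8K generated tokens
# on the worker model costs about 3 cents.
print(round(cost_usd("Qwen/Qwen3-32B-TEE", 40_000, 8_000), 4))  # 0.032
```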

Why this is confidential

Every LangChain prompt sealed in TDX

The /v1/chat/completions endpoint terminates TLS inside the trust domain. Prompts decrypt only inside the enclave. The hypervisor cannot read them.

AES-256 memory encryption

CPU-fused keys protect RAM at runtime. Contracts, patient notes, deal models, embeddings — none of it readable by a privileged provider.

Per-request attestation

Each completion can be paired with an ECDSA-signed report identifying the TDX module and base model version. Verifiable proof of confidentiality on demand.

Zero retention, zero training

Prompts and completions are never logged or reused. Native GDPR Article 28 DPA, EU jurisdiction (VOLTAGE EI, France, SIREN 943 808 824).
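Before verifying the ECDSA signature against Intel's TDX certificate chain, you can cheaply check client-side that a report binds to your specific request. A sketch under assumed field names — the report shape here is hypothetical; consult the attestation documentation for the real schema:

```python
# Client-side check that an attestation report binds to a specific request.
# Field names ("model", "prompt_sha256") are hypothetical, not the real schema.
import hashlib

def binds_to_request(report: dict, prompt: str, expected_model: str) -> bool:
    """Local binding checks to run before (separately) verifying the
    report's ECDSA signature against Intel's TDX certificate chain."""
    return (
        report.get("model") == expected_model
        and report.get("prompt_sha256") == hashlib.sha256(prompt.encode()).hexdigest()
    )

report = {   # illustrative shape only
    "model": "Qwen/Qwen3-32B-TEE",
    "prompt_sha256": hashlib.sha256(b"Summarize the NDA ...").hexdigest(),
}
print(binds_to_request(report, "Summarize the NDA ...", "Qwen/Qwen3-32B-TEE"))  # True
```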

Developer FAQ

Do I need to fork LangChain?

No. LangChain ChatOpenAI is the supported integration point. Pass base_url=https://app.voltagegpu.com/v1 and api_key=vg-... and you are done. No SDK fork, no monkey patching, no proxy required.

Does LangGraph work?

Yes. Bind a confidential ChatOpenAI to each node. StateGraph, conditional edges, checkpointing, async streaming, and human-in-the-loop interrupts all behave identically.

Can my tools call internal APIs?

Yes. Tools execute on your machine; only the LLM reasoning step crosses the enclave boundary. If you also want tool calls to remain provider-blind, expose them through a confidential MCP server hosted on VoltageGPU.

Streaming and tool calling?

Both supported. The /v1/chat/completions endpoint mirrors OpenAI for SSE streaming and tool / function calling. bind_tools and create_react_agent work without modification.

How does this satisfy EU AI Act Article 12?

VoltageGPU emits structured per-request events (timestamp, model, input hash, output hash, attestation reference). Combined with LangChain LCEL callbacks, you get a complete Article 12 audit trail without sending raw prompts to a US cloud.

Latency overhead vs OpenAI direct?

Within ~5% of bare-metal inference on the same model. TDX memory encryption adds nanosecond-scale overhead per memory access, which is dwarfed by token generation time.

LangChain.js / TypeScript supported?

Yes. @langchain/openai accepts the same baseURL and apiKey configuration. Drop the URL into a ChatOpenAI constructor in Node, Bun, or Deno.

How do I get an API key?

Register at app.voltagegpu.com/register, top up any amount (Stripe, BTC, ETH, USDC), and generate a key from the API Keys page. The key is prefixed with vg- and acts as a drop-in OPENAI_API_KEY.

EXPLORE FURTHER

Bring Your Own Agent

Parent pillar — BYOA overview

CrewAI private deployment

Multi-agent crews in TDX

Confidential MCP server

Tool calls inside the enclave

Sovereign agentic AI

Architectural overview

Article 12 AI Act logging

Audit trail compliance

Legal AI agents

Legal-specific agent patterns

References: langchain-openai · LangGraph · EU AI Act Art. 12 · BYOA pillar

Ship a sovereign LangChain stack this afternoon

Generate an API key, swap two lines, run your existing chains against Intel TDX.

Create account · BYOA pillar