
Architecture

[Architecture diagram: the Steward runs in the customer VPC, writing prompts and completions to S3/GCS and sending only metadata to Majordomo Cloud.]

Request and response bodies are written directly to your S3 or GCS bucket. Majordomo's servers receive only metadata: token counts, cost, latency, model name, and whatever custom tags you attach. This is not a configuration option or a compliance mode; it is how the product is built. For a glossary of roles and responsibilities, see Components.

Two deployment modes

Managed

Majordomo operates Steward on its own infrastructure. You connect your cloud storage bucket, create an API key, and point your SDK at the gateway endpoint. There are no servers to run or maintain.

[Diagram: Managed deployment. Majordomo runs the Steward; customer data goes to the customer's S3/GCS bucket.]
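In managed mode, switching to the gateway is a base-URL and header change. A minimal sketch using Python's standard library; the gateway URL is hypothetical (your real endpoint comes from your Majordomo account), and the sketch assumes the gateway attaches the provider API key stored in its database, so only the Majordomo key is sent:

```python
import json
import urllib.request

# Hypothetical gateway URL; substitute the endpoint from your account.
GATEWAY = "https://gateway.majordomo.example"

body = json.dumps({
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

req = urllib.request.Request(
    f"{GATEWAY}/v1/chat/completions",  # the path doubles as provider detection (OpenAI)
    data=body,
    headers={
        "Content-Type": "application/json",
        "X-Majordomo-Key": "mjd_your_key",  # validated by the gateway (step 1 of the request flow)
    },
)
# urllib.request.urlopen(req) would send it; the response is identical to
# calling the provider directly, since the gateway forwards requests unchanged.
```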

Self-hosted Steward (VPC)

You run Steward inside your own VPC. Your prompts and completions are processed entirely within your network; they never touch Majordomo's infrastructure. Only metadata (token counts, cost, latency, model name) leaves your environment, sent to Majordomo Cloud to power the dashboard.

[Diagram: Self-hosted deployment. The customer runs the Steward in their VPC, writing prompts and completions to their own S3/GCS bucket.]

This is the right choice when your team has data residency requirements, when enterprise customers ask where their data is processed, or when you need to pass a security review that requires prompt content to stay on-premises.

Both modes write request/response bodies to your bucket. The difference is where Steward runs.

Self-hosted setup →

Request flow

On every request, the gateway:
  1. Validates the X-Majordomo-Key header
  2. Detects the provider from the request path or X-Majordomo-Provider header
  3. Forwards the request to the upstream provider unchanged
  4. Parses the response for token usage
  5. Calculates cost using real-time pricing data
  6. Writes the request and response body to your S3 / GCS bucket
  7. Logs metadata to Majordomo asynchronously — no latency added to the critical path
  8. Returns the response to the caller — identical to calling the provider directly
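Steps 4–7 can be sketched as follows. This is an illustration of the documented behavior, not Steward's actual code: the function name, the price values, and the bucket/metadata interfaces are all hypothetical, and the metadata hand-off (asynchronous in the real gateway) is shown synchronously for simplicity.

```python
import json

# Illustrative per-million-token prices; the real data is fetched from llm-prices.com.
PRICES = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def handle_response(response_json, bucket, metadata_sink):
    """Hypothetical sketch of steps 4-7 of the request flow."""
    usage = response_json["usage"]                      # step 4: parse token usage
    p = PRICES[response_json["model"]]
    cost = (usage["prompt_tokens"] * p["input"] +       # step 5: compute cost locally
            usage["completion_tokens"] * p["output"]) / 1_000_000
    bucket.append(json.dumps(response_json))            # step 6: full body -> your S3/GCS bucket
    metadata_sink.append({                              # step 7: metadata only -> Majordomo Cloud
        "model": response_json["model"],
        "cost_usd": cost,
        **usage,
    })
    return response_json                                # step 8: response returned unchanged
```

Note that the prompt and completion content goes only to the bucket; the metadata sink receives numbers and a model name.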

What goes where

| Data | Destination | Who controls it |
|---|---|---|
| Prompt content | Your S3 / GCS bucket | You |
| Completion content | Your S3 / GCS bucket | You |
| Token counts | Majordomo Cloud | Majordomo |
| Cost | Majordomo Cloud (calculated locally, sent as a number) | Majordomo |
| Latency | Majordomo Cloud | Majordomo |
| Model name | Majordomo Cloud | Majordomo |
| Custom tags | Majordomo Cloud (only X-Majordomo-* headers you add) | You decide what to tag |
| Provider API keys | Your gateway database, encrypted at rest | You |
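Custom tags are just extra X-Majordomo-* headers on the request. A sketch of the rule in the table above (the tag names Env and Feature are hypothetical examples; the filtering shown is an illustration, not Steward's actual code):

```python
headers = {
    "Content-Type": "application/json",
    "X-Majordomo-Key": "mjd_your_key",
    "X-Majordomo-Env": "production",        # custom tag (hypothetical name)
    "X-Majordomo-Feature": "chat-summary",  # custom tag (hypothetical name)
}

# Only the X-Majordomo-* headers travel to Majordomo Cloud as metadata;
# everything else, including the request body, stays with you.
tags = {k: v for k, v in headers.items() if k.startswith("X-Majordomo-")}
```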

Provider detection

The gateway auto-detects the provider from the request path:
| Path | Provider |
|---|---|
| /v1/chat/completions | OpenAI |
| /v1/messages | Anthropic |
| /<model>:generateContent | Gemini |
Override with the X-Majordomo-Provider header when needed.
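The mapping above can be sketched as a simple lookup (an illustration of the documented rules, not the gateway's actual code):

```python
def detect_provider(path, override=None):
    """Resolve the upstream provider from the request path."""
    if override:                          # X-Majordomo-Provider header wins
        return override
    if path == "/v1/chat/completions":
        return "openai"
    if path == "/v1/messages":
        return "anthropic"
    if path.endswith(":generateContent"): # matches /<model>:generateContent
        return "gemini"
    return None
```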

Pricing

Costs are calculated using pricing data fetched hourly from llm-prices.com, with a bundled fallback. Provider model names are mapped to canonical names before lookup. Prompt caching tokens are tracked and priced separately.
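The calculation combines the two rules above: map the provider's model name to a canonical name, then price cached prompt tokens at their own rate. A sketch with made-up prices and a hypothetical alias (real prices come from llm-prices.com):

```python
# Illustrative per-million-token prices and alias; not real pricing data.
PRICES = {"claude-sonnet-4": {"input": 3.00, "output": 15.00, "cached_input": 0.30}}
ALIASES = {"anthropic.claude-sonnet-4-v1:0": "claude-sonnet-4"}  # hypothetical provider alias

def cost_usd(model, prompt_tokens, completion_tokens, cached_tokens=0):
    p = PRICES[ALIASES.get(model, model)]        # provider name -> canonical name
    return (
        (prompt_tokens - cached_tokens) * p["input"]
        + cached_tokens * p["cached_input"]      # cache reads priced separately
        + completion_tokens * p["output"]
    ) / 1_000_000
```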