
Architecture

[Architecture diagram: the Steward runs in the customer VPC, writing prompts and completions to S3/GCS and sending only metadata to Majordomo Cloud.]

Request and response bodies are written directly to your S3 or GCS bucket. Majordomo's servers receive only metadata: token counts, cost, latency, model name, and whatever custom tags you attach. This is not a configuration option or a compliance mode; it is how the product is built. For a glossary of roles and responsibilities, see Components.

Two deployment modes

Managed

Majordomo operates Steward on its own infrastructure. You connect your cloud storage bucket, create an API key, and point your SDK at the gateway endpoint. There are no servers to run or maintain.

[Diagram: Managed deployment. Majordomo runs the Steward; customer data goes to the customer's S3/GCS bucket.]
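In managed mode, switching to the gateway is a base-URL and header change. A minimal sketch using Python's standard library; the gateway URL is hypothetical (your real endpoint comes from your Majordomo account), and the sketch assumes the gateway attaches the provider API key stored in its database, so only the Majordomo key is sent:

```python
import json
import urllib.request

# Hypothetical gateway URL; substitute the endpoint from your account.
GATEWAY = "https://gateway.majordomo.example"

body = json.dumps({
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

req = urllib.request.Request(
    f"{GATEWAY}/v1/chat/completions",  # the path doubles as provider detection (OpenAI)
    data=body,
    headers={
        "Content-Type": "application/json",
        "X-Majordomo-Key": "mjd_your_key",  # validated by the gateway (step 1 of the request flow)
    },
)
# urllib.request.urlopen(req) would send it; the response is identical to
# calling the provider directly, since the gateway forwards requests unchanged.
```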

Self-hosted Steward (VPC)

You run Steward inside your own VPC. Your prompts and completions are processed entirely within your network; they never touch Majordomo's infrastructure. Only metadata (token counts, cost, latency, model name) leaves your environment, sent to Majordomo Cloud to power the dashboard.

[Diagram: Self-hosted deployment. The customer runs the Steward in their VPC, writing prompts and completions to their own S3/GCS bucket.]

This is the right choice when your team has data residency requirements, when enterprise customers ask where their data is processed, or when you need to pass a security review that requires prompt content to stay on-premises.

Both modes write request/response bodies to your bucket. The difference is where Steward runs.

Self-hosted setup →

Request flow

On every request, the gateway:
  1. Validates the X-Majordomo-Key header
  2. Detects the provider from the request path or X-Majordomo-Provider header
  3. Forwards the request to the upstream provider unchanged
  4. Parses the response for token usage
  5. Calculates cost using real-time pricing data
  6. Writes the request and response body to your S3 / GCS bucket
  7. Logs metadata to Majordomo asynchronously — no latency added to the critical path
  8. Returns the response to the caller — identical to calling the provider directly
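Steps 4–7 can be sketched as follows. This is an illustration of the documented behavior, not Steward's actual code: the function name, the price values, and the bucket/metadata interfaces are all hypothetical, and the metadata hand-off (asynchronous in the real gateway) is shown synchronously for simplicity.

```python
import json

# Illustrative per-million-token prices; the real data is fetched from llm-prices.com.
PRICES = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def handle_response(response_json, bucket, metadata_sink):
    """Hypothetical sketch of steps 4-7 of the request flow."""
    usage = response_json["usage"]                      # step 4: parse token usage
    p = PRICES[response_json["model"]]
    cost = (usage["prompt_tokens"] * p["input"] +       # step 5: compute cost locally
            usage["completion_tokens"] * p["output"]) / 1_000_000
    bucket.append(json.dumps(response_json))            # step 6: full body -> your S3/GCS bucket
    metadata_sink.append({                              # step 7: metadata only -> Majordomo Cloud
        "model": response_json["model"],
        "cost_usd": cost,
        **usage,
    })
    return response_json                                # step 8: response returned unchanged
```

Note that the prompt and completion content goes only to the bucket; the metadata sink receives numbers and a model name.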

What goes where

| Data | Destination | Who controls it |
|---|---|---|
| Prompt content | Your S3 / GCS bucket | You |
| Completion content | Your S3 / GCS bucket | You |
| Token counts | Majordomo Cloud | Majordomo |
| Cost | Majordomo Cloud (calculated locally, sent as a number) | Majordomo |
| Latency | Majordomo Cloud | Majordomo |
| Model name | Majordomo Cloud | Majordomo |
| Custom tags | Majordomo Cloud (only X-Majordomo-* headers you add) | You decide what to tag |
| Provider API keys | Your gateway database, encrypted at rest | You |
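Custom tags are just extra X-Majordomo-* headers on the request. A sketch of the rule in the table above (the tag names Env and Feature are hypothetical examples; the filtering shown is an illustration, not Steward's actual code):

```python
headers = {
    "Content-Type": "application/json",
    "X-Majordomo-Key": "mjd_your_key",
    "X-Majordomo-Env": "production",        # custom tag (hypothetical name)
    "X-Majordomo-Feature": "chat-summary",  # custom tag (hypothetical name)
}

# Only the X-Majordomo-* headers travel to Majordomo Cloud as metadata;
# everything else, including the request body, stays with you.
tags = {k: v for k, v in headers.items() if k.startswith("X-Majordomo-")}
```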

Provider detection

The gateway auto-detects the provider from the request path:
| Path | Provider |
|---|---|
| /v1/chat/completions | OpenAI |
| /v1/messages | Anthropic |
| /<model>:generateContent | Gemini |
Override with the X-Majordomo-Provider header when needed.
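The mapping above can be sketched as a simple lookup (an illustration of the documented rules, not the gateway's actual code):

```python
def detect_provider(path, override=None):
    """Resolve the upstream provider from the request path."""
    if override:                          # X-Majordomo-Provider header wins
        return override
    if path == "/v1/chat/completions":
        return "openai"
    if path == "/v1/messages":
        return "anthropic"
    if path.endswith(":generateContent"): # matches /<model>:generateContent
        return "gemini"
    return None
```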

Pricing

Costs are calculated using pricing data fetched hourly from llm-prices.com, with a bundled fallback. Provider model names are mapped to canonical names before lookup. Prompt caching tokens are tracked and priced separately.
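The calculation combines the two rules above: map the provider's model name to a canonical name, then price cached prompt tokens at their own rate. A sketch with made-up prices and a hypothetical alias (real prices come from llm-prices.com):

```python
# Illustrative per-million-token prices and alias; not real pricing data.
PRICES = {"claude-sonnet-4": {"input": 3.00, "output": 15.00, "cached_input": 0.30}}
ALIASES = {"anthropic.claude-sonnet-4-v1:0": "claude-sonnet-4"}  # hypothetical provider alias

def cost_usd(model, prompt_tokens, completion_tokens, cached_tokens=0):
    p = PRICES[ALIASES.get(model, model)]        # provider name -> canonical name
    return (
        (prompt_tokens - cached_tokens) * p["input"]
        + cached_tokens * p["cached_input"]      # cache reads priced separately
        + completion_tokens * p["output"]
    ) / 1_000_000
```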