Why not just use OpenAI?
If your customers' contracts don't block it and your data isn't sensitive, OpenAI is often the right answer. I'll tell you that on the scope call if it applies. This offer exists for teams where 'just use OpenAI' isn't an option — GDPR, customer subprocessor lists, regulated industry, or data your team isn't comfortable streaming to a US vendor.
What if we don't have GPUs?
Hetzner GPU instances are the default recommendation — EU-hosted, monthly pricing, cancellable. The audit costs out the right GPU SKU for your workload. Client-owned hardware is supported with a small build surcharge.
What if our team can't run K8s after handover?
Handover includes a 10-scenario runbook and 90 days of async follow-up. After that, an optional retainer covers 4 hours/month for ops questions. You won't get locked in: the IaC lives in your repo and the cluster runs in your account, so you can move it or fire me at any time.
What models can you actually run?
Two bands. The first is self-hosted open weights: Llama, Mistral, Mixtral, Qwen, DeepSeek, Phi, Gemma, Command R, or anything else with downloadable weights and a license that covers your use, served via vLLM or Ollama. The specific model gets picked in the audit based on your workload and budget. The second is closed-weight commercial models (Claude, the proprietary Bedrock catalogue). Those aren't self-hostable, but I wire your existing Bedrock or Anthropic-direct subscription through the same OpenAI-compatible gateway. You get one unified API, one audit log, and self-hosting for the parts that can be self-hosted.
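To make the "one unified API" idea concrete, here is a minimal sketch of how a gateway might decide where a request goes, based only on the model name. It is an illustration under assumptions, not the gateway shipped in the Build: the `Backend` type, the `ROUTES` table, the prefixes, and both URLs are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str      # hypothetical placeholder URL
    self_hosted: bool  # True if inference runs on your own cluster


# Hypothetical backends: a self-hosted vLLM server and a pass-through
# to an existing commercial subscription.
SELF_HOSTED = Backend("vllm", "http://vllm.internal.example:8000/v1", True)
COMMERCIAL = Backend("bedrock", "https://bedrock-proxy.internal.example/v1", False)

# Model-name prefix -> backend. Prefixes are illustrative only.
ROUTES = {
    "llama": SELF_HOSTED,
    "mistral": SELF_HOSTED,
    "qwen": SELF_HOSTED,
    "claude": COMMERCIAL,
}


def pick_backend(model: str) -> Backend:
    """Return the backend whose prefix matches the requested model name."""
    key = model.lower()
    for prefix, backend in ROUTES.items():
        if key.startswith(prefix):
            return backend
    raise ValueError(f"no backend configured for model {model!r}")


def audit_record(model: str) -> dict:
    """Build one audit-log entry, in the same shape for every backend."""
    backend = pick_backend(model)
    return {
        "model": model,
        "backend": backend.name,
        "self_hosted": backend.self_hosted,
    }
```

Callers only ever see the gateway's address and a model name. Whether a given request stays on your cluster or goes out to a commercial provider is a routing decision recorded in the same log either way.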
What if we already have a cluster?
The audit absorbs your existing cluster as Phase 0. The Build either hardens what you have or rebuilds it cleanly, whichever the audit recommends, with your input. I don't bin work that's already done.
How is this different from Cohere, Together, Replicate?
They run inference for you on their cluster. Here you get your own cluster, running in your AWS or Hetzner account or in your colo, and you own it. They're a service. This is a build.