AI cost control: pick your Claude model per tenant

AI features

Why model selection matters.

Mobieus turned on AI features last week: long-thread summaries, suggested tags, search synthesis, and one-click report explanations for moderators. The features hit a single model picked by us at the platform level. That was fine for a demo. It was wrong for production. The choice between Haiku, Sonnet, and Opus is a 15x cost spread, and there is no universal right answer for a tenant.

A 200-member hobby community can run AI summaries on Haiku and never notice the bill. A 20,000-member trade association running daily moderation explanations on Opus will burn through a four-figure monthly Anthropic invoice before the first weekly digest goes out. The same code is right in both cases, with different model defaults.

So we moved the decision to the tenant.

What shipped

A dropdown in /admin/ai.

Three options today: Claude Haiku, Claude Sonnet, Claude Opus. Each option includes the pricing label inline so you do not have to look it up. The default is Haiku at $1 per million input tokens and $5 per million output tokens. Sonnet is $3/$15. Opus is $15/$75. Save the dropdown and every AI feature on your tenant switches model on the next call. No restart, no cache flush.

A 30-day spend dashboard.

Right below the dropdown is a spend panel: total input tokens, total output tokens, total cost in USD, broken down by feature (summary, tag suggestion, search synthesis, mod report explainer). Each row shows a 30-day sparkline so a sudden spike is obvious. The same numbers feed an admin alert if your AI spend triples week-over-week, so a runaway thread-summary loop does not silently bill you for a month.

Per-feature model overrides.

The tenants who want fine control can override the model per feature. Want Haiku for everything except moderator explanations (where you want the model to actually understand context)? Pick Haiku as the default and Sonnet as the override on the "mod report explainer" feature. The same panel that shows spend lets you set the override. Power users can mix and match; everyone else picks one and ships.

Free-tier disable.

If the dropdown is set to "off," AI features stop calling the API entirely. The features still render, but they fall back to non-AI behavior: the summary card hides, the tag suggester goes silent, the mod-report button greys out. We did this because some Sovereign tenants pre-pay for AI features but want a fast off-switch when an Anthropic price change lands.

How it works under the hood

One model selector, many call sites.

The model name flows through a single function called ai_model_for(tenant_id, feature) that reads the tenant config first, falls back to platform default second. Every AI call site uses that function. Adding a new model when Anthropic ships one (and they will) means one config addition, not a code change in twelve places.

Token accounting at the response layer.

Anthropic returns input_tokens and output_tokens on every response. We persist both into ai_usage_log with the feature name, model name, tenant id, and timestamp. The 30-day spend dashboard reads that table directly with a covering index on (tenant_id, timestamp). The Haiku/Sonnet/Opus prices are platform constants; if Anthropic re-prices, we update the constant and the dashboard recalculates without a backfill.

Caching keeps the bill honest.

Every AI output is cached by source hash. If a moderator opens the same flagged thread twice in a row, the second open hits the cache, not the API. Long-thread summaries cache by (thread_id, last_post_id) so a summary survives until a new post lands. Cache invalidation is the only place we make a guarantee: a stale summary is never shown.

Why Haiku is the default.

For 80% of the AI surface in Mobieus, Haiku is fine. Tag suggestions on a new thread title do not need Opus reasoning; they need a fast, cheap classifier. Long-thread summaries do benefit from Sonnet, but Haiku is acceptable; the failure mode is "a slightly less polished summary," not "a wrong moderation call." Setting Haiku as the default keeps tenant bills predictable while the per-feature override exists for the one or two places that genuinely benefit from a smarter model.

What this means for operators

If you run a Pro or Sovereign tenant and you turned on AI features, your default is now Haiku. Open /admin/ai and you will see your current dropdown setting plus your 30-day spend. If the number is reasonable, do nothing. If it is bigger than you want, drop a feature override to off, or kill the dropdown entirely.

If you have not turned AI on, the feature is still off. You will not be billed a cent until you set your Anthropic key in /admin/ai and toggle the feature on.

We ship our own AI features on Haiku. We use Sonnet only for the moderator report explainer because the cost-per-call is small (one explanation per flagged report, capped) and the quality difference is visible. Pick your own balance.

Spot a bug, want OpenAI or local Llama as additional providers (both are on the roadmap), or want a feature we missed? Open a thread at support.mobieus.io/forums/feature-requests.

AI cost control: pick your Claude model per tenant.