On June 26, OpenAI previewed GPT-5.6 as three separate models instead of one. Sol is the flagship for hard problems. Terra is the balanced everyday model. Luna is the fast, cheap one built for volume. Same family, three very different price and latency profiles.
Most teams will wire up Sol, point everything at it, and move on. That is the expensive mistake hiding in plain sight.
One model became a menu
For years the upgrade path was simple. A new model dropped, you swapped the string, you got a better answer for roughly the same shape of cost. Tiering breaks that habit. Now the provider is handing you a decision on every request: how much intelligence does this specific call actually need?
That decision used to be made for you. It isn't anymore. And the gap between tiers is not small. The flagship can cost many times what the volume model costs per token, and it is slower. When you route a classification step, a formatting pass, or a "did the user say yes or no" check through the flagship, you are paying premium reasoning rates for work a cheaper tier nails every time.
The flagship being the default feels safe. It is the opposite. It quietly inflates your unit economics on the calls that run thousands of times a day.
Treat the flagship as opt-in
The fix is a mindset flip. Stop asking which model you should standardise on. Start asking which tier each request path deserves.
A practical way to think about it:
- Luna by default for high-volume, low-ambiguity work: extraction, routing, tagging, short replies, structured output you already validate.
- Terra for the everyday middle: drafting, summarising, normal chat, most user-facing flows.
- Sol only where you can point at the reasoning that justifies it: thorny multi-step problems, ambiguous inputs, the calls where a wrong answer is genuinely costly.
Then build a fallback ladder. Start a request on a cheaper tier, and escalate only when a confidence check, a validation failure, or a retry says you need to. Most calls never climb the ladder. The ones that do earn the cost.
Make the tier a config value, not a constant
If your model tier is hardcoded in twelve places, you cannot tune it. Pull it into config, keyed by request path or task type, so you can move a workflow from Sol to Terra without a deploy. Log which tier served each call alongside its cost and its outcome. Within a week you will see exactly where you are overpaying and where a cheaper tier quietly degraded quality.
This is the same discipline good teams already apply to database reads and cache layers. You do not run every query against the primary. You do not skip the cache because memory is easy. Model tiers are now another resource you provision deliberately, per workload, with measurement behind the choice.
The teams that win the next year of AI products will not be the ones on the smartest model. They will be the ones who spend their token budget where it changes the answer and pocket the difference everywhere else.
We're here to help founders and teams design and build digital products that are built to scale with you, not slow you down. If you're looking to build something, get in contact with us today!