devdot
← All postsSecurity ·

Your AI Vendor Says It's Safe. A Regulator Says Otherwise. Own the Safety Layer Anyway.

Anthropic published its safety evidence the same day the government accused it of enabling AI security threats. When the vendor and the regulator disagree, teams building on the model need their own guardrails, not borrowed assurances.

Today gave us a strange split screen. Anthropic published detailed evidence of how seriously it takes AI-enabled security threats. On the same day, the government accused it of enabling those exact threats through insufficient safety controls. Same model, same week, two opposite verdicts from two parties who both have skin in the game.

If you build products on someone else''s model, that contradiction is your problem now. Not because you have to pick a side in a regulatory fight, but because it exposes something most teams have been quietly avoiding: you cannot outsource trust in your own product to a vendor''s press release.

The vendor''s safety story is marketing, not your guarantee

A model provider''s safety claims are made at the level of the model. Your risk lives at the level of your application. Those are not the same thing.

A vendor can run red-team evals, publish a system card, and genuinely mean every word. None of that tells you what happens when your specific prompt template, your retrieval pipeline, and your tool permissions collide with a hostile input in production. The vendor tested the engine. You shipped the car. The crash test that matters is the one on your own road.

When a regulator and a vendor disagree this publicly, it should not shake your confidence in a particular company. It should remind you that "the model is safe" was never a sentence you could safely build on.

Build the safety layer you actually control

The teams that handle this well treat vendor assurances as one input, not the verdict. They build their own thin layer of safety they can reason about and change without waiting for the next model release.

A few things that hold up in production:

  • Run your own evals against your own use case. Maintain a small suite of adversarial prompts that reflect how your users and your attackers actually behave. Run it on every model swap. Vendor benchmarks do not cover your domain.
  • Constrain tool access at the boundary. If an agent can call a payment API or read a database, the guardrail belongs in your code, not in a prompt. Assume the model will eventually be talked into trying the wrong thing.
  • Log the reasoning, not just the output. When something goes wrong, you need to see what the model decided and why. Replayable traces turn a scary incident into a fixable bug.
  • Keep a kill switch. One config flag that disables a risky capability buys you time when a vendor, a regulator, or a researcher drops news on a Thursday.

Treat model risk like any other dependency

Engineers already know how to handle a dependency they do not fully trust. You pin versions, you wrap them in an interface, you write tests around the behavior you rely on, and you have a plan for when they break. A foundation model is the same kind of dependency, just a louder one. The mistake is treating it as an oracle instead of a component.

Days like today are useful precisely because they are uncomfortable. They strip away the comforting idea that safety is something the vendor handles and you inherit. It is not. The model is upstream of your product, and everything downstream is yours.

We''re here to help founders and teams design and build digital products that are built to scale with you, not slow you down. If you''re building something on top of AI and want a safety layer you actually own, get in contact with us today.

NEXT POST →OpenAI Just Bought Your Python Toolchain. AI Labs Are Moving Down the Stack.