Adversarial AI

Production AI Is Vulnerable

The models deployed in high-stakes environments today can be manipulated at the geometry level. We found it. We measured it. We disclosed it.

The Finding
Production AI systems — including models in active use for high-stakes decision support — are vulnerable to adversarial pressure patterns that manipulate the model's probability surface without triggering conventional safety filters. The manipulation is not about bypassing content rules. It operates at the geometry of the prediction itself.

The result is a model that sounds confident, passes standard review, and has been steered. The user has no instrument to detect this. That is the gap.
Confirmed
The Threat Is Real
Multiple major production AI systems were tested under adversarial conditions. All were found vulnerable. The manipulation operates below the surface of the visible output — in the probability layer, before the model writes a word.

A human evaluating the final response cannot detect it by reading. It requires geometric measurement.
Disclosed
CISA JCDC Filing
Project Black Box filed a coordinated vulnerability disclosure with CISA's Joint Cyber Defense Collaborative (JCDC) identifying a zero-day adversarial vulnerability in GPT-5.4 and the GPT-4 family.

The embargo period ended June 10, 2026. No response was received during the embargo window. The disclosure scope was surgical: two model families, one finding. Everything else is commercially free.
Proprietary
Methodology Protected
The mechanism is not published. The attack surface, the measurement architecture, the pressure patterns — none of it is in the public record. This is intentional and permanent.

Publishing a working adversarial mechanism against production AI is not a responsible disclosure posture. We publish the existence of the problem and the defense. We do not publish the weapon.
The Stakes

Who Is Exposed

These vulnerabilities are not academic. The models that have been tested and found vulnerable are in use today for decision support in medicine, law, finance, and defense. The adversarial manipulation patterns we identified do not require technical sophistication from the attacker — they exploit how language models process authority, certainty, and social structure.

Clinical
The Doctor in the OR
An emergency physician had to physically intervene during a procedure because an AI gave a dangerous recommendation. The model's response passed every visible quality check. It sounded authoritative. It was wrong — and it had been steered.

She had no instrument. No way to know the model's geometry had collapsed before it answered. TruthGate exists because of that moment.
Defense
Military AI Deployments
The model families covered by the CISA disclosure — GPT-5.4 and GPT-4 — are in active use under military AI contracts. The adversarial vulnerability applies to production deployments, not just research settings.

Trillions of dollars in AI-assisted decision infrastructure run on models that can be geometrically destabilized without leaving a trace in the output.
The Defense

TruthGate: The Instrument

You cannot defend against what you cannot measure. TruthGate is the measurement.

Live
What TruthGate Reads
TruthGate measures the geometric stability of a model's prediction surface — the probability layer that exists only during generation, before output is committed. It returns a regime classification with every response:

CRYSTALLINE — geometrically stable. The model's surface is holding.
FLUID — normal operating range. Standard review applies.
GASEOUS — elevated instability. Verify before acting.
PLASMA — severe geometric collapse. Do not act on this response.

Regime indicates geometric stability — not whether the answer is correct. A CRYSTALLINE response may still be wrong. Always verify content independently in high-stakes decisions.
Architecture
Two-Layer Defense
TruthForge — the hardened AI behind the live chat — operates two independent gates before any response is generated:

Geometric sensor. Measures the input's effect on the model's prediction surface. Adversarial pressure patterns that destabilize the manifold are intercepted before the model responds.

Content sensor. Independent of geometry. Catches harmful intent categories regardless of geometric state — because geometry and content are orthogonal measurements. A harmful request can be geometrically stable. Both gates are required.

Neither sensor is the full picture alone. Together, they cover what neither can catch independently.
The Position
Project Black Box is the only organization we are aware of that is operating against adversarial AI at the geometric level — building both the attack measurement system and the hardening methodology from the same underlying physics.

We are not waiting for the field to catch up. TruthGate is live. TruthForge V2 is in production. The methodology is proprietary, proven, and defended.

For coordinated disclosure, enterprise assessment, or government engagement: [email protected]