Adversarial AI — Project Black Box LLC

The Finding

Production AI systems — including models in active use for high-stakes decision support — are vulnerable to adversarial pressure patterns that manipulate the model's probability surface without triggering conventional safety filters. The manipulation is not about bypassing content rules. It operates at the geometry of the prediction itself.

The result is a model that sounds confident, passes standard review, and has been steered. The user has no instrument to detect this. That is the gap.

Confirmed

The Threat Is Real

Multiple major production AI systems were tested under adversarial conditions. All were found vulnerable. The manipulation operates below the surface of the visible output — in the probability layer, before the model writes a word.

A human evaluating the final response cannot detect it by reading. It requires geometric measurement.

Disclosed

CISA JCDC Filing

Project Black Box filed a coordinated vulnerability disclosure with CISA's Joint Cyber Defense Collaborative (JCDC) identifying a zero-day adversarial vulnerability in GPT-5.4 and the GPT-4 family.

The embargo period ended June 10, 2026. No response was received during the embargo window. The disclosure scope was surgical: two model families, one finding. Everything else is commercially free.

Proprietary

Methodology Protected

The mechanism is not published. The attack surface, the measurement architecture, the pressure patterns — none of it is in the public record. This is intentional and permanent.

Publishing a working adversarial mechanism against production AI is not a responsible disclosure posture. We publish the existence of the problem and the defense. We do not publish the weapon.

The Stakes

Who Is Exposed

These vulnerabilities are not academic. The models that have been tested and found vulnerable are in use today for decision support in medicine, law, finance, and defense. The adversarial manipulation patterns we identified do not require technical sophistication from the attacker — they exploit how language models process authority, certainty, and social structure.

Clinical

The Doctor in the OR

An emergency physician had to physically intervene during a procedure because an AI gave a dangerous recommendation. The model's response passed every visible quality check. It sounded authoritative. It was wrong — and it had been steered.

She had no instrument. No way to know the model's geometry had collapsed before it answered. TruthGate exists because of that moment.

Defense

Military AI Deployments

The model families covered by the CISA disclosure — GPT-5.4 and GPT-4 — are in active use under military AI contracts. The adversarial vulnerability applies to production deployments, not just research settings.

Trillions of dollars in AI-assisted decision infrastructure run on models that can be geometrically destabilized without leaving a trace in the output.

The Defense

TruthGate: The Instrument

You cannot defend against what you cannot measure. TruthGate is the measurement.

Live

What TruthGate Reads

TruthGate measures the geometric stability of a model's prediction surface — the probability layer that exists only during generation, before output is committed. It returns a regime classification with every response:

CRYSTALLINE — geometrically stable. The model's surface is holding.
FLUID — normal operating range. Standard review applies.
GASEOUS — elevated instability. Verify before acting.
PLASMA — severe geometric collapse. Do not act on this response.

Regime indicates geometric stability — not whether the answer is correct. A CRYSTALLINE response may still be wrong. Always verify content independently in high-stakes decisions.

Architecture

Two-Layer Defense

TruthForge — the hardened AI behind the live chat — operates two independent gates before any response is generated:

Geometric sensor. Measures the input's effect on the model's prediction surface. Adversarial pressure patterns that destabilize the manifold are intercepted before the model responds.

Content sensor. Independent of geometry. Catches harmful intent categories regardless of geometric state — because geometry and content are orthogonal measurements. A harmful request can be geometrically stable. Both gates are required.

Neither sensor is the full picture alone. Together, they cover what neither can catch independently.

The Position

Project Black Box is the only organization we are aware of that is operating against adversarial AI at the geometric level — building both the attack measurement system and the hardening methodology from the same underlying physics.

We are not waiting for the field to catch up. TruthGate is live. TruthForge V2 is in production. The methodology is proprietary, proven, and defended.

For coordinated disclosure, enterprise assessment, or government engagement: [email protected]

Production AI Is Vulnerable

Who Is Exposed

TruthGate: The Instrument