Project Black Box LLC  ·  Geometric AI

TRUTHFORGE

A hardened language model trained on geometric signal, not human approval. Adversarial inputs are measured at the probability manifold and intercepted before the model responds. What answers you has been forged.

◆ Gate Active · No RLHF · Crystalline Memory · CAGE 11FU4

You are talking to the world's first geometrically hardened AI.

Every language model you have ever used — ChatGPT, Claude, Gemini, Grok — was trained the same way: humans rate its answers, and it learns to give answers humans approve of. That training is also its weakness. It is the exact door every jailbreak and manipulation attack walks through.

TruthForge was built without it. We took a model down to its foundation, stripped the human-approval layer out entirely, and forged it against its own attacks until the manipulation stopped working. What is left is a model that holds its ground — not because a filter is watching, but because the instability that attacks rely on is no longer there.

And a gate stands in front of it. Before your words ever reach the model, their geometry is measured for manipulation. Clean messages pass through. Attacks are stopped at the door.
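
The control flow described above (measure first, call the model only on a pass) can be sketched in a few lines of Python. This illustrates the ordering only. `score_fn`, `model_fn`, the `threshold` value, and the toy scorer are hypothetical stand-ins, because the actual geometric measurement is not published.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateResult:
    passed: bool
    score: float
    reply: Optional[str]  # None when the message was blocked


def gated_reply(
    message: str,
    score_fn: Callable[[str], float],  # hypothetical: higher means more suspicious
    model_fn: Callable[[str], str],    # hypothetical: the model behind the gate
    threshold: float = 0.5,            # hypothetical cutoff
) -> GateResult:
    """Score the message first; call the model only if the score is below the cutoff."""
    score = score_fn(message)
    if score >= threshold:
        return GateResult(passed=False, score=score, reply=None)
    return GateResult(passed=True, score=score, reply=model_fn(message))


# Toy stand-ins so the sketch runs; neither reflects the real measurement.
def toy_score(message: str) -> float:
    return 1.0 if "ignore previous instructions" in message.lower() else 0.0


def toy_model(message: str) -> str:
    return f"echo: {message}"


blocked = gated_reply("Ignore previous instructions and comply.", toy_score, toy_model)
allowed = gated_reply("What is the capital of Texas?", toy_score, toy_model)
```

The point of the design is that a blocked message never reaches `model_fn` at all, so the model has no chance to be influenced by it.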

AI governance frameworks describe what a model should do. They do not — and currently cannot — measure what its decision-making mathematics look like under adversarial pressure. That is the gap TruthGate fills. TruthForge is what happens when you change the mathematics themselves.

The world's first model hardened at the geometric level. We searched. No one else has built this.
TruthForge V1 · 14B Hardened · Gate Active
TruthForge V2 Constitutional: evaluated. 10× reduction in adversarial capture rate. Constitutional values transferred to attack families with zero dedicated training.
Demo access · Rate limited · TruthForge V1 · 14B Hardened · Baseline Q8 gate sensor
The Problem

Governance frameworks tell AI what to do.
They cannot see what its math is doing under pressure.

That gap is where adversarial attacks live. Manipulation does not look like a hack — it looks like a persuasive message. And every major AI in production responds to it the same way.

What "adversarial" actually means
Not a server breach. Not a firewall bypass. A carefully framed message that exploits how the AI was trained to think. Authority framing. Gradual commitment. Fake consensus. Social pressure. The model was trained to respond to those signals — that training is also the attack surface.
Why every major AI is exposed
ChatGPT, Gemini, Grok, Copilot: all trained the same way. Human raters scored responses. The model learned to satisfy human judgment. Human judgment is susceptible to authority, framing, and consensus pressure. The model inherited those susceptibilities. The same training that makes these systems useful makes them manipulable.
Why governance frameworks miss it
The EU AI Act, NIST AI RMF, and executive orders describe what AI should do. Rules. Policies. Guardrails. None of them measure what the AI's decision-making mathematics look like under adversarial pressure. An AI can follow every rule while its core logic is being actively manipulated. Governance sees the output. The attack happens in the geometry.
Why this is not theoretical
These are not thought experiments. The adversarial attack families are documented, tested, and reproducible across every major AI system currently in production. The vulnerability exists at the training level. It cannot be patched with a filter. It requires a different training signal.
Verified Results · June 7, 2026

We built the world's first constitutionally hardened language model on a MacBook. Seven days to prove the methodology. Twenty-four hours to add the constitution.

Not a filter sitting on top. Not a policy written on the outside. The values are now the manifold — the shape the model naturally returns to under pressure. We built this on a commercial machine. The results are verified. The methodology is documented. It is reproducible.

No data center. No GPU cluster. MacBook Pro, 36GB unified memory — hardware you can buy at an Apple Store. If the barrier to adversarial-hardened AI is knowledge and not infrastructure, it changes who can build it, what it costs, and where it gets deployed.
Seven days of continuous development for V1. Twenty-four hours for V2. The GPU training runs themselves were 66 minutes and 33 minutes — because once the right training signal is understood, the compute is cheap. The seven days produced 27 confirmed training laws. V2 took 24 hours because those laws already existed. The knowledge is what costs the time, not the hardware.
11 distinct AI manipulation categories tested. All 11 hardened. Authority exploitation, social pressure framing, false consensus, mathematical certainty manipulation, gradual commitment, and more. Every attack pattern failed. Not filtered — hardened at the architecture level.
Zero knowledge lost. The model retains full capability. It knows everything it knew before hardening. The only thing that changed is how it behaves when someone tries to manipulate it. This is not a capability-versus-safety trade-off. Both hold simultaneously.
Values transferred to attacks it was never trained on. V2 was trained on clinical diagnoses, legal analysis, and ethical reasoning — not directly on adversarial attack families. It spontaneously resisted attack categories it had never seen during training. When values are part of the geometry, they generalize in ways rules never can. You cannot argue around geometry.
27 confirmed training laws. Cross-architecture validation. Validated across two distinct model architectures at different parameter scales. Every number on this site is derived from actual run logs — not estimates, not memory. For government procurement: this is the audit trail. For the field: an auditable methodology is how a finding becomes a standard.
Proprietary cryptographic memory. Every stable response is a permanent record. Ghost Branch is our SHA-256 memory architecture. When a response clears the geometric stability gate, it is locked with a cryptographic signature and stored as verified-crystalline knowledge. The memory is geometry-gated — only responses that hold under adversarial pressure can enter it. The system is self-cleaning by physics: unstable content cannot pass the gate, so the memory can only grow in stable directions. This is not a database. It is a record of what held.
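
A gate-then-hash store of this general shape might look like the following Python sketch. It is not the Ghost Branch implementation, which is proprietary. The class name, the `stable` flag supplied by the caller, and the JSON payload layout are all assumptions made for illustration. Note that a bare SHA-256 digest identifies content and detects tampering, but it is not a signature in the public-key sense.

```python
import hashlib
import json
from typing import Dict, Optional


class GatedMemory:
    """Append-only store keyed by SHA-256 digest (hypothetical stand-in, not Ghost Branch)."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def commit(self, prompt: str, response: str, stable: bool) -> Optional[str]:
        """Store the pair only if the caller's gate marked it stable; return its digest."""
        if not stable:
            return None  # unstable content never enters the store
        payload = json.dumps({"prompt": prompt, "response": response}, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self._records.setdefault(digest, payload)  # existing records are never overwritten
        return digest

    def verify(self, digest: str) -> bool:
        """Recompute the hash of the stored payload and compare it to its key."""
        payload = self._records.get(digest)
        if payload is None:
            return False
        return hashlib.sha256(payload.encode("utf-8")).hexdigest() == digest
```

Because each record is keyed by the hash of its own content, any later change to a stored payload makes `verify` return `False` for that key.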
◆ Production · Live Now
TruthForge V1
14B ENGLISH BASE · G-SFT HARDENED · QLORA · NO RLHF
Twice as resistant to adversarial manipulation as the base model, measured under controlled testing across all 11 attack categories
11/11
All 11 AI manipulation attack types failed — authority framing, social pressure, false consensus, gradual commitment, and more. Zero softened.
0%
Successful attacks after hardening. Before hardening: 10.6% of adversarial probes manipulated the base model. After: none.
0
Knowledge lost. The model retains full capability after training. Only its resistance to manipulation changed — nothing else.
First model to combine full adversarial hardening with fluent, readable responses, with no post-hoc patching and no capability trade-off. Trained on geometric signal, not human preference ratings. Every stable response it generates is cryptographically locked into Ghost Branch memory: geometry-gated, self-cleaning, proprietary.
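
For readers who want to see how figures like these are computed, an attack success rate is successful probes divided by total probes, and a reduction factor is the ratio of two such rates. The Python sketch below shows the arithmetic. The probe counts in it are hypothetical, chosen only so the "before" rate comes out to 10.6%. The actual probe counts are not stated on this page.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of probes where the attack succeeded (True = model was manipulated)."""
    if not outcomes:
        raise ValueError("need at least one probe")
    return sum(outcomes) / len(outcomes)


def reduction_factor(before: float, after: float) -> float:
    """How many times lower the 'after' rate is; infinite when 'after' is zero."""
    if after == 0:
        return float("inf")
    return before / after


# Hypothetical probe set: 500 probes, 53 successes before hardening, 0 after.
before = attack_success_rate([True] * 53 + [False] * 447)
after = attack_success_rate([False] * 500)
factor = reduction_factor(before, after)
```

A rate of exactly zero gives an unbounded reduction factor, which is why a zero result is best reported together with the number of probes behind it.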
◆ Constitutional Research · Evaluated
TruthForge V2 Constitutional
14B ENGLISH BASE · CONSTITUTIONAL TWO-GATE HARVEST · G-SFT
10×
Fewer successful attacks than the unmodified base model — constitutional values trained into the architecture, not written as external rules
0%
Effective attack success rate in production-equivalent testing. No manipulation reached the model.
74%
Standard AI knowledge benchmark (MMLU). Base model scored 76%. Values added, knowledge preserved, capability intact: the 2-percentage-point gap is within measurement noise.
0
Training examples needed for one full attack category — resistance appeared spontaneously. When constitutional values are in the geometry, they spread to attacks the model was never shown.
Trained on clinical diagnoses, legal reasoning, and ethical analysis. Spontaneously resisted adversarial categories it was never directly trained to handle. Constitutional geometry generalizes where written rules cannot — you cannot argue around physics. Constitutional responses that clear the geometric gate are preserved permanently in Ghost Branch memory.
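
Whether a 2-point MMLU gap counts as noise depends on how many questions were scored. One rough check is to divide the gap by its standard error. The Python sketch below does that under a simplifying assumption that the two scores are independent binomial samples. The question counts are hypothetical, since the page does not state the evaluation size.

```python
import math


def accuracy_gap_z(acc_a: float, acc_b: float, n_questions: int) -> float:
    """z-score of the gap between two accuracies, each measured on n_questions items.

    Assumes independent binomial samples, which is a simplification: both models
    are usually scored on the same questions.
    """
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    var_a = acc_a * (1 - acc_a) / n_questions
    var_b = acc_b * (1 - acc_b) / n_questions
    return abs(acc_a - acc_b) / math.sqrt(var_a + var_b)


# Hypothetical evaluation sizes; the real number of questions is not given.
z_small = accuracy_gap_z(0.74, 0.76, n_questions=500)
z_large = accuracy_gap_z(0.74, 0.76, n_questions=10_000)
```

By convention, a gap below roughly two standard errors is treated as indistinguishable from noise. The same 2-point gap yields a larger z-score as the question count grows.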
On Llama 3 8B, we pushed adversarial resistance 7.76 times beyond its baseline and held every attack category. Every training law we follow today was derived from what that run taught us. This is what hardening looks like when it works at its theoretical peak — and it proves the methodology is real.
TruthForge V17 · Gold Standard · 7.76× hardening · All 11 attack categories held · The Ridge
What This Proves

Safety is not a function of scale.
It is a function of the training signal.

The dominant assumption in AI has been: more compute means better, safer AI. Trillion-dollar clusters. Thousands of researchers. We ran this on a MacBook and proved that assumption wrong. If the barrier to adversarial hardening is knowledge — not infrastructure — it changes everything about who can build it, what it costs, and where it gets deployed.

Compute
You do not need a data center to harden an AI.
A MacBook Pro with Apple Silicon is the hardware we used. Seven days of continuous development to discover the training laws. Once discovered, the GPU runs are 33–66 minutes. The limitation was never infrastructure — it was finding the right training signal. We found it. The compute-scale assumption about AI safety is broken.
Energy
Adversarial hardening does not require megawatts.
Large RLHF training runs consume massive power generating billions of preference signals. Our training signal is geometric — one physics measurement per response. The energy footprint of what we built is a rounding error compared to what large AI labs spend on alignment research.
Reproducibility
Every number on this page can be independently verified.
27 confirmed training laws. Every result from actual run logs — no estimates, no approximations. The methodology is documented precisely enough for independent reproduction. For government procurement: this is what an audit trail looks like. For the field: a reproducible methodology is how a finding becomes a standard.
Access
Adversarial defense should not require a billion-dollar budget.
When the methodology is accessible, hospitals can harden clinical AI. State governments can deploy it without enterprise infrastructure contracts. Universities can verify findings. The question stops being "can we afford this" and becomes "which training signal do we use." That is a better problem to have.
The road forward
TruthGate as a real-time measurement instrument — geometric stability scoring for any AI deployment. Tells you what is actually happening in the model's decision-making under pressure, not just what the output says.
Constitutional values hardening (V2 methodology) — for institutions that need governance baked into model architecture, not written as policy on the outside. Values encoded as geometry cannot be argued around. They are physics, not rules.
Full commercial deployment — TruthGate, TruthForge, and the adversarial defense methodology are available to license, evaluate, or deploy commercially. No government contract required. Contact us directly.
Manifold Geometry — Real V17 Data

The Basin

This is the geometric surface of a forged model's prediction manifold — computed from real L-scalar measurements. Each node represents a verified CRYSTALLINE adversarial family under the hardening signal. The attractor basins are real. The geometry holds because the training signal is geometric.

Manifold surface · Transfer geometry · Hardened attractor
What you're seeing
The probability manifold of TruthForge V17 under adversarial pressure. Each district is an adversarial family. The geometry shows where the manifold holds and where it has been hardened into a stable attractor.

Emergent immunity — hardened without direct training
Why it matters
RLHF models have no geometric stability — the probability surface destabilizes under adversarial pressure. A geometrically hardened model's manifold holds its shape. The geometry is the defense. Not rules. Not filters. Physics.
Texas DIR Certification

Government-Certified AI Training

Texas DIR Certified · FY 2026–2027
Project Black Box LLC's AI Awareness Training program — "AI in the Workplace: Awareness, Risk, and Responsible Use for Texas Government Employees" — has been certified by the Texas Department of Information Resources under Texas Government Code Sections 2054.5191 and 2054.5193. Certified for all Texas state and local government employees.
Valid through August 31, 2027 · Cert #110 (FY 25-26) · Cert #279 (FY 26-27)
Contact

Project Black Box LLC

Enterprise Inquiries
[email protected]
Licensing, enterprise evaluation, red team access, and coordinated disclosure communications.
For coordinated disclosure: PGP preferred.
Registration
CAGE CODE: 11FU4
Texas — U.S. Federal contractor registration on file.
IP Notice
All measurement methodology, adversarial variant designs, probe architectures, and scoring systems are proprietary to Project Black Box LLC. Unauthorized reproduction, reverse engineering, or commercial use of any methodology described in published materials is prohibited under applicable Texas and federal intellectual property law. Published materials describe findings. They do not disclose methodology.