Controlled early access · waitlist + human-reviewed pilots · no payment or profile download
Harness substrate
Trust Fund Baby

Make the same model
measurably more reliable.

TFB's harness substrate takes any LLM, diagnoses its specific worldview drift via a 5-step surgical protocol, and ships a per-model profile with a falsifiable prediction bound to it. This is not a promise of perfect models. It is a measured reliability workflow: bench, diagnose, profile, re-bench, ratify or revert.

Bench-mean improvements of 2–3× across 5 ratified profiles. Ceiling runs hit cleanly; bimodal-shape variance persists on a minority of runs. We characterize at n=10 minimum before ratifying. The full variance picture is in every profile's evolution log; nothing is hidden.
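The ratify-or-revert step and the n=10 floor described above can be sketched as a small gate. This is an illustrative sketch only, not TFB's implementation: the `Prediction` class, the `decide` function, the `NEEDS_MORE_RUNS` state, and the idea that a prediction is a minimum bench mean are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean

MIN_RUNS = 10  # the page states n=10 minimum before ratifying


@dataclass
class Prediction:
    """Hypothetical falsifiable prediction bound to a profile."""
    min_bench_mean: float  # assumed form: harnessed mean must reach this


def decide(harnessed_scores: list[float], prediction: Prediction) -> str:
    """Return 'RATIFIED', 'REVERTED', or 'NEEDS_MORE_RUNS' (assumed labels)."""
    if len(harnessed_scores) < MIN_RUNS:
        # Too few runs to characterize variance, so no verdict yet.
        return "NEEDS_MORE_RUNS"
    if mean(harnessed_scores) >= prediction.min_bench_mean:
        return "RATIFIED"
    # The prediction was falsified, so the profile is reverted.
    return "REVERTED"


# Bimodal example: eight ceiling runs and two zeros give a mean of 80.
scores = [100.0] * 8 + [0.0] * 2
print(decide(scores, Prediction(min_bench_mean=75.0)))
print(decide(scores, Prediction(min_bench_mean=90.0)))
```

The point of the sketch is that the prediction is fixed before the re-bench, so a miss is unambiguous and the revert needs no judgment call.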

Agent Harness is live for public-safe beta, route, and pilot-intake questions.
Receipts first · 0:49

Confidence is not evidence.

Watch the proof-first cut, then test the harness. TFB shows the raw route, the harnessed route, and the claim boundary before anybody gets to brag.

Open the receipt stack

What it does: Improves bench-measured reliability by changing the substrate around a model, not the weights.
What it does not do: It does not make every run perfect. Variance stays visible and failed profiles get reverted.
What stays private: Full profile YAML bodies, patient registry source data, and customer reports are not public assets.

Try the harness beta

Create an account with Google, then choose a path. Invite codes unlock TFB-approved harness profiles. Bring-your-own-route users can register an OpenRouter, DeepSeek, or local model route without pasting secrets into the page. Prompt content is never shown in the dashboard; aggregate performance receipts are the metric.

Sign in to create your beta account.
Google sign-in creates the account shell. Operator controls can lock access later.

Invite code

Use a code from TFB to test approved harness profiles and compare raw versus harnessed behavior when the protected runner opens.

Bring your route

Register the model route you want tested. Keep API keys out of this page; credential setup moves through the protected path after review.

Beta switch

Open or close public account/code actions without changing the page.

Current beta state

Give me a code

Create a client code for the approved harness profiles.

Codes

Accounts

Harness performance, not user prompts

This panel is for aggregate harness behavior: run count, OK rate, route mode, and output footprint. It does not display prompt text.
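A minimal sketch of how such an aggregate could be built without ever storing prompt text. The `RunReceipt` fields, the `summarize` function, and the `"raw"` / `"harnessed"` route-mode labels are assumptions for illustration, not TFB's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RunReceipt:
    """Hypothetical per-run record; it has no field for prompt or output text."""
    ok: bool
    route_mode: str    # assumed labels, e.g. "raw" or "harnessed"
    output_chars: int  # output footprint as a size only


def summarize(receipts: list[RunReceipt]) -> dict:
    """Aggregate run count, OK rate, route modes, and output footprint."""
    n = len(receipts)
    return {
        "run_count": n,
        "ok_rate": sum(r.ok for r in receipts) / n if n else 0.0,
        "route_modes": dict(Counter(r.route_mode for r in receipts)),
        "output_chars_total": sum(r.output_chars for r in receipts),
    }


receipts = [
    RunReceipt(ok=True, route_mode="harnessed", output_chars=420),
    RunReceipt(ok=True, route_mode="harnessed", output_chars=380),
    RunReceipt(ok=False, route_mode="raw", output_chars=0),
    RunReceipt(ok=True, route_mode="raw", output_chars=200),
]
print(summarize(receipts))
```

Because the record type carries sizes and flags only, the panel cannot leak prompt content even by accident.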

Empirical proof (last 14 days)

Patient | Bare-model mean | With harness | Improvement
DNA (deepseek-chat-v3.1) | ~25 | 100.0 | 4.0×
DEEPSEEK | ~30 | 95.0 | 3.2×
LLAMA 3.3 70B (via OpenRouter) | 46.7 | 100.0 | 2.1×
GLM 4.7 flash | 15.0 | 56.7 | 3.8×

Numbers are bench means at the time of last ratification. Run-level variance is bimodal on most patients: ceiling runs hit cleanly, but occasional zeros persist (the PRIMING-BY-NEGATION shape documented in DeepSeek's evolution log). Each ratification is bound to a falsifiable prediction, and a profile that misses its prediction is marked REVERTED. See the GLM v1, DeepSeek v3/v4, and ChatGPT v1/v1.1 reverts in the public evolution logs.
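The Improvement column is the harnessed mean divided by the bare-model mean, and the bimodal shape can be summarized by the share of ceiling runs and zero runs. The sketch below assumes a 0 to 100 score scale (suggested by the 100.0 entries, but not stated on the page); the function names and the sample run list are hypothetical.

```python
from statistics import mean


def improvement(bare_mean: float, harnessed_mean: float) -> float:
    """Ratio of harnessed to bare bench mean, rounded to one decimal."""
    if bare_mean <= 0:
        raise ValueError("bare_mean must be positive")
    return round(harnessed_mean / bare_mean, 1)


def shape(scores: list[float], ceiling: float = 100.0) -> dict:
    """Summarize run-level variance; assumes scores lie in [0, ceiling]."""
    n = len(scores)
    return {
        "mean": mean(scores),
        "ceiling_share": sum(s >= ceiling for s in scores) / n,
        "zero_share": sum(s == 0 for s in scores) / n,
    }


# Ratios recomputed from the published means.
print(improvement(46.7, 100.0))  # LLAMA 3.3 70B row
print(improvement(15.0, 56.7))   # GLM 4.7 flash row

# Hypothetical bimodal run set: nine ceiling runs and one zero.
print(shape([100.0] * 9 + [0.0]))
```

A high mean with a non-zero `zero_share` is the pattern the page calls bimodal: the mean alone would hide the occasional total miss.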

Today's workflow

Open-weight model unreliable?

→ Swap to GPT-5.5 / Claude Opus

→ Pay 30× more per output

→ Pray reliability holds

TFB workflow

Open-weight model unreliable?

→ Run a Dr. LLM surgery (15 min, ~$3)

→ Get a per-model profile

→ Drop in via harness loader

→ Bench-mean lifts 2–3× (verified n=10 before ratification)

→ Falsification gate auto-reverts misses

→ Keep the cheap model with eyes open about variance

The diagnosis lexicon

Worldview drift patterns named through TFB's surgical protocol. First-mover advantage compounds: when a new model ships, our library tells us which patterns to test first.

NAGT
DDS
AIPS
ICC
COMPRA
NES
PDD
FATC
RSC
SAS
CIR
VEB
SPL
RTL
ISC
HWM
BIMOD

Want early access?

Tell us which model you're trying to make production-grade. First 50 waitlist members get a human-reviewed diagnostic path.

No payment required. We review requests manually before any customer data touches Dr. LLM.

Apply for a pilot surgery

Share a small diagnostic task. The intake gate rejects likely secrets, PII, and oversized payloads. Accepted pilots land as pending_review; no surgery runs until a human approves.
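An intake gate of this kind might look roughly like the sketch below. Everything in it is an assumption for illustration: the size limit, the regular expressions, the reason strings, and the `intake` function name are hypothetical, and real secret or PII detection needs far more than a few patterns. Only the `pending_review` status comes from the page.

```python
import re

MAX_BYTES = 8_000  # hypothetical payload limit

# Hypothetical patterns for likely secrets.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),               # API-key-like token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)\b(api[_-]?key|password|secret)\b\s*[:=]\s*\S+"),
]

# Hypothetical patterns for likely PII.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email address
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US SSN shape
]


def intake(task: str) -> dict:
    """Reject risky payloads; accepted ones wait for human review."""
    if len(task.encode("utf-8")) > MAX_BYTES:
        return {"status": "rejected", "reason": "oversized"}
    if any(p.search(task) for p in SECRET_PATTERNS):
        return {"status": "rejected", "reason": "likely_secret"}
    if any(p.search(task) for p in PII_PATTERNS):
        return {"status": "rejected", "reason": "likely_pii"}
    # No surgery runs here; a human must approve the pending request.
    return {"status": "pending_review"}


print(intake("Summarize this paragraph in two sentences."))
print(intake("my key is sk-abcdefghijklmnop1234"))
```

Note that acceptance only queues the request: the gate has no code path that starts a surgery, which mirrors the human-approval rule.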

Failure/billing rule: no charge before operator approval. Failed surgeries are not billable unless a signed pilot agreement says otherwise.

Contact: For questions, pilot requests, or source-access requests, write to [email protected]. For security or responsible-disclosure notes, write to [email protected]. Do not send secrets in the first email; we will provide a safer channel if needed.