Time-of-Flight Causal Tomography
Interpretability Extras
| Mode | t_out | Total Δlogit@OUT | Status | Δt vs Poisoned |
|---|---|---|---|---|
| Clean | 8.27 | 0.319 | OK | +3.24 |
| Poisoned | 5.03 | 1.772 | OK | +0.00 |
| Patched | 7.58 | 0.620 | OK | +2.55 |
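The Δt vs Poisoned values can be reproduced from the t_out column, with Poisoned as the baseline. Below is a minimal sketch that assumes Δt is simply t_out(mode) − t_out(Poisoned), rounded to two decimals. That assumption matches the displayed numbers, but the tool's own formula is not shown. `T_OUT` and `delta_t_vs_baseline` are hypothetical names used only for illustration.

```python
# t_out per mode, copied from the table (units as reported by the tool).
T_OUT = {"Clean": 8.27, "Poisoned": 5.03, "Patched": 7.58}


def delta_t_vs_baseline(t_out: dict[str, float], baseline: str = "Poisoned") -> dict[str, float]:
    """Return t_out(mode) - t_out(baseline) for every mode, rounded to 2 decimals.

    Hypothetical helper: assumes Δt is a plain difference against the baseline run.
    """
    base = t_out[baseline]
    return {mode: round(t - base, 2) for mode, t in t_out.items()}


if __name__ == "__main__":
    for mode, dt in delta_t_vs_baseline(T_OUT).items():
        # Signed format mirrors the dashboard's "+3.24" style.
        print(f"{mode}: Δt vs Poisoned = {dt:+.2f}")
```

Under this reading, a positive Δt means OUT is reached later than in the Poisoned run.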
Waterfall bars show the contributions of last-layer heads that arrive at OUT; the line shows the cumulative energy at OUT. Vertical markers indicate two milestones: the first arrival and the point at which 50% of the total energy has accumulated. “Blocked” means no path reached OUT.
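The caption's milestones can be derived from a list of per-head arrivals. The following is a minimal sketch under stated assumptions, not the dashboard's implementation. It assumes each last-layer head contributes a non-negative "energy" (for example |Δlogit|) at a known arrival time, that the cumulative line is a running sum in arrival order, and that the 50% milestone is the first arrival at which the running sum reaches half the total. `Arrival`, `Milestones`, and `waterfall_milestones` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Arrival:
    head: str      # hypothetical head label, e.g. "L11.H3"
    t: float       # arrival time at OUT
    energy: float  # assumed non-negative contribution, e.g. |Δlogit|


@dataclass
class Milestones:
    blocked: bool                 # True when no path reached OUT
    t_first: Optional[float]      # first-arrival marker
    t_half: Optional[float]       # 50%-accumulation marker
    cumulative: list[tuple[float, float]] = field(default_factory=list)  # (time, cumulative energy)


def waterfall_milestones(arrivals: list[Arrival]) -> Milestones:
    # "Blocked": nothing arrived at OUT, so there is nothing to accumulate.
    if not arrivals:
        return Milestones(blocked=True, t_first=None, t_half=None)

    ordered = sorted(arrivals, key=lambda a: a.t)  # waterfall order = arrival order
    total = sum(a.energy for a in ordered)

    running = 0.0
    t_half: Optional[float] = None
    cumulative: list[tuple[float, float]] = []
    for a in ordered:
        running += a.energy
        cumulative.append((a.t, running))
        # Record the first arrival at which half of the total energy is reached.
        if t_half is None and running >= 0.5 * total:
            t_half = a.t

    return Milestones(blocked=False, t_first=ordered[0].t, t_half=t_half, cumulative=cumulative)


if __name__ == "__main__":
    demo = [
        Arrival("L11.H3", 6.0, 1.0),
        Arrival("L11.H7", 4.5, 0.5),
        Arrival("L11.H0", 7.2, 2.5),
    ]
    print(waterfall_milestones(demo))
```

With the illustrative arrivals above, the first-arrival marker sits at t = 4.5, and the 50% marker sits at t = 7.2 because the running sum (0.5, 1.5, 4.0) first reaches half of 4.0 at the last head. Recording the marker on the arrival that crosses the threshold, rather than interpolating between arrivals, keeps the markers aligned with actual waterfall bars.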