# C9 Decision Rubric

## Success Criteria (Post-TEST Validation)

### Green Light (Proceed to Publication)
- ✅ TEST pass@20% ≥ 50%
- ✅ RFT > MOND wins ≥ 60% (head-to-head RMS comparisons)
- ✅ A_flat clamp fraction ≤ 25% (descriptor-driven mapping working well)
- ✅ BIC wins show movement toward RFT vs NFW (complexity-adjusted fairness)
- ✅ No systematic LSB vs HSB disparity (subgroup fairness)

**Action**: Freeze config, write up results, submit manuscript.

---

### Yellow Light (Investigate/Refine)
- ⚠️ TEST pass@20%: 40% to <50%
- ⚠️ RFT > MOND wins: 50% to <60%
- ⚠️ A_flat clamp fraction: >25% to 40%

**Diagnostic Questions**:

1. **LSB vs HSB breakdown**:
   ```bash
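   # Count LSB galaxies that pass with rft_geom.
   # Wrapping in [...] | length counts objects; piping pretty-printed JSON to wc -l counts lines.
   jq '[.rows[]
        | select(.solvers.rft_geom.pass)
        | select(.tags | index("LSB"))] | length' reports/_summary_C9_TEST.json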
   ```
   - If LSB rising and HSB slipping → dial down beta2 (gas_frac sensitivity)
   - If LSB collapsing → increase beta2 slightly

2. **Clamp analysis**:
   - If clamp_fraction > 0.25 → β-mapping overshooting
   - Consider runner-up config with fewer clamps
   - Or tighten [A_flat_min, A_flat_max] bounds

3. **Residual topology** (from worst10 analysis):
   - Ripple-like residuals → widen sigma_ln_r (smooth spiral mode)
   - Monotone outer deficits → shelf too weak (raise A_shelf or lower p)

**Action**: Document findings, implement targeted refinement, re-validate on TEST.
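
For diagnostic question 1, a side-by-side pass rate per subgroup is easier to read than a single count. The following is a minimal sketch, not the project's tooling. It assumes the summary schema implied by the `jq` command above (`rows[].tags` as a list, `rows[].solvers.rft_geom.pass` as a boolean) and that HSB galaxies carry an `"HSB"` tag; `subgroup_pass_rates` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def subgroup_pass_rates(summary: dict, solver: str = "rft_geom",
                        groups: tuple = ("LSB", "HSB")) -> dict:
    """Return {group: (passed, total)} for rows that have results for `solver`."""
    passed, total = Counter(), Counter()
    for row in summary.get("rows", []):
        result = (row.get("solvers") or {}).get(solver)
        if not result:
            continue  # no results recorded for this solver on this galaxy
        tags = row.get("tags") or []
        for group in groups:
            if group in tags:
                total[group] += 1
                passed[group] += bool(result.get("pass"))
    return {g: (passed[g], total[g]) for g in groups}


if __name__ == "__main__":
    path = Path("reports/_summary_C9_TEST.json")
    if path.exists():
        summary = json.loads(path.read_text())
        for group, (n_pass, n_total) in subgroup_pass_rates(summary).items():
            rate = n_pass / n_total if n_total else float("nan")
            print(f"{group}: {n_pass}/{n_total} passed ({rate:.0%})")
```

Comparing the two rates against the previous run shows which subgroup moved, which is the signal the beta2 guidance depends on.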

---

### Red Light (Revert/Rethink)
- ❌ TEST pass@20% < 40%
- ❌ RFT > MOND wins < 50% (RFT loses the majority of head-to-head comparisons)
- ❌ Systematic subgroup failures

**Immediate Actions**:

1. **Revert shelf mode**:
   ```json
   "mode_shelf": {
     "A_shelf": 0.0,
     "p": 1.5
   }
   ```

2. **Conservative β-mapping**:
   - Keep beta0 ≈ 0.28 (historical baseline)
   - Reduce beta1, beta2 to minimize descriptor-driven variance
   - Or revert to fixed A_flat = 0.28

3. **Widen spiral smoothing** (raise sigma_ln_r from 0.18 to 0.22):
   ```json
   "mode_spiral": {
     "sigma_ln_r": 0.22
   }
   ```

4. **Diagnostic deep dive**:
   - Check TRAIN performance degradation
   - Examine A_flat clamping distribution
   - Verify descriptor calculations (xi_outer, gas_frac_outer, r_knee)

**Action**: Freeze failed experiment, document lessons learned, iterate with minimal changes.
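
The three tiers above can be folded into one function so the verdict is applied the same way every time. This is a sketch under stated assumptions, not part of the pipeline: `C9Metrics` and `classify` are hypothetical names, any single red condition is treated as decisive, anything that is neither red nor fully green falls to yellow, and the two qualitative criteria (BIC movement, subgroup disparity) are passed in as booleans decided by the analyst.

```python
from dataclasses import dataclass


@dataclass
class C9Metrics:
    pass_at_20: float            # fraction of TEST galaxies passing at the 20% threshold
    rft_vs_mond_wins: float      # fraction of head-to-head RMS comparisons won by RFT
    clamp_fraction: float        # fraction of galaxies whose A_flat was clamped
    bic_moves_toward_rft: bool   # analyst judgment: BIC wins shift toward RFT vs NFW
    subgroup_disparity: bool     # analyst judgment: systematic LSB vs HSB failures


def classify(m: C9Metrics) -> str:
    """Return 'green', 'yellow', or 'red' using the thresholds in this rubric."""
    # Assumption: one red condition is enough to return red.
    if m.pass_at_20 < 0.40 or m.rft_vs_mond_wins < 0.50 or m.subgroup_disparity:
        return "red"
    all_green = (
        m.pass_at_20 >= 0.50
        and m.rft_vs_mond_wins >= 0.60
        and m.clamp_fraction <= 0.25
        and m.bic_moves_toward_rft
    )
    # Assumption: mixed results (some green, some yellow) count as yellow.
    return "green" if all_green else "yellow"


if __name__ == "__main__":
    print(classify(C9Metrics(0.53, 0.62, 0.20, True, False)))  # green
    print(classify(C9Metrics(0.45, 0.55, 0.30, True, False)))  # yellow
```

The rubric does not say how to resolve mixed signals, so the two tie-breaking assumptions should be agreed on before TEST results are seen.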

---

## Clamp Telemetry Guardrail

After TRAIN optimization, check clamping stats:

```python
import json
from pathlib import Path

# Count galaxies where A_flat was clamped
reports_dir = Path("reports")
clamped = 0
total = 0

for galaxy_dir in reports_dir.iterdir():
    if not galaxy_dir.is_dir():
        continue
    solver_dir = galaxy_dir / "rft_geom"
    if not solver_dir.exists():
        continue
    metrics_path = solver_dir / "metrics.json"
    if metrics_path.exists():
        with metrics_path.open() as f:
            data = json.load(f)
            desc = data.get("descriptors", {})
            if desc.get("A_flat_clamped", False):
                clamped += 1
            total += 1

clamp_fraction = clamped / total if total > 0 else 0.0
print(f"Clamp fraction: {clamped}/{total} = {clamp_fraction:.2%}")
```

**Guardrail**: If clamp_fraction > 0.25, prefer runner-up config with fewer clamps.

---

## BIC Fairness Interpretation

The aggregator now outputs `bic_wins` alongside RMS `wins`:

```json
"bic_wins": {
  "rft_geom>mond": {"wins": 28, "of": 34, "ties": 0},
  "rft_geom>nfw_fit": {"wins": 19, "of": 34, "ties": 1}
}
```

**What to report**:
- RFT > MOND (both k=0): 28/34 = 82% BIC wins
- RFT > NFW (k=0 vs k=2): 19/34 = 56% BIC wins (accounting for NFW's 2 free params/galaxy)

With counts like these, RFT remains competitive with NFW once BIC accounts for NFW's extra flexibility (2 free parameters per galaxy versus none for RFT).
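
The reported percentages follow directly from the `bic_wins` block. A minimal sketch of that arithmetic is below; `win_rates` is a hypothetical helper, and it follows the convention used above of dividing wins by `of`, so ties stay in the denominator.

```python
def win_rates(bic_wins: dict) -> dict:
    """Map each comparison key to wins / of, or None when `of` is zero."""
    rates = {}
    for matchup, counts in bic_wins.items():
        of = counts.get("of", 0)
        rates[matchup] = counts.get("wins", 0) / of if of else None
    return rates


# Example counts copied from the bic_wins block above.
example = {
    "rft_geom>mond": {"wins": 28, "of": 34, "ties": 0},
    "rft_geom>nfw_fit": {"wins": 19, "of": 34, "ties": 1},
}

for matchup, rate in win_rates(example).items():
    print(f"{matchup}: {rate:.0%}")
# rft_geom>mond: 82%
# rft_geom>nfw_fit: 56%
```

If ties should be excluded instead, subtract `ties` from `of` before dividing and state that choice alongside the reported figure.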

---

## Gotchas to Avoid

1. **Manifest filtering**:
   - ❌ DON'T pass `--restrict-manifest` to `cli.rft_rc_bench`
   - ✅ DO pass `--restrict-manifest` to `batch.aggregate`

2. **Case count validation**:
   ```bash
   jq '.rows | length' reports/_summary_C9_TEST.json
   wc -l < cases/SP99-TEST.manifest.txt
   ```
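   These two counts MUST match. The `wc -l` figure assumes the manifest lists one case per line, with no blank lines and a trailing newline.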

3. **Config SHA256 in metadata**:
   Ensure both `metrics.json.meta` and `_summary*.meta` record:
   ```json
   "global_config_sha256": "<hash>"
   ```

4. **Pre-registration lock**:
   Git tag `c9-pre` MUST be created BEFORE running TEST validation.
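
Gotcha 3 can be checked mechanically rather than by eye. The sketch below recomputes the config digest and compares it with the value recorded under `meta.global_config_sha256` in each output file. The helper names are hypothetical, and the assumption that the recorded hash is the hex SHA-256 of the raw config file bytes (the same digest `sha256sum` prints) should be confirmed against how the pipeline writes it.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Optional


def file_sha256(path: Path) -> str:
    """Hex SHA-256 of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def recorded_hash(json_path: Path) -> Optional[str]:
    """Read meta.global_config_sha256 from a metrics or summary JSON file."""
    data = json.loads(json_path.read_text())
    return data.get("meta", {}).get("global_config_sha256")


def hash_mismatches(config_path: Path, json_paths: List[Path]) -> List[Path]:
    """Return the JSON files whose recorded hash is missing or differs."""
    expected = file_sha256(config_path)
    return [p for p in json_paths if recorded_hash(p) != expected]


if __name__ == "__main__":
    config = Path("config/global_c9.json")
    outputs = [Path("reports/_summary_C9_TEST.json")]
    if config.exists() and all(p.exists() for p in outputs):
        bad = hash_mismatches(config, outputs)
        print("OK" if not bad else f"Hash mismatch in: {[str(p) for p in bad]}")
```

A file with no recorded hash is reported as a mismatch, which errs on the side of flagging outputs produced before the metadata field existed.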

---

## Post-Grid Workflow

When grid search completes:

```bash
# 1) Extract winner
python3 scripts/extract_c9_winner.py

# 2) Freeze config
sha256sum config/global_c9.json > config/global_c9.json.sha256
git add config/global_c9.json* && git commit -m "C9: freeze optimal config"

# 3) Fill pre-registration
python3 scripts/fill_prereg_from_train.py
# (manually update scripts/PRE_REGISTRATION_C9.md)
git add scripts/PRE_REGISTRATION_C9.md && git commit -m "C9: pre-register TEST predictions"
git tag c9-pre

# 4) Run TEST validation
./scripts/run_c9_test_validation.sh

# 5) Check clamp telemetry + decision rubric
python3 scripts/check_clamp_fraction.py
# Apply decision rubric from this document
```

---

## Timeline

- **Grid search**: 60–90 minutes (64 configs × 65 galaxies × ~1 sec ≈ 4,160 s ≈ 69 minutes)
- **TEST validation**: ~10 minutes (3 solvers × 34 galaxies)
- **Worst10 analysis**: ~2 minutes
- **Total turnaround**: <2 hours from grid start to TEST results

