EquityScore Validation Report: What Worked (and What Didn't)
A transparent look at our signals in stressed and calm markets
January 2026 | 7 min read
We built EquityScore with a simple premise: Don't just show filters — measure whether they actually help.
This post shares what our grading and divergence signals did in two recent windows, including what didn't work.
We're not making long-term claims here. This is a period report. The system is still evolving — we're publishing this because we believe in showing receipts, not promises.
Definitions (so the numbers are unambiguous)
Segments
ALL: Full universe in that cap band
NO_RISK: Everything except RISK signals (NEUTRAL + OPP)
OPP: Opportunity signals only
RISK: Risk warnings only
Metrics
Avg Return: Average forward return
Avg Alpha: Average forward return minus the benchmark's return over the same horizon
Hit Down: % that finished negative
Hit Up: % that finished positive
Note: We judge RISK as an "avoid" signal. Its value shows up when excluding RISK names improves results (NO_RISK vs RISK), not when RISK names have a low average return in absolute terms.
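To make the definitions concrete, here is a minimal sketch of how the four metrics could be computed for one cohort. The function name `summarize`, the single `benchmark_return` input, and the strict zero cutoff for hits are assumptions for illustration, not the production logic. Hit Down and Hit Up in the tables below do not sum to 100%, so the real cutoffs may differ from the ones assumed here.

```python
from statistics import mean

def summarize(forward_returns, benchmark_return):
    """Summarize one cohort. All values are in percent (2.5 means +2.5%)."""
    n = len(forward_returns)
    avg_return = mean(forward_returns)
    return {
        "avg_return": avg_return,
        # Assumption: alpha is cohort return minus benchmark return, same horizon.
        "avg_alpha": avg_return - benchmark_return,
        # Assumption: a "hit" is any return strictly below / above zero.
        "hit_down": 100.0 * sum(r < 0 for r in forward_returns) / n,
        "hit_up": 100.0 * sum(r > 0 for r in forward_returns) / n,
    }

# Made-up returns for illustration only, not EquityScore data.
example = summarize([4.0, -2.0, 0.0, 6.0], benchmark_return=1.0)
print(example)
```

In this toy cohort the flat 0.0% return counts as neither a hit up nor a hit down, which is one way the two rates can add up to less than 100%.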
Part 1: 60-Day Window (Stressed Market)
Window: Nov 2-17, 2025 → Horizon: 60 days
Large+Mega: ALL → NO_RISK → OPP ✓ Worked
| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: Large+Mega ALL | +0.49% | -0.38% | 40.1% | 33.0% | 188 |
| E1: NO_RISK | +0.80% | -0.07% | 38.8% | 32.9% | 129 |
| E2: OPP | +2.62% | +1.74% | 34.4% | 39.8% | 47 |
Interpretation: OPP added about +2.1 percentage points of alpha vs the ALL baseline (+1.74% vs -0.38%) and +1.81 points vs NO_RISK (-0.07%). In this stressed window, OPP worked in Large+Mega and added lift above a NO_RISK baseline.
Large+Mega: RISK vs NO_RISK ✓ Worked
| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| NO_RISK | +0.80% | -0.07% | 38.8% | 32.9% | 129 |
| RISK | +0.14% | -0.73% | 41.7% | 33.2% | 111 |
Interpretation: RISK names finished 0.66 percentage points of alpha below NO_RISK (-0.73% vs -0.07%), with a higher Hit Down rate (41.7% vs 38.8%). This is how an "avoid" signal should behave.
Mid+Small: OPP ✗ Failed
| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: Mid+Small ALL | -6.29% | -7.17% | 72.3% | 13.6% | 1,085 |
| E1: NO_RISK | -6.09% | -6.99% | 72.0% | 13.4% | 683 |
| E2: OPP | -6.91% | -7.77% | 75.3% | 10.3% | 411 |
Interpretation: OPP alpha was 0.78 percentage points worse than NO_RISK (-7.77% vs -6.99%). In this stressed window, Mid/Small OPP got hit harder than baseline. This is why we segment-gate signals: "one signal for the whole market" is not honest.
The Clean Ladder: A/B → NO_RISK → OPP ✓ Worked
This is the evidence from this window that the system adds lift on top of grading:
| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: A/B NO_SIGNAL | +2.14% | +1.28% | 22.9% | 41.2% | 19 |
| E1: A/B NO_RISK | +3.61% | +2.76% | 20.1% | 52.0% | 29 |
| E2: A/B OPP | +5.31% | +4.49% | 16.8% | 64.6% | 13 |
This is the "operating system" behavior we want:
• Grading gives you a quality baseline (+1.28% alpha)
• Removing RISK improves it (+1.49% more)
• OPP adds incremental lift (+1.72% more)
Total: +3.21% alpha vs baseline
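The step-by-step lift can be recomputed directly from the Avg Alpha column of the 60-day A/B table. This is a small sketch using only the rounded values shown there; `ladder_steps` is a hypothetical helper name.

```python
def ladder_steps(alphas):
    """Return (label, alpha, lift vs previous rung) for each rung of a ladder."""
    steps = []
    previous = None
    for label, alpha in alphas:
        lift = None if previous is None else round(alpha - previous, 2)
        steps.append((label, alpha, lift))
        previous = alpha
    return steps

# Avg Alpha values (percent) copied from the 60-day A/B table above.
ladder_60d = [("A/B NO_SIGNAL", 1.28), ("A/B NO_RISK", 2.76), ("A/B OPP", 4.49)]

for label, alpha, lift in ladder_steps(ladder_60d):
    print(label, alpha, lift)

# Total lift from the first rung to the last, in percentage points.
total_lift = round(ladder_60d[-1][1] - ladder_60d[0][1], 2)
print("total", total_lift)
```

From the rounded table values the two steps come out as 1.48 and 1.73 rather than the quoted 1.49 and 1.72. A 0.01 gap like this is what rounding the inputs would produce, and the 3.21 total matches either way.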
The Headline Numbers (60-Day)
Large/Mega + A/B + OPP vs Large/Mega baseline:
Part 2: 30-Day Window (Calmer Market)
Window: Dec 2-17, 2025 → Horizon: 30 days
30-day results are noisier. We include them for transparency, not as final truth.
Large+Mega: OPP ✓ Worked
| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: Large+Mega ALL | +1.71% | +1.73% | 20.2% | 39.0% | 188 |
| E1: NO_RISK | +1.45% | +1.47% | 21.2% | 37.7% | 129 |
| E2: OPP | +2.67% | +2.69% | 19.5% | 44.3% | 57 |
OPP added +1.22 percentage points of alpha vs NO_RISK (+2.69% vs +1.47%), consistent in direction with the 60-day finding.
The Full Ladder (30D): Large+Mega → A/B → OPP ✓ Worked
| Step | Return | Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: Large/Mega ALL | +1.71% | +1.73% | 20.2% | 39.0% | 188 |
| E1: + A/B Grade | +2.90% | +2.95% | 11.7% | 45.3% | 40 |
| E2: + OPP Signal | +7.04% | +7.11% | 6.3% | 66.7% | 15 |
Total lift: +5.38% alpha vs baseline. Quality added +1.22%, signals added +4.16% more.
30D Quality Spread: A-Graded vs F-Graded ✓ Worked
How quality grades alone separated returns — no OPP/RISK signals applied:
| Cohort | Avg Return | Avg Alpha | Hit Down | Hit Up |
|---|---|---|---|---|
| A-Graded (Large+Mega) | +4.89% | +4.98% | 5.9% | 58.8% |
| F-Graded (Large+Mega) | -1.15% | -1.06% | 37.5% | 25.0% |
A +6.04 percentage point spread between A and F grades at 30D (+4.98% vs -1.06% alpha). In this window, quality filtering alone added meaningful edge before any timing signals.
Why RISK Looked Better Than NO_RISK at 30D (Anomaly)
| Segment | Return | Alpha | Hit Down | Hit Up |
|---|---|---|---|---|
| NO_RISK | +1.45% | +1.47% | 21.2% | 37.7% |
| RISK | +2.00% | +2.02% | 19.2% | 40.0% |
At 30D, RISK beat NO_RISK by 0.55 percentage points of alpha (+2.02% vs +1.47%). We read this as mean reversion rather than signal quality: beaten-down RISK names often bounce over short horizons. The 60D window (during stress) showed the expected RISK underperformance. This is why we don't use 30D alone to judge RISK.
What This Means: Regime Matters
EquityScore adapts signal interpretation based on market stress:
STRESSED (Stress > 1.5×)
- RISK signals very predictive
- OPP works in Large+Mega only
- Mid/Small gets indiscriminate selling
NORMAL (Stress 0.8–1.5×)
- All signals active
- Quality filtering essential
- Broader universe participation
OPPORTUNITY (Stress < 0.8×)
- Broadest signal universe
- Mid/Small more actionable
- Mean reversion caution needed
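The three regime bands map onto a simple threshold check. The sketch below assumes the stress reading arrives as a single multiple (the "×" value); `stress_ratio` and `classify_regime` are hypothetical names, and how the multiple itself is computed is not covered in this post.

```python
def classify_regime(stress_ratio: float) -> str:
    """Map a stress multiple to one of the three regime labels."""
    if stress_ratio > 1.5:
        return "STRESSED"
    if stress_ratio < 0.8:
        return "OPPORTUNITY"
    # Everything from 0.8 through 1.5 inclusive falls in the middle band.
    return "NORMAL"

for ratio in (0.5, 0.8, 1.2, 1.5, 2.0):
    print(ratio, classify_regime(ratio))
```

The boundary values 0.8 and 1.5 land in NORMAL here because the outer bands are written as strict inequalities.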
What We're Changing Next
Based on this validation window, we're adjusting:
- Mid+Small OPP signals — now watchlist-only during stress regimes
- RISK validation horizon — extended to 60D minimum before flagging
- Regime badges — now shown on every signal card
- Quality spread tracking — A vs F grade spread now part of daily monitoring
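Two of these adjustments are simple gating rules, sketched below. The function names, the `"MID_SMALL"` and `"STRESSED"` string labels, and the reading of "60D minimum" as a horizon check are assumptions for illustration, not the shipped implementation.

```python
# Assumption: "60D minimum" means RISK buckets are only judged at >= 60 days.
MIN_RISK_VALIDATION_DAYS = 60

def opp_display_mode(cap_band: str, regime: str) -> str:
    """Mid+Small OPP is watchlist-only in stress; otherwise it stays active."""
    if cap_band == "MID_SMALL" and regime == "STRESSED":
        return "watchlist"
    return "active"

def risk_horizon_ok(horizon_days: int) -> bool:
    """True when the horizon is long enough to evaluate RISK buckets."""
    return horizon_days >= MIN_RISK_VALIDATION_DAYS

print(opp_display_mode("MID_SMALL", "STRESSED"))
print(opp_display_mode("LARGE_MEGA", "STRESSED"))
print(risk_horizon_ok(30))
```

Keeping the gate as a pure function of cap band and regime makes it easy to audit against the tables in this report.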
Summary: What Held vs What Broke
What Held (Consistent & Actionable)
✓ Quality grading works without signals, especially in Large+Mega.
✓ Large+Mega OPP adds lift at both 30D and 60D (stronger at 60D).
✓ RISK separation works in Large+Mega at 60D: worse outcomes when flagged.
✓ The ladder stacks: cap filter → quality → signal timing.
What Broke / Still Evolving
✗ Mid+Small OPP was negative in the stressed 60D window and should be watchlist-only.
✗ Mid+Small RISK separation is weak: everything gets smashed together in stress.
✗ 30D is too noisy to judge RISK buckets by mean alpha (mean reversion dominates).
✗ Market-wide OPP was negative in stressed 60D → not actionable without segmentation.
Important Caveats
- Sample sizes are small at the top of the ladder. The 60D window had 13 symbols in A/B+OPP; 30D had 15. This is selectivity by design, but small samples mean wider confidence intervals.
- Two windows is not statistical proof. We're showing consistency across regimes, but this is early evidence. We'll publish more windows as data accumulates.
- Past performance ≠ future results. Markets change. We're sharing measured results, not making promises.
- This is Large/Mega focused. Mid/Small shows weaker separation in stressed markets, and we're honest about that.
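To give a feel for how much sample size matters, the sketch below compares a 95% Wilson score interval for the same observed hit rate at 13 symbols and at 188 symbols. It assumes each symbol counts as one independent outcome, which this report does not state, and the 0.65 hit rate is a made-up input, so treat the output as an illustration of interval width only.

```python
import math

def wilson_interval(p_hat: float, n: int, z: float = 1.96):
    """Wilson score interval for a proportion p_hat observed over n trials."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical hit rate of 65% at two cohort sizes taken from the tables.
for n in (13, 188):
    low, high = wilson_interval(0.65, n)
    print(n, round(low, 3), round(high, 3))
```

The interval at 13 symbols is several times wider than at 188, which is why the top-of-ladder numbers deserve the most caution.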
Disclaimer: This post is not investment advice. It reports how specific cohorts performed over specific windows. We will publish these reports repeatedly as the system learns and as more windows mature.