
EquityScore Validation Report: What Worked (and What Didn't)

A transparent look at our signals in stressed and calm markets

January 2026 | 7 min read

We built EquityScore with a simple premise: Don't just show filters — measure whether they actually help.

This post shares what our grading and divergence signals did in two recent windows, including what didn't work.

We're not making long-term claims here. This is a period report. The system is still evolving — we're publishing this because we believe in showing receipts, not promises.

Definitions (so the numbers are unambiguous)

Segments

ALL: Full universe in that cap band

NO_RISK: Everything except RISK signals (NEUTRAL + OPP)

OPP: Opportunity signals only

RISK: Risk warnings only
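Here is a minimal sketch of how the four segments partition one cap band. It assumes a hypothetical per-symbol record with `cap_band` and `signal` fields. These names are illustrative, not EquityScore's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SymbolSignal:
    # Hypothetical record; field names are illustrative only.
    symbol: str
    cap_band: str  # e.g. "Large+Mega" or "Mid+Small"
    signal: str    # one of "OPP", "NEUTRAL", "RISK"


def segment(rows: Iterable[SymbolSignal], cap_band: str, name: str) -> List[SymbolSignal]:
    """Return the rows of one cap band that belong to segment `name`."""
    in_band = [r for r in rows if r.cap_band == cap_band]
    if name == "ALL":
        return in_band
    if name == "NO_RISK":
        # Everything except RISK, i.e. NEUTRAL + OPP
        return [r for r in in_band if r.signal != "RISK"]
    if name in ("OPP", "RISK"):
        return [r for r in in_band if r.signal == name]
    raise ValueError(f"unknown segment: {name}")


# Usage with made-up tickers
rows = [
    SymbolSignal("AAA", "Large+Mega", "OPP"),
    SymbolSignal("BBB", "Large+Mega", "RISK"),
    SymbolSignal("CCC", "Large+Mega", "NEUTRAL"),
    SymbolSignal("DDD", "Mid+Small", "OPP"),
]
print([r.symbol for r in segment(rows, "Large+Mega", "NO_RISK")])  # ['AAA', 'CCC']
```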

Metrics

Avg Return: Average forward return

Avg Alpha: Average forward return in excess of the benchmark over the same horizon

Hit Down: % that finished negative

Hit Up: % that finished positive

Note: RISK is evaluated as an "avoid" signal. Its value is how much worse RISK names did than NO_RISK names. It does not depend on whether RISK names had a low average return in absolute terms.
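Here is a minimal sketch of the four metrics for one segment. It assumes each symbol is a hypothetical `(forward_return, benchmark_return)` pair in percent. The published Hit Down and Hit Up figures don't sum to 100%, so the exact cutoff isn't stated here. The `threshold` parameter is a placeholder assumption, not the report's definition.

```python
from statistics import mean
from typing import Dict, List, Tuple


def segment_metrics(
    rows: List[Tuple[float, float]],  # (forward_return, benchmark_return), in percent
    threshold: float = 0.0,           # hypothetical band for Hit Down / Hit Up
) -> Dict[str, float]:
    """Compute the report's metrics for one segment."""
    returns = [r for r, _ in rows]
    alphas = [r - b for r, b in rows]  # per-symbol excess return vs benchmark
    n = len(rows)
    return {
        "avg_return": mean(returns),
        "avg_alpha": mean(alphas),
        # Share finishing below -threshold / above +threshold, in percent
        "hit_down": 100.0 * sum(r < -threshold for r in returns) / n,
        "hit_up": 100.0 * sum(r > threshold for r in returns) / n,
        "symbols": n,
    }


# Usage with made-up numbers, not report data
print(segment_metrics([(2.0, 1.0), (-3.0, 1.0), (0.5, 1.0), (4.5, 1.0)], threshold=1.0))
```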

Part 1: 60-Day Window (Stressed Market)

Window: Nov 2-17, 2025 → Horizon: 60 days

Large+Mega: ALL → NO_RISK → OPP ✓ Worked

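| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: Large+Mega ALL | +0.49% | -0.38% | 40.1% | 33.0% | 188 |
| E1: NO_RISK | +0.80% | -0.07% | 38.8% | 32.9% | 129 |
| E2: OPP | +2.62% | +1.74% | 34.4% | 39.8% | 47 |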

Interpretation: OPP added +2.11% alpha vs the ALL baseline and +1.81% vs NO_RISK. In this stressed window, OPP worked in Large+Mega and added lift on top of the NO_RISK filter.

Large+Mega: RISK vs NO_RISK ✓ Worked

| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| NO_RISK | +0.80% | -0.07% | 38.8% | 32.9% | 129 |
| RISK | +0.14% | -0.73% | 41.7% | 33.2% | 111 |

Interpretation: RISK names trailed NO_RISK by 0.66% in alpha and finished negative more often (41.7% vs 38.8%). This is exactly how an "avoid" signal should behave.
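Under the "avoid" framing from the Definitions note, the value of RISK is the gap between NO_RISK and RISK. The sketch below uses the Large+Mega 60D figures from the table above. The helper name and dict keys are hypothetical.

```python
def avoid_value(no_risk: dict, risk: dict) -> dict:
    """Gap between NO_RISK and RISK; positive values mean avoiding RISK helped."""
    return {
        # Alpha points gained by holding NO_RISK instead of RISK
        "alpha_gap": round(no_risk["avg_alpha"] - risk["avg_alpha"], 2),
        # Extra percentage points of RISK names that finished negative
        "hit_down_gap": round(risk["hit_down"] - no_risk["hit_down"], 1),
    }


no_risk = {"avg_alpha": -0.07, "hit_down": 38.8}
risk = {"avg_alpha": -0.73, "hit_down": 41.7}
print(avoid_value(no_risk, risk))  # {'alpha_gap': 0.66, 'hit_down_gap': 2.9}
```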

Mid+Small: OPP ✗ Failed

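| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: Mid+Small ALL | -6.29% | -7.17% | 72.3% | 13.6% | 1,085 |
| E1: NO_RISK | -6.09% | -6.99% | 72.0% | 13.4% | 683 |
| E2: OPP | -6.91% | -7.77% | 75.3% | 10.3% | 411 |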

Interpretation: OPP trailed NO_RISK by 0.78% in alpha. In this stressed window, Mid/Small OPP names got hit harder than the baseline. This is why we segment-gate signals: "one signal for the whole market" is not honest.

The Clean Ladder: A/B → NO_RISK → OPP ✓ Worked

This ladder (Large+Mega, 60D) is the clearest evidence in this window that signals add lift on top of grading:

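| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: A/B NO_SIGNAL | +2.14% | +1.28% | 22.9% | 41.2% | 19 |
| E1: A/B NO_RISK | +3.61% | +2.76% | 20.1% | 52.0% | 29 |
| E2: A/B OPP | +5.31% | +4.49% | 16.8% | 64.6% | 13 |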

This is the "operating system" behavior we want:

• Grading gives you a quality baseline (+1.28% alpha)

• Removing RISK improves it (+1.49% more)

• OPP adds incremental lift (+1.72% more)

Total: +3.21% alpha vs baseline

The Headline Numbers (60-Day)

Large/Mega + A/B + OPP vs the Large/Mega baseline:

• +4.86% alpha vs baseline

• 23.3 pts fewer downside outcomes (Hit Down 16.8% vs 40.1%)

• 64.6% finished positive

Part 2: 30-Day Window (Calmer Market)

Window: Dec 2-17, 2025 → Horizon: 30 days

30-day results are noisier. We include them for transparency, not as final truth.

Large+Mega: OPP ✓ Worked

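| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: Large+Mega ALL | +1.71% | +1.73% | 20.2% | 39.0% | 188 |
| E1: NO_RISK | +1.45% | +1.47% | 21.2% | 37.7% | 129 |
| E2: OPP | +2.67% | +2.69% | 19.5% | 44.3% | 57 |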

OPP added +1.22% alpha vs NO_RISK — consistent with the 60-day finding.

The Full Ladder (30D): Large+Mega → A/B → OPP ✓ Worked

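| Step | Avg Return | Avg Alpha | Hit Down | Hit Up | Symbols |
|---|---|---|---|---|---|
| E0: Large/Mega ALL | +1.71% | +1.73% | 20.2% | 39.0% | 188 |
| E1: + A/B Grade | +2.90% | +2.95% | 11.7% | 45.3% | 40 |
| E2: + OPP Signal | +7.04% | +7.11% | 6.3% | 66.7% | 15 |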

Total lift: +5.38% alpha vs baseline. Quality added +1.22%, signals added +4.16% more.
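The ladder arithmetic is a series of successive differences in Avg Alpha. The sketch below uses the 30D Full Ladder values. The `ladder_lift` helper is hypothetical.

```python
from typing import List, Tuple


def ladder_lift(steps: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Incremental alpha each step adds relative to the step before it."""
    out = []
    for (_, prev_alpha), (label, alpha) in zip(steps, steps[1:]):
        out.append((label, round(alpha - prev_alpha, 2)))
    return out


steps_30d = [
    ("Large/Mega ALL", 1.73),
    ("+ A/B Grade", 2.95),
    ("+ OPP Signal", 7.11),
]
increments = ladder_lift(steps_30d)
total = round(steps_30d[-1][1] - steps_30d[0][1], 2)
print(increments)  # [('+ A/B Grade', 1.22), ('+ OPP Signal', 4.16)]
print(total)       # 5.38
```

The increments always sum to the total lift because each step is measured against the one before it.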

30D Quality Spread: A-Graded vs F-Graded ✓ Worked

How quality grades alone separated returns — no OPP/RISK signals applied:

| Cohort | Avg Return | Avg Alpha | Hit Down | Hit Up |
|---|---|---|---|---|
| A-Graded (Large+Mega) | +4.89% | +4.98% | 5.9% | 58.8% |
| F-Graded (Large+Mega) | -1.15% | -1.06% | 37.5% | 25.0% |

+6.04% spread between A and F grades at 30D, in both average return and average alpha. Quality filtering alone separates outcomes before any timing signals are applied.

Anomaly: Why RISK Looked Better Than NO_RISK at 30D

| Segment | Avg Return | Avg Alpha | Hit Down | Hit Up |
|---|---|---|---|---|
| NO_RISK | +1.45% | +1.47% | 21.2% | 37.7% |
| RISK | +2.00% | +2.02% | 19.2% | 40.0% |

At 30D, RISK beat NO_RISK by 0.55% in alpha. This is what we'd expect in a mean-reversion regime. Over 30 days, beaten-down RISK names often bounce, and that bounce is noise rather than a measure of signal quality. The stressed 60D window showed the expected RISK underperformance. This is why we don't use 30D alone to judge RISK.

What This Means: Regime Matters

EquityScore adapts signal interpretation based on market stress:

🛑 STRESSED (Stress > 1.5×)

  • RISK signals very predictive
  • OPP works in Large+Mega only
  • Mid/Small gets indiscriminate selling

⚖️ NORMAL (Stress 0.8–1.5×)

  • All signals active
  • Quality filtering essential
  • Broader universe participation

🚀 OPPORTUNITY (Stress < 0.8×)

  • Broadest signal universe
  • Mid/Small more actionable
  • Mean reversion caution needed
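The regime bands can be expressed as a simple threshold function. This is only a sketch. The post doesn't say how the stress reading is computed, so `stress` is treated as an opaque multiple. Counting exactly 0.8 and 1.5 as NORMAL is an assumption read from the "0.8–1.5×" band. The function name is hypothetical.

```python
def classify_regime(stress: float) -> str:
    """Map a stress multiple (e.g. 1.2 for 1.2x) to a regime label.

    Assumed bands: > 1.5 STRESSED, 0.8 to 1.5 inclusive NORMAL, < 0.8 OPPORTUNITY.
    """
    if stress > 1.5:
        return "STRESSED"
    if stress >= 0.8:
        return "NORMAL"
    return "OPPORTUNITY"


print([classify_regime(s) for s in (2.0, 1.5, 0.8, 0.5)])
# ['STRESSED', 'NORMAL', 'NORMAL', 'OPPORTUNITY']
```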

What We're Changing Next

Based on this validation window, we're adjusting:

  1. Mid+Small OPP signals — now watchlist-only during stress regimes
  2. RISK validation horizon — extended to 60D minimum before flagging
  3. Regime badges — now shown on every signal card
  4. Quality spread tracking — A vs F grade spread now part of daily monitoring
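Change 1 works as a gating rule keyed on cap band and regime. The sketch below is hypothetical: `signal_action` and its return labels are illustrative and do not describe the product's actual logic. It encodes only the Mid+Small OPP rule stated above.

```python
def signal_action(signal: str, cap_band: str, regime: str) -> str:
    """Hypothetical gating: how a signal is surfaced for a given segment and regime."""
    if signal == "OPP" and cap_band == "Mid+Small" and regime == "STRESSED":
        # Change 1: Mid+Small OPP is watchlist-only during stress regimes
        return "WATCHLIST"
    if signal in ("OPP", "RISK"):
        return "ACTIVE"
    # NEUTRAL names carry no actionable signal
    return "NONE"


print(signal_action("OPP", "Mid+Small", "STRESSED"))   # WATCHLIST
print(signal_action("OPP", "Large+Mega", "STRESSED"))  # ACTIVE
```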

Summary: What Held vs What Broke

What Held (Consistent & Actionable)

Quality grading works without signals, especially in Large+Mega.

Large+Mega OPP adds lift at both 30D and 60D (stronger at 60D).

RISK separation works in Large+Mega at 60D — worse outcomes when flagged.

The ladder stacks: cap filter → quality → signal timing.

What Broke / Still Evolving

Mid+Small OPP was negative in the stressed 60D window — should be watchlist-only.

Mid+Small RISK separation is weak — everything gets smashed together in stress.

30D is too noisy to judge RISK buckets by mean alpha (mean reversion dominates).

Market-wide OPP was negative in stressed 60D → not actionable without segmentation.

Important Caveats

  1. Sample sizes are small at the top of the ladder. The 60D window had 13 symbols in A/B+OPP; 30D had 15. This is selectivity by design, but small samples mean wider confidence intervals.

  2. Two windows is not statistical proof. We're showing consistency across regimes, but this is early evidence. We'll publish more windows as data accumulates.

  3. Past performance ≠ future results. Markets change. We're sharing measured results, not making promises.

  4. This is Large/Mega focused. Mid/Small shows weaker separation in stressed markets — we're honest about that.
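Caveat 1 is easier to see with a concrete interval. Below is a normal-approximation confidence interval for a mean, using a hypothetical helper `mean_ci`. The standard error scales as 1/√n. With equal per-symbol dispersion, an interval built from 13 symbols is about √(188/13) ≈ 3.8× wider than one built from 188. With n around 13, a Student t critical value (about 2.18 for 12 degrees of freedom) is more appropriate than 1.96, so `z` is left as a parameter.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def mean_ci(values: List[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Normal-approximation confidence interval for a mean (z=1.96 ~ 95%)."""
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values))  # standard error shrinks as 1/sqrt(n)
    return m, m - z * se, m + z * se


# Width ratio for the same dispersion at n=13 vs n=188
print(round(math.sqrt(188 / 13), 2))  # 3.8
```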


Disclaimer: This post is not investment advice. It reports how specific cohorts performed over specific windows. We will publish these reports repeatedly as the system learns and as more windows mature.