2024–2025 Backtest Results

Grammer/Graham BaseRuns + PythagenPat + FIP/SIERA · $50 flat unit · 2.65 underdog filter · 12% edge floor · Soft prob cap 0.72
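The header lists PythagenPat as one stage of the model. How it is combined with BaseRuns and FIP/SIERA isn't described here, so the sketch below shows only the standard published PythagenPat formula, where exponent = ((RS + RA) / G)^z with z ≈ 0.287. The function name and the sample run totals are hypothetical.

```python
def pythagenpat_win_pct(runs_scored: float, runs_allowed: float,
                        games: int, z: float = 0.287) -> float:
    """Expected winning percentage from runs scored/allowed.

    PythagenPat sets the Pythagorean exponent from the run environment:
    exponent = ((RS + RA) / G) ** z, with z around 0.28-0.29.
    """
    exponent = ((runs_scored + runs_allowed) / games) ** z
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical inputs, e.g. BaseRuns-estimated runs over a 162-game season
print(round(pythagenpat_win_pct(800, 700, 162), 3))
```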

Total P&L: +$6,495 (2024 + 2025 combined)
ROI: 16.6% (782 bets · $50 unit)
Win rate: 56.1% (439 W / 343 L)
Reg. slope: 0.448 (raw model · ideal 1.0)
Avg mkt odds: 2.11 (post 2.65 filter)
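The headline figures are linked by simple arithmetic: with a flat $50 unit, 782 bets put $39,100 at risk, and $6,495 / $39,100 ≈ 16.6%. A minimal check, assuming every bet was staked at exactly one unit and settled as a win or a loss (the helper names are hypothetical):

```python
def roi(pnl: float, bets: int, unit: float) -> float:
    """Return on investment: profit divided by total amount staked."""
    return pnl / (bets * unit)


def win_rate(wins: int, losses: int) -> float:
    """Share of settled bets that won (pushes are ignored)."""
    return wins / (wins + losses)


print(f"ROI: {roi(6495, 782, 50):.1%}")
print(f"Win rate: {win_rate(439, 343):.1%}")
```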
Previous model (ERA as xFIP · $100 flat)

Bets: 1,001
Win rate: 50.8%
ROI: 10.4%
Odds 2.7+ bets: 200 @ 28.5% WR
Edge floor: 10%

Current model (FIP/SIERA · $50 flat)

Bets: 782
Win rate: 56.1%
ROI: 16.6%
Underdog filter: ≤2.65
Edge floor: 12% · Soft cap: 0.72

Win rate lifted from 50.8% → 56.1%
The 2.65 underdog filter removed 200 low-quality bets (28.5% WR): fewer bets, much higher quality. ROI, measured per dollar staked so it stays comparable despite the switch from $100 to $50 units, improved from 10.4% to 16.6%.
STRONG now outperforms VALUE (17.6% vs 16.2% ROI)
With underdog STRONG bets removed, the signal correctly identifies higher-conviction picks, and the data now supports the 25% edge threshold.
Model overconfident above 63% probability
A regression slope of 0.448 (ideal 1.0) confirms the model is too confident at high probabilities. A soft cap of 0.72 prevents the most extreme predictions. This is an analytical observation; the raw signal (56.1% WR) remains profitable.
Edge sweet spot is 14–20%
The 14–20% edge range returns 25–26% ROI. Above 25% edge, overconfidence inflates apparent edges. The 10–12% bucket was returning only about 5% ROI, so the edge floor was raised to 12% to remove the weakest VALUE bets.
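The filters described above can be expressed as a small selection function. This is a sketch, not the model's code. It assumes the market probability is simply 1 / decimal odds (no vig removal) and that VALUE and STRONG correspond to the 12% and 25% edge thresholds. It also uses a hard clip at 0.72 as a stand-in for the soft cap, whose exact form isn't given. The names `signal`, `edge`, and the constants are hypothetical.

```python
from typing import Optional

MAX_ODDS = 2.65     # underdog filter: skip prices longer than this
EDGE_FLOOR = 0.12   # minimum edge for a VALUE signal
STRONG_EDGE = 0.25  # assumed edge threshold for a STRONG signal
PROB_CAP = 0.72     # hard clip used here as a stand-in for the soft cap


def edge(model_prob: float, odds: float) -> float:
    """Relative edge: (model prob - market prob) / market prob."""
    market_prob = 1.0 / odds  # assumption: implied prob straight from decimal odds
    return (model_prob - market_prob) / market_prob


def signal(model_prob: float, odds: float) -> Optional[str]:
    """Return 'STRONG', 'VALUE', or None for a single moneyline price."""
    if odds > MAX_ODDS:
        return None  # removed by the underdog filter
    capped = min(model_prob, PROB_CAP)  # assumption: cap applied before the edge
    e = edge(capped, odds)
    if e >= STRONG_EDGE:
        return "STRONG"
    if e >= EDGE_FLOOR:
        return "VALUE"
    return None
```

The cap matters most for short prices: an uncapped 0.90 at odds of 1.50 would be a 35% edge (STRONG), but clipped to 0.72 it drops to an 8% edge, below the 12% floor.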

Model Calibration

How closely do model predictions match actual win rates? In the calibration chart, the closer the blue bubbles sit to the green line, the better calibrated the model is.

Model probability vs actual win rate (2024–2025 · 782 bets)

Each bubble is a bucket of bets grouped by raw model win probability; bubble size = number of bets. The model tracks well in the 49–63% range but is significantly overconfident above 63%: bets the model rates at 74% actually win only 58%.

Chart series: actual win rate · market implied prob · perfect calibration · regression fit (slope 0.448)
Slope: 0.448 (ideal = 1.0 · overconfident)
Intercept: 0.293 (baseline lift from model)
R²: 0.010 · p = 0.005 (significant)
Regression: Actual win% ≈ 0.293 + 0.448 × Raw model probability
When model says 70% → expected actual win rate: 0.293 + 0.448×0.70 = 60.7%
When model says 55% → expected actual win rate: 0.293 + 0.448×0.55 = 53.9%
Correction applied: soft cap at 0.72 prevents predictions above 72%, addressing the worst overconfidence without shrinking profitable mid-range signals.

Calibration table

Model prob bucket | Bets | Model prob | Actual WR | Market prob | Gap | Verdict
42–49% | 118 | 46.3% | 43.2% | 40.4% | −3.1% | Watch
49–56% | 209 | 52.6% | 54.5% | 44.5% | +1.9% | Good
56–63% | 208 | 59.5% | 60.1% | 50.8% | +0.6% | Best
63–70% | 103 | 65.8% | 56.3% | 53.0% | −9.5% | Watch
70–78% | 55 | 74.3% | 58.2% | 51.9% | −16.1% | Overconfident
78–86% | 87 | 82.3% | 66.7% | 53.8% | −15.6% | Overconfident

Gap = actual WR minus model predicted. Negative = model overestimates. Overconfidence concentrated above 63%. Soft cap addresses the 70–86% range.
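A calibration table like this one can be rebuilt from raw bets by bucketing on model probability and comparing each bucket's mean prediction with its realized win rate. A minimal sketch, assuming bets are available as (model_prob, won) pairs; the function name and sample data are hypothetical, and the Verdict column is left out because its thresholds aren't stated.

```python
def calibration_table(bets, edges):
    """Bucket (model_prob, won) pairs by probability and report the gap.

    edges: sorted bucket boundaries, e.g. [0.42, 0.49, 0.56, ...];
    a bet falls in [lo, hi). Gap = actual win rate - mean model probability.
    """
    buckets = {}
    for prob, won in bets:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= prob < hi:
                buckets.setdefault((lo, hi), []).append((prob, won))
                break  # bets outside every bucket are skipped
    rows = []
    for (lo, hi), items in sorted(buckets.items()):
        n = len(items)
        mean_prob = sum(p for p, _ in items) / n
        actual = sum(w for _, w in items) / n
        rows.append({"bucket": (lo, hi), "bets": n, "model_prob": mean_prob,
                     "actual_wr": actual, "gap": actual - mean_prob})
    return rows


EDGES = [0.42, 0.49, 0.56, 0.63, 0.70, 0.78, 0.86]
sample = [(0.45, 1), (0.47, 0), (0.75, 0), (0.73, 1), (0.77, 0)]
for row in calibration_table(sample, EDGES):
    print(row["bucket"], row["bets"], f"{row['gap']:+.1%}")
```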

Edge Analysis

Where does the model's edge come from? The 14–20% zone is the sweet spot; 25%+ is where model overconfidence inflates apparent edges.

Win rate & ROI by model edge bucket

Edge = (model prob − market prob) / market prob. The 14–20% range combines a 60% win rate with the best ROI (25–26%); only the 35–50% bucket wins more often (64.9%). That rebound in the 35–50% bucket is attributed to genuine market mispricings at moderate odds, not just overconfidence.
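If the market probability is taken as q = 1/O for decimal odds O (an assumption; the report doesn't say whether the bookmaker margin is removed), the edge definition reduces to model probability times odds minus one, which is also the expected profit per $1 staked:

```latex
q = \frac{1}{O}, \qquad
\text{edge} = \frac{p - q}{q} = pO - 1, \qquad
\mathbb{E}[\text{profit per }\$1] = p\,(O - 1) - (1 - p) = pO - 1
```

Under the same assumption, the 12% edge floor means p ≥ 1.12/O. At the average odds of 2.11 that requires a model probability of at least about 0.531, and the parlay rule model_prob × market_odds > 1.0 is simply edge > 0.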

Edge range | Bets | Win rate | P&L | ROI | Signal
10–12% (removed) | n/a | ~51% | Low | ~5% | Floor raised to 12%
12–14% | 114 | 54.4% | $612 | 10.7% | Marginal
14–17% | 125 | 60.0% | $1,558 | 24.9% | Sweet spot
17–20% | 75 | 60.0% | $989 | 26.4% | Sweet spot
20–25% | 83 | 56.6% | $823 | 19.8% | Good
25–35% | 54 | 53.7% | $349 | 12.9% | Moderate
35–50% | 60 | 64.9% | $738 | 24.6% | Good
50%+ | 76 | 55.3% | $528 | 13.9% | Moderate

Performance Splits

Detailed breakdown by odds range, season, and home/away. Parlay P&L is tracked completely separately (see Parlay Strategy).

By market odds range
Odds range | Bets | Win rate | Model prob | Market prob | P&L | Gap
1.4–1.7 | 25 | 72.0% | 78.1% | 62.6% | +$184 | −6.1%
1.7–2.0 | 334 | 62.3% | 66.0% | 53.3% | +$2,808 | −3.7%
2.0–2.3 | 220 | 52.3% | 57.3% | 46.1% | +$1,492 | −5.0%
2.3–2.65 | 203 | 48.3% | 50.9% | 40.2% | +$2,012 | −2.6%

All four odds buckets are profitable. Gap = actual WR minus model-predicted probability. Every gap is negative (mild overconfidence), yet each bucket stays profitable because its actual win rate still exceeds the market-implied probability.
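The profitability claim can be checked from the table itself. If a bucket's average decimal odds are roughly 1 / market prob, its expected ROI is about win rate / market prob − 1. The sketch below uses the four rows above and the $50 unit. It is approximate, since the average of implied probabilities is not exactly the inverse of the average odds. Even so, the implied and realized ROIs agree to within a few tenths of a percentage point.

```python
UNIT = 50  # flat stake per bet in dollars

# (odds range, bets, actual win rate, market prob, reported P&L in $)
ODDS_BUCKETS = [
    ("1.4–1.7", 25, 0.720, 0.626, 184),
    ("1.7–2.0", 334, 0.623, 0.533, 2808),
    ("2.0–2.3", 220, 0.523, 0.461, 1492),
    ("2.3–2.65", 203, 0.483, 0.402, 2012),
]


def implied_roi(win_rate: float, market_prob: float) -> float:
    """Approximate ROI when average odds ≈ 1 / market_prob: win_rate * odds - 1."""
    return win_rate / market_prob - 1


for name, bets, wr, mp, pnl in ODDS_BUCKETS:
    realized = pnl / (bets * UNIT)
    print(f"{name}: implied {implied_roi(wr, mp):+.1%}, realized {realized:+.1%}")
```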

FIP/SIERA regression flag performance
Flag | Bets | Win rate | P&L | Interpretation
REGRESS | 34 | 52.9% | +$251 | ERA − SIERA ≥ 1.0 · ERA well above SIERA, expect regression toward SIERA
No flag | 748 | 56.4% | +$6,244 | ERA and FIP/SIERA within 1.0 of each other (normal range)

REGRESS flag now fires with real FIP/SIERA data. Small sample (34 bets) — continue monitoring over the 2026 season.
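FIP has a standard published formula; SIERA is considerably more involved and is usually taken from a data provider, so only FIP is computed below. The flag follows the rule as written in the table (ERA − SIERA ≥ 1.0). The 3.10 constant is a placeholder (the real FIP constant is recalculated each season to put FIP on the league ERA scale). The function names are hypothetical, and how the model blends FIP with SIERA isn't stated.

```python
from typing import Optional

FIP_CONSTANT = 3.10  # placeholder: the real constant is season-specific


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = FIP_CONSTANT) -> float:
    """Fielding Independent Pitching: (13*HR + 3*(BB+HBP) - 2*K) / IP + constant.

    ip must be true innings (180.1 in box-score notation is 180 + 1/3).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def regress_flag(era: float, siera: float, threshold: float = 1.0) -> Optional[str]:
    """'REGRESS' when ERA exceeds SIERA by at least `threshold` runs, else None."""
    return "REGRESS" if era - siera >= threshold else None


print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=180.0), 2))
print(regress_flag(4.50, 3.40), regress_flag(4.00, 3.50))
```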

Parlay Strategy

Parlay results are tracked and reported completely separately from straight-bet P&L above. These are optional supplementary bets — not placed every day.

The math: compounding +EV bets
When multiple games each have positive expected value, combining them into a parlay compounds that edge. Assuming the legs are independent and the model probabilities are accurate, the parlay's EV per dollar staked exceeds that of the individual straight bets, providing better theoretical return at the cost of higher variance and lower win frequency. Parlays also vary the profile of bets placed, which is a practical benefit when managing betting accounts.
EV = (p₁ × p₂ × ...) × (O₁ × O₂ × ... − 1) × stake − (1 − p₁ × p₂ × ...) × stake
At avg model prob 0.600 and avg odds ≈2.11, with $50 per bet: EV of one 2-leg parlay = $30.24 vs $26.68 for two straight bets ($100 staked), i.e. about 113% of the straight-bet EV on half the stake.
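The EV formula simplifies to stake × (p₁p₂… × O₁O₂… − 1), so with independent legs a parlay's return per dollar is the product of each leg's p × O. A sketch, assuming independent legs and accurate model probabilities (function names hypothetical). With odds rounded to exactly 2.11 and $50 stakes it gives about $30.14 versus $26.60; the slightly higher figures quoted above presumably use unrounded average odds.

```python
from math import prod


def parlay_ev(probs, odds, stake):
    """Expected profit of a parlay, assuming the legs are independent."""
    p_all = prod(probs)      # chance every leg wins
    combo_odds = prod(odds)  # decimal odds multiply across legs
    return p_all * (combo_odds - 1) * stake - (1 - p_all) * stake


def straight_ev(prob, odds, stake):
    """Expected profit of a single straight bet."""
    return prob * (odds - 1) * stake - (1 - prob) * stake


parlay = parlay_ev([0.60, 0.60], [2.11, 2.11], stake=50)
straights = 2 * straight_ev(0.60, 2.11, stake=50)
print(f"parlay ${parlay:.2f} on $50, straights ${straights:.2f} on $100")
print(f"per dollar: {parlay / 50:.3f} vs {straights / 100:.3f}")
```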
Backtest results by leg count (2024–2025 · $12.50 stake · best combo per day · SEPARATE from straight bets)
Legs | ROI | Combos | Wins (rate) | P&L | Avg combo odds | 2024 | 2025
2-leg | −2.6% | 188 | 27 (14.4%) | −$60.34 | 7.72 | −$102 | +$42
3-leg | +39.8% | 185 | 16 (8.6%) | +$920.26 | 19.22 | +$631 | +$290
4-leg | +54.8% | 181 | 8 (4.4%) | +$1,240 | 45.48 | +$304 | +$936

ROI comparison — straight bets vs parlay tiers

Note: straight bets win ~56% of the time; 3-leg parlays win ~8.6%; 4-leg parlays win ~4.4%. Higher ROI reflects compounded edge at the cost of infrequent wins. Only the highest-EV combo per leg-count per day is counted.

Parlay leg qualification (three tiers)
How legs are selected:
- STRONG / VALUE: has a bet signal (edge ≥ 12%).
- EV+: model_prob × market_odds > 1.0 (positive EV, no signal required).
- DUMMY: model_prob ≥ 60% (high-confidence pad).
Each leg comes from a different game. Stake: $12.50 per combination.
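The three-tier rule and the "best combo per day" selection can be sketched as an exhaustive search over combinations drawn from distinct games. Assumptions: edge is computed as model_prob × odds − 1 (market prob = 1/odds), and the STRONG/VALUE tier is reduced to the 12% edge floor alone (the straight-bet odds filter and cap are ignored). Combinations are ranked by EV per dollar. `leg_tier`, `best_combo`, and the sample legs are hypothetical.

```python
from itertools import combinations
from math import prod

EDGE_FLOOR = 0.12


def leg_tier(model_prob, odds):
    """Qualifying tier for one leg, or None if it meets none of the rules."""
    if model_prob * odds - 1 >= EDGE_FLOOR:  # STRONG / VALUE (edge >= 12%)
        return "SIGNAL"
    if model_prob * odds > 1.0:              # EV+: positive EV, no signal
        return "EV+"
    if model_prob >= 0.60:                   # DUMMY: high-confidence pad
        return "DUMMY"
    return None


def best_combo(legs, n_legs):
    """legs: list of (game_id, model_prob, odds). Highest-EV combo of distinct games."""
    eligible = [leg for leg in legs if leg_tier(leg[1], leg[2]) is not None]
    best, best_ev = None, float("-inf")
    for combo in combinations(eligible, n_legs):
        if len({game for game, _, _ in combo}) < n_legs:
            continue  # each leg must come from a different game
        ev_per_dollar = prod(p for _, p, _ in combo) * prod(o for _, _, o in combo) - 1
        if ev_per_dollar > best_ev:
            best, best_ev = combo, ev_per_dollar
    return best, best_ev


day = [("A", 0.60, 2.10), ("B", 0.55, 2.00), ("C", 0.65, 1.50),
       ("A", 0.45, 1.90), ("D", 0.40, 2.00)]
combo, ev = best_combo(day, 2)
print([game for game, _, _ in combo], round(ev, 3))
```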
Variance warning
A 4-leg parlay wins only ~4.4% of the time even when +EV. The EV only shows up across a large sample; individual sessions will mostly show losses. Treat parlays as supplementary, not a replacement for straight bets. The daily model output (PARLAYS sheet) shows the highest-EV combinations available that day; they are suggestions only.
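The variance point can be quantified. If each 4-leg parlay wins independently with probability p (about 0.044, the backtest's 8 wins in 181), the chance that n in a row all lose is (1 − p)^n. For 30 parlays that is roughly 26%. A sketch with hypothetical function names, assuming independent outcomes:

```python
import math


def prob_all_lose(win_prob: float, n: int) -> float:
    """Probability that n independent parlays all lose."""
    return (1 - win_prob) ** n


def parlays_for_confidence(win_prob: float, confidence: float = 0.95) -> int:
    """Smallest n giving at least `confidence` chance of one or more wins."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - win_prob))


p4 = 8 / 181  # 4-leg backtest win rate
print(f"P(0 wins in 30) = {prob_all_lose(p4, 30):.0%}")
print(f"parlays needed for a 95% chance of a win: {parlays_for_confidence(p4)}")
```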