MLB Projection Model
Why Negative Binomial?
MLB run scoring is "overdispersed": the variance of runs scored is roughly 2.1 times the mean, far above the 1.0 ratio that a simple Poisson distribution assumes. Runs tend to cluster in baseball, with a single big inning often producing 4-5 runs at once, so some games turn into blowouts while others are 1-0 pitchers' duels. A distribution that can capture that burstiness is required.
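The Negative Binomial distribution handles this by adding a second parameter, a dispersion value (r = 3.74, calibrated from 40+ years of MLB data), that lets the model capture both the average scoring rate and the burstiness of run production. Its variance is μ + μ²/r, so at the league average of 4.345 runs the variance-to-mean ratio is 1 + 4.345/3.74 ≈ 2.16, in line with the observed ~2.1. From these per-team distributions the model builds a full score matrix (0 to 15 runs per side), from which every market probability is derived.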
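A minimal Python sketch of a Negative Binomial score matrix, parameterized by each team's mean and the dispersion r = 3.74 quoted above. Two choices here are assumptions the page does not spell out: the two teams' run totals are treated as independent, and the probability beyond 15 runs is folded back in by renormalizing. The function names are illustrative, not the model's actual code.

```python
import math

LEAGUE_RPG = 4.345  # league-average runs per game (League Baselines table)
R_RUNS = 3.74       # run-scoring dispersion quoted above
MAX_RUNS = 15       # score matrix is truncated at 15 runs per side


def nb_pmf(k, mu, r):
    """P(X = k) for a Negative Binomial with mean mu and dispersion r.

    Variance is mu + mu**2 / r, so Var/Mean = 1 + mu / r (Poisson is the r -> inf limit).
    """
    p = r / (r + mu)
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def run_distribution(mu, r=R_RUNS):
    """P(0..MAX_RUNS runs), renormalized so the truncated tail mass is redistributed."""
    probs = [nb_pmf(k, mu, r) for k in range(MAX_RUNS + 1)]
    total = sum(probs)
    return [p / total for p in probs]


def score_matrix(mu_home, mu_away, r=R_RUNS):
    """matrix[h][a] = P(home scores h, away scores a), ASSUMING independent sides."""
    home = run_distribution(mu_home, r)
    away = run_distribution(mu_away, r)
    return [[ph * pa for pa in away] for ph in home]


def outcome_probs(matrix, total_line=8.5):
    """Sum matrix cells into win/tie/over probabilities."""
    n = len(matrix)
    cells = [(h, a, matrix[h][a]) for h in range(n) for a in range(n)]
    return {
        "home_lead": sum(p for h, a, p in cells if h > a),
        "away_lead": sum(p for h, a, p in cells if a > h),
        "tie": sum(p for h, a, p in cells if h == a),
        "over": sum(p for h, a, p in cells if h + a > total_line),
    }


if __name__ == "__main__":
    print(round(1 + LEAGUE_RPG / R_RUNS, 2))  # -> 2.16 (Poisson would give 1.0)
    print(outcome_probs(score_matrix(4.8, 4.0)))
```

The `tie` mass is what extra innings have to resolve for the moneyline; how the model splits it is not described here.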
Baseball is also uniquely pitcher-driven. The starting pitcher is the single most important variable in any game, substantially more influential than any equivalent role in hockey, basketball, or soccer. The model is built around a detailed pitcher rating system that blends recent form, season totals, and prior year data, with every other factor (lineup strength, park, weather, umpire) layered on top.
Core Formula
Each team's projected runs start from the league average (4.345 runs/game) and are scaled by multiplicative factors for the opposing starting pitcher, the batting team's lineup, and the environment (park, weather, umpire). The game is split into starter innings (5.5) and bullpen innings (3.5), and each segment is rated separately. When a pitcher is flagged as an opener or bullpen game, the starter-side IP shrinks to ~1.5 and the balance is modeled as bullpen innings. The stacked projection is then shrunk toward the league average using a 70% trust weight (mirroring the K-prop shrinkage) to counter winner's-curse amplification at the tails of the factor stack. Additive adjustments for home-field advantage, velocity decline, and bullpen fatigue are applied last.
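The core formula can be sketched as below. The innings-weighted blend of starter and bullpen factors, the reading of "70% trust" as keeping 70% of the projection's deviation from league average, and adding the +0.3-run home-field bump to the home team's projection are all assumptions, as are the function and parameter names.

```python
LEAGUE_RPG = 4.345        # league-average runs per game
GAME_INNINGS = 9.0
STARTER_IP = 5.5          # normal starter-side innings
OPENER_STARTER_IP = 1.5   # starter-side innings when flagged as an opener/bullpen game
TRUST = 0.70              # ASSUMPTION: keep 70% of the deviation from league average
HOME_FIELD_RUNS = 0.30    # from the Environmental Adjustments table


def pitching_factor(starter_factor, bullpen_factor, opener=False):
    """Innings-weighted blend of starter and bullpen factors (HYPOTHETICAL weighting)."""
    starter_ip = OPENER_STARTER_IP if opener else STARTER_IP
    bullpen_ip = GAME_INNINGS - starter_ip
    return (starter_ip * starter_factor + bullpen_ip * bullpen_factor) / GAME_INNINGS


def project_runs(starter_factor, bullpen_factor, lineup_factor, env_factor,
                 is_home=False, opener=False, other_additive=0.0):
    """Projected runs for one batting team; every factor is a multiplier (1.0 = average).

    other_additive stands in for the velocity-decline and bullpen-fatigue adjustments,
    whose sizes are not listed on this page.
    """
    raw = (LEAGUE_RPG
           * pitching_factor(starter_factor, bullpen_factor, opener)
           * lineup_factor
           * env_factor)
    # Shrink the stacked projection toward league average to damp extreme factor stacks.
    shrunk = LEAGUE_RPG + TRUST * (raw - LEAGUE_RPG)
    # Additive adjustments are applied last.
    return shrunk + (HOME_FIELD_RUNS if is_home else 0.0) + other_additive


if __name__ == "__main__":
    # Home team facing a strong starter (0.85) and a slightly weak bullpen (1.05).
    print(project_runs(0.85, 1.05, 1.04, 1.02, is_home=True))
```

Shrinking after the multiplication, rather than shrinking each factor, means that a stack of several extreme factors is pulled back as a whole, which is the winner's-curse concern described above.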
Pitcher Rating Blend
Pitchers are rated by blending three time windows. The full current-season line gets the most weight (70%) because it is the best single read on the pitcher's actual form. Recent starts (the last 3-5) get a smaller 10% weight that still lets the rating register mechanical changes, fatigue, or injury effects that season averages would smooth over. The prior-year component (20%) provides stability early in the season when the sample is small.
| Parameter | Value |
|---|---|
| Recent 3-5 starts | 10% |
| Full season | 70% |
| Prior year | 20% |
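A small sketch of the three-window blend. The weights come from the table; redistributing the weight of a missing window (for example, a rookie with no prior year) proportionally across the remaining windows is an assumption, and `blend_pitcher_rating` is a hypothetical name.

```python
WINDOW_WEIGHTS = {"recent": 0.10, "season": 0.70, "prior": 0.20}


def blend_pitcher_rating(ratings):
    """Weighted blend of a pitcher's rating across time windows.

    ratings maps "recent" / "season" / "prior" to a rating on one scale (for example the
    FIP composite); a window with no data is None. ASSUMPTION: missing windows' weight is
    redistributed proportionally across the windows that are present.
    """
    present = {w: v for w, v in ratings.items() if w in WINDOW_WEIGHTS and v is not None}
    if not present:
        raise ValueError("no rating windows available")
    weight_sum = sum(WINDOW_WEIGHTS[w] for w in present)
    return sum(WINDOW_WEIGHTS[w] * v for w, v in present.items()) / weight_sum


if __name__ == "__main__":
    print(blend_pitcher_rating({"recent": 3.10, "season": 3.60, "prior": 3.90}))
    print(blend_pitcher_rating({"recent": 3.10, "season": 3.60, "prior": None}))  # rookie
```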
FIP Composite
The model rates pitchers on a blend of three fielding-independent metrics. ERA is avoided because it conflates a pitcher's performance with team defense and sequencing luck. FIP measures what a pitcher controls directly (strikeouts, walks, home runs). xFIP stabilizes the home run component by assuming a league-average HR/FB rate, filtering out park and luck effects. SIERA goes further by accounting for a pitcher's batted ball profile (ground balls vs fly balls) and how that interacts with strikeout rates.
| Parameter | Value |
|---|---|
| FIP | 35% |
| xFIP | 40% |
| SIERA | 25% |
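The FIP and xFIP formulas below are the standard public definitions (FanGraphs-style, with HBP counted alongside walks); the seasonal FIP constant is left as an input because it changes every year. SIERA's regression formula is long, so this sketch takes it as a precomputed value. The 35/40/25 weights are from the table; the helper names are hypothetical.

```python
COMPOSITE_WEIGHTS = {"fip": 0.35, "xfip": 0.40, "siera": 0.25}


def fip(hr, bb, hbp, k, ip, fip_constant):
    """FIP = (13*HR + 3*(BB+HBP) - 2*K) / IP + constant.

    ip must be true decimal innings (100 1/3 IP -> 100.333), not box-score notation (100.1).
    fip_constant is the seasonal value that puts league FIP on the ERA scale.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant


def xfip(fly_balls, league_hr_per_fb, bb, hbp, k, ip, fip_constant):
    """xFIP: FIP with actual home runs replaced by fly balls times the league HR/FB rate."""
    return fip(fly_balls * league_hr_per_fb, bb, hbp, k, ip, fip_constant)


def pitcher_composite(fip_value, xfip_value, siera_value):
    """35/40/25 blend of FIP, xFIP, and SIERA (SIERA supplied precomputed)."""
    return (COMPOSITE_WEIGHTS["fip"] * fip_value
            + COMPOSITE_WEIGHTS["xfip"] * xfip_value
            + COMPOSITE_WEIGHTS["siera"] * siera_value)


if __name__ == "__main__":
    c = 3.10  # illustrative constant only; look up the actual season's value
    f = fip(hr=18, bb=45, hbp=6, k=190, ip=180.0, fip_constant=c)
    x = xfip(fly_balls=170, league_hr_per_fb=0.12, bb=45, hbp=6, k=190, ip=180.0, fip_constant=c)
    print(pitcher_composite(f, x, siera_value=3.40))
```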
Offense Weighting
Team offense is measured by wRC+ (Weighted Runs Created Plus), a park- and league-adjusted stat where 100 is average. Season-long data gets more weight (70%) because it is the larger, more stable sample, while a 14-day window (30%) captures lineup changes and hot/cold streaks.
| Parameter | Value |
|---|---|
| Season wRC+ | 70% |
| Recent 14 days | 30% |
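A sketch of the 70/30 offense blend. Turning the blended wRC+ into a run multiplier by dividing by 100 is a hypothetical mapping (wRC+ is indexed so that 100 is league average, which makes it a natural choice), not something the page confirms; the season-only fallback is also an assumption.

```python
SEASON_WEIGHT = 0.70
RECENT_WEIGHT = 0.30


def blended_wrc_plus(season, recent=None):
    """70/30 blend of season and 14-day wRC+; ASSUMED season-only fallback if no recent sample."""
    if recent is None:
        return float(season)
    return SEASON_WEIGHT * season + RECENT_WEIGHT * recent


def lineup_factor(season, recent=None):
    """HYPOTHETICAL mapping to a run multiplier: 100 wRC+ -> 1.00, 110 -> 1.10."""
    return blended_wrc_plus(season, recent) / 100.0


if __name__ == "__main__":
    print(lineup_factor(112, 96))  # good season, cold two weeks
```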
Environmental Adjustments
Baseball is played outdoors in 22 of 30 stadiums, so environment matters more than in any other sport. The model adjusts for park dimensions (Coors Field plays very differently from Oracle Park), game-day weather (wind and temperature affect how far the ball carries), and the home plate umpire (whose strike zone tendencies shift strikeout and walk rates, impacting run scoring).
| Parameter | Value |
|---|---|
| Home field advantage | +0.3 runs |
| Wind effect | 0.1 runs per 5 mph |
| Temperature effect | ±0.05 runs per 5°F outside 60-70°F |
| Cold-wind damping | 30% floor below 60°F |
| Umpire adjustment | 0.25 to 0.5 runs |
| Dome stadiums | 8 stadiums |
| Park factors | 30 parks |
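A sketch of the wind and temperature rows as additive run adjustments. Assumptions: wind is signed (positive blowing out, negative blowing in), both effects scale linearly rather than in 5-unit steps, and domes get no weather adjustment. The cold-wind damping and umpire rows are left out because the page does not give their exact shape, and whether these runs apply per team or to the game total is not stated. The names are hypothetical.

```python
WIND_RUNS_PER_5_MPH = 0.10
TEMP_RUNS_PER_5_F = 0.05
NEUTRAL_TEMP_F = (60.0, 70.0)  # no temperature adjustment inside this band


def weather_runs(wind_out_mph, temp_f, is_dome):
    """Additive run adjustment from wind and temperature.

    ASSUMPTIONS: wind_out_mph > 0 means blowing out, < 0 blowing in; both effects are
    linear; domes get 0. Cold-wind damping is not modeled here.
    """
    if is_dome:
        return 0.0
    wind = WIND_RUNS_PER_5_MPH * wind_out_mph / 5.0
    low, high = NEUTRAL_TEMP_F
    if temp_f > high:
        temp = TEMP_RUNS_PER_5_F * (temp_f - high) / 5.0
    elif temp_f < low:
        temp = -TEMP_RUNS_PER_5_F * (low - temp_f) / 5.0
    else:
        temp = 0.0
    return wind + temp


if __name__ == "__main__":
    print(weather_runs(12, 84, is_dome=False))  # warm day, wind blowing out
    print(weather_runs(-8, 48, is_dome=False))  # cold night, wind blowing in
```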
Early Season Regression
Small samples are dangerous in baseball. A pitcher who has thrown 20 innings could look elite or terrible purely by chance. Early in the season, the model blends current stats with prior-year data and league averages using Marcel-style regression, gradually trusting the new data more as the sample grows. This prevents the model from overreacting to hot or cold April streaks.
| Parameter | Value |
|---|---|
| Prior year regression | 40% |
| Pitcher IP for full trust | 50 IP |
| Team games for full trust | 28 games |
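A Marcel-style sketch of the early-season blend. The page gives the parameters but not the exact formula, so two readings here are assumptions: the prior-year value is first regressed 40% toward the league average, and trust in the current sample ramps linearly up to 50 IP (pitchers) or 28 games (teams). Real Marcel projections use a different multi-season weighting, and the names below are hypothetical.

```python
PRIOR_REGRESSION = 0.40        # ASSUMED: share of the prior-year value pulled to league average
PITCHER_FULL_TRUST_IP = 50.0
TEAM_FULL_TRUST_GAMES = 28.0


def sample_weight(sample, full_trust):
    """Linear ramp from 0 to 1 as the sample reaches the full-trust size (ASSUMED shape)."""
    return max(0.0, min(sample / full_trust, 1.0))


def regressed_estimate(current, prior_year, league_avg, sample, full_trust):
    """Blend current-season stats with a regressed prior-year baseline."""
    baseline = (1 - PRIOR_REGRESSION) * prior_year + PRIOR_REGRESSION * league_avg
    w = sample_weight(sample, full_trust)
    return w * current + (1 - w) * baseline


if __name__ == "__main__":
    # A pitcher with a 2.10 FIP over 20 April innings after a 3.90 prior year.
    print(regressed_estimate(2.10, 3.90, 4.10, sample=20, full_trust=PITCHER_FULL_TRUST_IP))
```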
League Baselines
These are the league-wide averages that anchor the model. Every team and player is rated relative to them. A pitcher with a 3.50 FIP in a 4.40 ERA environment, for example, is suppressing runs by about 20%. These values are updated weekly as the season progresses to reflect the current run environment.
| Parameter | Value |
|---|---|
| Runs per game | 4.345 |
| ERA | 4.107 |
| K% | 22.0% |
| BABIP | .300 |
| xwOBA | .315 |
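A quick check of the 20% figure above, treating a pitcher's run factor as the ratio of his FIP-scale rating to the league ERA baseline (an assumed linear mapping; FIP-family stats are built on the ERA scale, which is what makes the ratio meaningful). The same 3.50 FIP suppresses less against the table's lower current ERA baseline of about 4.107.

```python
LEAGUE_ERA = 4.107  # current ERA baseline from the League Baselines table (rounded)


def run_factor(pitcher_rating, league_era=LEAGUE_ERA):
    """ASSUMED linear mapping: runs allowed relative to league average (1.0 = average)."""
    return pitcher_rating / league_era


def suppression_pct(pitcher_rating, league_era=LEAGUE_ERA):
    """Percent of league-average run scoring the pitcher is expected to remove."""
    return 100.0 * (1.0 - run_factor(pitcher_rating, league_era))


if __name__ == "__main__":
    print(round(suppression_pct(3.50, league_era=4.40), 1))  # -> 20.5 (the example above)
    print(round(suppression_pct(3.50), 1))                   # -> 14.8 against 4.107
```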
Minimum Edge Thresholds
A value play is only flagged when the model's edge over the sportsbook exceeds these minimums. Lower thresholds (2%) are used for liquid, efficient markets like moneyline, totals, and their F5 versions. Higher thresholds (3-5%) are required for noisier markets like run lines and props, where the model needs a bigger cushion to overcome estimation error.
| Parameter | Value |
|---|---|
| Moneyline | 2.0% |
| Run Line | 3.0% |
| Total | 2.0% |
| F5 Moneyline | 2.0% |
| F5 Total | 2.0% |
| Pitcher K Prop | 5.0% |
| Batting Prop | 4.0% |
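A sketch of the threshold check. The page does not define "edge" precisely; this version assumes it is expected value per unit staked at the posted American odds (EV = p × decimal odds − 1), which matches the "EV threshold" wording in the moneyline description. The market keys and function names are hypothetical.

```python
MIN_EDGE = {
    "moneyline": 0.020, "run_line": 0.030, "total": 0.020,
    "f5_moneyline": 0.020, "f5_total": 0.020,
    "pitcher_k_prop": 0.050, "batting_prop": 0.040,
}


def american_to_decimal(odds):
    """Total payout per 1 unit staked (stake included)."""
    if odds > 0:
        return 1.0 + odds / 100.0
    if odds < 0:
        return 1.0 + 100.0 / -odds
    raise ValueError("American odds cannot be 0")


def expected_value(model_prob, odds):
    """EV per unit stake (ASSUMED definition of 'edge')."""
    return model_prob * american_to_decimal(odds) - 1.0


def is_value_play(market, model_prob, odds):
    """Flag only when EV strictly exceeds the market's minimum."""
    return expected_value(model_prob, odds) > MIN_EDGE[market]


if __name__ == "__main__":
    print(is_value_play("moneyline", 0.60, -145))  # about 1.4% EV, below the 2% bar
    print(is_value_play("moneyline", 0.62, -145))  # about 4.8% EV, flagged
```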
Markets Explained
The model produces fair odds for every major MLB betting market. Here's what each one is and how the model handles it.
Moneyline
BOS -145 / TOR +125
Straight pick on who wins the game, extra innings included. The most liquid and efficiently priced MLB market. The model typically sees its smallest edges here, and the EV threshold is correspondingly lower (2% min) because the prices are clean.
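To compare the model's fair price with a posted pair like BOS -145 / TOR +125, the book's margin has to come off first. This sketch uses proportional (multiplicative) de-vigging, one of several common methods; the page does not say which one the model uses, and the function names are illustrative.

```python
def implied_prob(american):
    """Break-even win probability implied by American odds (includes the book's margin)."""
    if american < 0:
        return -american / (-american + 100.0)
    return 100.0 / (american + 100.0)


def no_vig_pair(odds_a, odds_b):
    """Remove the margin proportionally so the two sides sum to 1."""
    pa, pb = implied_prob(odds_a), implied_prob(odds_b)
    overround = pa + pb
    return pa / overround, pb / overround


def fair_american(p):
    """Zero-margin American odds for win probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p >= 0.5:
        return -100.0 * p / (1.0 - p)
    return 100.0 * (1.0 - p) / p


if __name__ == "__main__":
    bos, tor = no_vig_pair(-145, 125)
    print(round(bos, 3), round(tor, 3))  # -> 0.571 0.429
    print(fair_american(0.60))           # a 60% model price is roughly -150
```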
Run Line (-1.5)
BOS -1.5 +135 / TOR +1.5 -155
A 1.5-run spread: the favorite has to win by 2+, and the underdog can lose by 1 and still cover. Higher volatility than moneyline because it forces margin-of-victory estimation, so the model requires a larger edge (3% min) to flag.
Game Total (Over/Under)
O 8.5 -110 / U 8.5 -110
Combined runs from both teams. Driven by starting pitchers, park factors, weather (wind + temperature), and the umpire's strike zone. Half-lines (8.5) are preferred over whole lines (9) because they eliminate pushes.
Team Totals
BOS O 4.5 -115 / BOS U 4.5 -105
Over/under on a single team's runs. Useful when you have a strong read on one side's offense vs the opposing pitcher but no view on the game total. Noisier than game totals relative to the line, since one team's scoring swings are not diluted by adding the other team's runs.
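A sketch of how run-line, push, and team-total probabilities fall out of a score matrix. It rebuilds per-team Negative Binomial distributions (r = 3.74, truncated at 15 runs) and assumes the two sides are independent; the means 4.8 and 4.0 are made-up inputs and `prob` is a hypothetical helper. The zero push probability at 8.5 is the point of half-lines.

```python
import math

R_RUNS = 3.74   # run-scoring dispersion
MAX_RUNS = 15   # truncation per side


def run_distribution(mu, r=R_RUNS):
    """Renormalized Negative Binomial P(0..MAX_RUNS runs) with mean mu."""
    p = r / (r + mu)
    pmf = [math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log1p(-p))
           for k in range(MAX_RUNS + 1)]
    total = sum(pmf)
    return [x / total for x in pmf]


def prob(mu_home, mu_away, event):
    """Sum P(h, a) over scores where event(h, a) holds; sides ASSUMED independent."""
    home, away = run_distribution(mu_home), run_distribution(mu_away)
    return sum(ph * pa
               for h, ph in enumerate(home)
               for a, pa in enumerate(away)
               if event(h, a))


if __name__ == "__main__":
    mu_h, mu_a = 4.8, 4.0
    print("home -1.5 covers:", prob(mu_h, mu_a, lambda h, a: h - a >= 2))
    print("push on total 9:", prob(mu_h, mu_a, lambda h, a: h + a == 9))
    print("push on total 8.5:", prob(mu_h, mu_a, lambda h, a: h + a == 8.5))  # always 0
    print("home team total over 4.5:", prob(mu_h, mu_a, lambda h, a: h > 4.5))
```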
First 5 Innings (F5)
F5 ML BOS -130 / F5 O 4.5 -110
Isolates the starting-pitcher matchup by cutting most bullpen innings out of the bet. The model's pitcher ratings are its sharpest input, so F5 markets often surface the clearest edges, especially when one team's bullpen is shaky while the starter is sharp.
Pitcher Strikeout Props
Kershaw O 6.5 K -120
Over/under on a starter's total strikeouts. The expected K count comes from a Bayesian Beta posterior on pitcher K% combined with Log5 matchup, times-through-the-order, umpire, and park multipliers. The over/under probabilities at each line are computed from a Negative Binomial distribution (r=20) calibrated to MLB historical K count fits.
Strikeout Prop Model
The expected K count comes from a Bayesian Beta posterior over pitcher K% (prior centered on the previous-season K% with a Stuff+ adjustment, updated with current-season K/BF counts), combined with a Log5 matchup against the lineup, a times-through-the-order penalty, and umpire/park K multipliers. When confirmed lineups are available, a 100,000-trial Monte Carlo simulation steps through the 9-man lineup and produces per-batter prop distributions (hits, total bases, HR). The mean K count from that pipeline then serves as the mean of the Negative Binomial that produces over/under probabilities for each posted line.
The Negative Binomial uses dispersion r=20, far tighter than the r=3.74 the totals model uses for run scoring. Real MLB K count distributions sit at Var/Mean ≈ 1.25: slightly more dispersed than Poisson but much tighter than runs. Since Var/Mean = 1 + mean/r, r=20 yields 1.25 for a starter projected at 5 strikeouts. The shape is calibrated to ~40 years of historical K-count data.
| Parameter | Value |
|---|---|
| Engine | Monte Carlo |
| Beta Prior | 100 BF |
| Stuff+ Adj | 15.0% |
| Matchup | Log5 |
| TTO Penalty (1st / 2nd / 3rd time through) | 1.05× / 1.00× / 0.92× |
| Max Baseline K% | 35.0% |
| Min Observed BF | 40 BF |
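A compact sketch of the strikeout pipeline under loudly labeled assumptions. It uses the table's parameters (100 BF Beta prior, 15% Stuff+ cap, 35% baseline cap, 40 BF minimum, TTO multipliers, Log5, r = 20) but computes the expected K count analytically instead of with the Monte Carlo engine. The Stuff+ mapping, how the 40 BF minimum is applied, reusing the third-pass multiplier beyond the third time through, and all function names are hypothetical; the lineup K rates and 24 batters faced in the usage lines are made up.

```python
import math

LEAGUE_K = 0.22          # league K% baseline (League Baselines table)
BETA_PRIOR_BF = 100      # prior strength, in batters faced
STUFF_ADJ_CAP = 0.15     # cap on the Stuff+ adjustment
MAX_BASELINE_K = 0.35    # cap on the prior K%
MIN_OBSERVED_BF = 40     # ASSUMED use: below this, current-season data is ignored
TTO_MULT = (1.05, 1.00, 0.92)
R_K = 20.0               # K-count dispersion


def prior_k_rate(prev_k, stuff_plus):
    """HYPOTHETICAL: each Stuff+ point above/below 100 moves K% by 1% (relative), +/-15% max."""
    nudge = max(-STUFF_ADJ_CAP, min(STUFF_ADJ_CAP, (stuff_plus - 100) / 100.0))
    return min(prev_k * (1.0 + nudge), MAX_BASELINE_K)


def posterior_k_rate(prior_mean, ks, bf):
    """Mean of the Beta posterior after ks strikeouts in bf batters faced."""
    if bf < MIN_OBSERVED_BF:
        return prior_mean
    alpha = prior_mean * BETA_PRIOR_BF + ks
    beta = (1.0 - prior_mean) * BETA_PRIOR_BF + (bf - ks)
    return alpha / (alpha + beta)


def log5(pitcher_k, batter_k, league_k=LEAGUE_K):
    """Log5 probability that a plate appearance ends in a strikeout."""
    num = pitcher_k * batter_k / league_k
    return num / (num + (1 - pitcher_k) * (1 - batter_k) / (1 - league_k))


def expected_ks(pitcher_k, lineup_k, batters_faced, ump_park_mult=1.0):
    """Analytic expected Ks through the batting order (the page uses Monte Carlo instead)."""
    total = 0.0
    for i in range(batters_faced):
        pass_idx = min(i // len(lineup_k), len(TTO_MULT) - 1)  # 4th+ pass reuses 3rd
        total += log5(pitcher_k, lineup_k[i % len(lineup_k)]) * TTO_MULT[pass_idx]
    return total * ump_park_mult


def prob_over(line, mu, r=R_K):
    """P(K > line) from a Negative Binomial with mean mu and dispersion r."""
    p = r / (r + mu)
    cdf = sum(math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                       + r * math.log(p) + k * math.log1p(-p))
              for k in range(math.floor(line) + 1))
    return 1.0 - cdf


if __name__ == "__main__":
    k_rate = posterior_k_rate(prior_k_rate(0.27, 108), ks=60, bf=220)
    lineup = [0.24, 0.19, 0.22, 0.26, 0.21, 0.25, 0.23, 0.28, 0.20]
    mu = expected_ks(k_rate, lineup, batters_faced=24)
    print(round(mu, 2), round(prob_over(6.5, mu), 3))
```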
Model Track Record
MLB is the most mature of the four models and the only one with a published full-season backtest. The numbers below come from replaying the pipeline across the entire 2025 regular season (2,421 games) using walk-forward data: each game was projected with only the information available beforehand, with no peeking at results. All picks were graded as if priced at a standard -110.
| Parameter | Value |
|---|---|
| Games backtested | 2,421 |
| Overall accuracy | 55.8% |
| Strong picks (model probability >60% or <40%) | 67.0% |
| Brier skill score | +0.0261 |
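The Brier skill score is arguably the most important number here. The Brier score is the mean squared error between the model's win probabilities and the actual 0/1 results, so it rewards calibration (a 60% forecast should win about 60% of the time, a 30% forecast about 30%, and so on across the whole distribution) as well as the ability to separate likely winners from likely losers. The skill score compares that error with a reference forecast, and a positive value means the model's probabilities beat the reference. A model can pick winners at a high rate while still losing money if its probabilities are miscalibrated, and the Brier score penalizes that directly. Positive Brier skill plus accuracy above the -110 breakeven rate (110/210 ≈ 52.4%) across 2,400+ games is the statistical fingerprint of a working model.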
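A minimal sketch of the Brier score, the Brier skill score, and the -110 breakeven rate. The reference forecast behind the page's +0.0261 is not stated; this sketch defaults to a constant base-rate forecast, which is an assumption, and the function names are illustrative.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, reference=None):
    """1 - BS_model / BS_reference; positive means the model beats the reference.

    ASSUMPTION: the default reference is the constant base-rate forecast.
    """
    if reference is None:
        base_rate = sum(outcomes) / len(outcomes)
        reference = [base_rate] * len(outcomes)
    ref_score = brier_score(reference, outcomes)
    if ref_score == 0:
        raise ValueError("reference forecast is perfect; skill is undefined")
    return 1.0 - brier_score(probs, outcomes) / ref_score


def breakeven_win_rate(american):
    """Win rate needed to break even at the given American odds."""
    risk, win = (-american, 100.0) if american < 0 else (100.0, american)
    return risk / (risk + win)


if __name__ == "__main__":
    print(breakeven_win_rate(-110))  # about 0.524, the bar the 55.8% accuracy clears
```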
When to Trust This Model (and When Not To)
Every model has soft spots. Being honest about MLB's keeps you from over-betting situations the math doesn't actually cover well.
Trust it more:
- Games after mid-April, once starter samples are ~30+ IP
- Confirmed starting pitchers (not "TBD")
- Outdoor games with ingested weather data
- Matchups between teams at least 20 games into the season
- F5 (first 5 innings) markets, where pitcher signal is strongest
Trust it less:
- Opening Day through mid-April (small pitcher samples)
- Bullpen or opener games (the pitcher rating assumes a true starter)
- Spring training, exhibition, and international-series games (not supported)
- Last-second lineup changes that the model hasn't re-ingested yet
- Extremely rainy forecasts that could become rainouts
The parameter values on this page are served live from the model configuration and refresh periodically; when a weight or threshold changes, this page reflects it automatically.