The models · 10 min read

MLB Projection Model

How Fairline's MLB model works: a Negative Binomial run-scoring engine with pitcher FIP blending, park and weather environment, bullpen fatigue, and the markets it prices.

Updated Jun 2026 · Part of the models series

Why Negative Binomial?

MLB run scoring is "overdispersed": the variance of runs scored is roughly 2.1 times the mean, far above the 1.0 ratio that a simple Poisson distribution assumes. Runs tend to cluster in baseball, with a single big inning often producing 4-5 runs at once, so some games turn into blowouts while others are 1-0 pitchers' duels. A distribution that can capture that burstiness is required.

The Negative Binomial distribution handles this by adding a second parameter, a dispersion value (r = 3.74, calibrated from 40+ years of MLB data) that lets the model capture both the average scoring rate and the burstiness of run production. This produces a full score matrix up to 15 runs per side, from which every market probability is derived.
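As a rough illustration of how a score matrix can be built from two Negative Binomial run distributions, here is a minimal sketch. It assumes the two teams' run totals are independent and that the truncated matrix is renormalized; neither detail is confirmed by this page. The function names and the 4.6 / 4.1 projections are hypothetical.

```python
import math

def nb_pmf(k: int, mean: float, r: float) -> float:
    """P(X = k) for a Negative Binomial with the given mean and dispersion r.

    Variance is mean + mean**2 / r, so Var/Mean = 1 + mean / r.
    """
    p = r / (r + mean)  # success probability in the standard (r, p) form
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)

def score_matrix(home_mean, away_mean, r=3.74, max_runs=15):
    """matrix[h][a] = P(home scores h, away scores a), truncated and renormalized.

    Assumption: the two sides are treated as independent.
    """
    home = [nb_pmf(k, home_mean, r) for k in range(max_runs + 1)]
    away = [nb_pmf(k, away_mean, r) for k in range(max_runs + 1)]
    mass = sum(home) * sum(away)  # probability kept after truncating at max_runs
    return [[h * a / mass for a in away] for h in home]

# Hypothetical projections, for illustration only.
matrix = score_matrix(home_mean=4.6, away_mean=4.1)
cells = [(h, a) for h in range(16) for a in range(16)]
p_home_ahead = sum(matrix[h][a] for h, a in cells if h > a)
p_away_ahead = sum(matrix[h][a] for h, a in cells if h < a)
p_tied = sum(matrix[h][a] for h, a in cells if h == a)
p_over_8_5 = sum(matrix[h][a] for h, a in cells if h + a > 8.5)
```

With the league mean of 4.374 and r = 3.74, the identity Var/Mean = 1 + mean / r gives about 2.17, in line with the roughly 2.1 ratio quoted above. The tied-score mass would still need an extra-innings rule before it becomes a moneyline price; that step is not described here and is left out of the sketch.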

Baseball is also uniquely pitcher-driven. The starting pitcher is the single most important variable in any game, substantially more influential than any equivalent role in hockey, basketball, or soccer. The model is built around a detailed pitcher rating system that blends recent form, season totals, and prior year data, with every other factor (lineup strength, park, weather, umpire) layered on top.

Core Formula

Projected Runs = League_Avg × Pitcher_Factor × Lineup_Factor × Environment_Factor

Each team's projected runs start from the league average (4.374 runs/game) and are scaled by multiplicative factors for the opposing starting pitcher, the team's own lineup, and the environment (park, weather, umpire). The game is split into starter innings (5.5) and bullpen innings (3.5), with each segment rated separately. When a pitcher is flagged as an opener or the game as a bullpen game, the starter-side IP shrinks to ~1.5 and the balance is modeled as bullpen innings. The stacked projection is then shrunk toward league average at 70% trust (mirroring the K-prop shrinkage) to counter winner's-curse amplification at the tails of the factor stack. Additive adjustments for home-field advantage, velocity decline, and bullpen fatigue are applied last.
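The factor stack can be sketched roughly as below. Two details are assumptions rather than documented behavior: the starter and bullpen factors are combined as an innings-weighted average, and "70% trust" is read as keeping 70% of the projection's deviation from league average. All function and parameter names are hypothetical.

```python
LEAGUE_AVG_RUNS = 4.374      # league runs per game, from this page
STARTER_IP = 5.5             # default starter innings
BULLPEN_IP = 3.5             # default bullpen innings
OPENER_STARTER_IP = 1.5      # starter-side innings in an opener / bullpen game
TRUST = 0.70                 # assumed: share of the deviation that is kept

def pitching_factor(starter_factor, bullpen_factor, opener=False):
    """Innings-weighted blend of starter and bullpen factors (assumed form)."""
    total_ip = STARTER_IP + BULLPEN_IP
    starter_ip = OPENER_STARTER_IP if opener else STARTER_IP
    bullpen_ip = total_ip - starter_ip
    return (starter_factor * starter_ip + bullpen_factor * bullpen_ip) / total_ip

def projected_runs(starter_factor, bullpen_factor, lineup_factor, env_factor,
                   additive=0.0, opener=False):
    """Multiply the factors, shrink toward league average, then add adjustments."""
    stacked = (LEAGUE_AVG_RUNS
               * pitching_factor(starter_factor, bullpen_factor, opener)
               * lineup_factor
               * env_factor)
    shrunk = LEAGUE_AVG_RUNS + TRUST * (stacked - LEAGUE_AVG_RUNS)
    return shrunk + additive  # home field, velocity decline, bullpen fatigue

# Hypothetical inputs: a strong starter, average bullpen, good lineup, hitter's park.
example = projected_runs(0.80, 1.00, 1.10, 1.05)
```

Applying the shrinkage after the multiplication, rather than to each factor, limits the compounding that happens when several factors all lean the same way.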

Pitcher Rating Blend

Pitchers are rated by blending three time windows. The current-season line gets the most weight because it is the most stable sample of the pitcher's current ability. Recent starts get a smaller, separate weight to catch mechanical changes, fatigue, or injury effects that season averages would smooth over. The prior-year component provides stability early in the season when the sample is small.

Live configuration: rendered from the running system, always current

Parameter	Value	Why
Recent 3-5 starts	10%	Catches hot/cold streaks and mechanical changes
Full season	70%	Most stable sample of current ability
Prior year	20%	Anchors the rating when current data is thin
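A minimal sketch of the three-window blend, using the weights in the table. Renormalizing the weights when a window is missing (for example, a rookie with no prior year) is an assumption, and the names are hypothetical.

```python
WINDOW_WEIGHTS = {"recent": 0.10, "season": 0.70, "prior_year": 0.20}

def blend_windows(ratings):
    """Weighted average of the available windows.

    `ratings` maps window name to a rating (e.g. an ERA-scale value) or None.
    Assumption: missing windows are dropped and the weights renormalized.
    """
    available = {k: v for k, v in ratings.items() if v is not None}
    total_weight = sum(WINDOW_WEIGHTS[k] for k in available)
    return sum(WINDOW_WEIGHTS[k] * v for k, v in available.items()) / total_weight

veteran = blend_windows({"recent": 3.0, "season": 4.0, "prior_year": 3.5})
rookie = blend_windows({"recent": 3.0, "season": 4.0, "prior_year": None})
```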

FIP Composite

The model rates pitchers on a blend of three fielding-independent metrics. ERA is avoided because it conflates a pitcher's performance with team defense and sequencing luck. FIP measures what a pitcher controls directly (strikeouts, walks, home runs). xFIP stabilizes the home run component by assuming a league-average HR/FB rate, filtering out park and luck effects. SIERA goes further by accounting for a pitcher's batted ball profile (ground balls vs fly balls) and how that interacts with strikeout rates.

Live configuration: rendered from the running system, always current

Parameter	Value	Why
FIP	35%	Raw K/BB/HR; most responsive to current form
xFIP	40%	Neutralizes HR luck, stronger for projection
SIERA	25%	Adds batted-ball context, best predictor
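For illustration, the standard FIP formula and the weighted composite look roughly like this. The FIP constant changes by season; 3.10 is only a placeholder assumption. xFIP and SIERA are taken as precomputed inputs, and the function names are hypothetical.

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Standard FIP: (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    The constant is season-specific; 3.10 is an illustrative placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant

def fip_composite(fip_value, xfip_value, siera_value):
    """Blend the three metrics with the weights from the table above."""
    return 0.35 * fip_value + 0.40 * xfip_value + 0.25 * siera_value

# Hypothetical season line: 20 HR, 50 BB, 5 HBP, 200 K in 180 IP.
season_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=180)
rating = fip_composite(season_fip, xfip_value=3.40, siera_value=3.30)
```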

Offense Weighting

Team offense is measured by wRC+ (Weighted Runs Created Plus), a park- and league-adjusted stat where 100 is average. Season-long data gets more weight because it is the larger, more stable sample, while a 14-day window captures lineup changes and hot/cold streaks.

Live configuration: rendered from the running system, always current

Parameter	Value	Why
Season wRC+	70%	Large, stable sample; the foundation of the rating
Recent 14 days	30%	Captures lineup changes and streaks
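A small sketch of the offense blend. Converting the blended wRC+ into a multiplicative lineup factor by dividing by 100 is an assumption made for illustration; the function name is hypothetical.

```python
def lineup_factor(season_wrc_plus, recent_wrc_plus):
    """Blend season and 14-day wRC+ (70/30), then scale so 100 maps to 1.0.

    Assumption: the lineup factor is simply blended wRC+ divided by 100.
    """
    blended = 0.70 * season_wrc_plus + 0.30 * recent_wrc_plus
    return blended / 100.0

# A 110 wRC+ offense that has slumped to 90 over the last two weeks.
factor = lineup_factor(110, 90)
```

Here the slump pulls a 10% above-average offense down to a factor of about 1.04.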

Environmental Adjustments

Baseball is played outdoors in 22 of 30 stadiums, so environment matters more than in any other sport. The model adjusts for park dimensions (Coors Field plays very differently from Oracle Park), game-day weather (wind and temperature affect how far the ball carries), and the home plate umpire (whose strike zone tendencies shift strikeout and walk rates, impacting run scoring).

Live configuration: rendered from the running system, always current

Parameter	Value	Why
Home field advantage	+0.3 runs	Split evenly: home gets +half, away gets -half
Wind effect	0.1 runs per 5 mph	Outbound wind boosts scoring; inbound suppresses it
Temperature effect	±0.085 runs per 5°F outside 60-70°F	Warm air carries balls farther; cold air is denser and suppresses scoring
Cold-wind damping	30% floor below 60°F	In cold weather, the run boost from outbound wind is reduced because dense air limits fly-ball carry
Umpire adjustment	0.25 to 0.5 runs	121 umpires profiled by K/BB/run tendencies
Dome stadiums	8 stadiums	Weather adjustments skipped for domed/retractable roofs
Park factors	30 parks	Per-park run, HR, and handedness-split factors
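The weather rows above can be sketched as follows. The table gives the damping floor (30%) but not how quickly the damping ramps in, so the 2%-per-degree slope below is purely an assumption, as are the function names and the sign convention for wind.

```python
def temperature_adjustment(temp_f):
    """±0.085 runs per 5°F outside the neutral 60-70°F band."""
    if temp_f > 70:
        return 0.085 * (temp_f - 70) / 5.0
    if temp_f < 60:
        return -0.085 * (60 - temp_f) / 5.0
    return 0.0

def wind_adjustment(wind_out_mph, temp_f):
    """0.1 runs per 5 mph; positive = blowing out, negative = blowing in."""
    adj = 0.1 * wind_out_mph / 5.0
    if adj > 0 and temp_f < 60:
        # Assumed ramp: lose 2% of the boost per degree below 60°F,
        # never dropping under the documented 30% floor.
        damping = max(0.30, 1.0 - 0.02 * (60 - temp_f))
        adj *= damping
    return adj

def weather_adjustment(wind_out_mph, temp_f, roof_closed=False):
    """Total weather adjustment in runs; skipped when the roof is closed."""
    if roof_closed:
        return 0.0
    return wind_adjustment(wind_out_mph, temp_f) + temperature_adjustment(temp_f)

warm_wind_out = weather_adjustment(wind_out_mph=10, temp_f=80)
```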

Early Season Regression

Small samples are dangerous in baseball. A pitcher who has thrown 20 innings could look elite or terrible purely by chance. Early in the season, the model blends current stats with prior-year data and league averages using Marcel-style regression, gradually trusting the new data more as the sample grows. This prevents the model from overreacting to hot or cold April streaks.

Live configuration: rendered from the running system, always current

Parameter	Value	Why
Prior year regression	40%	Blends prior stats 40% toward league average
Pitcher IP for full trust	50 IP	Below this, prior-year data gets more weight
Team games for full trust	28 games	Below this, preseason projections blended in
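A minimal sketch of the early-season blend for a pitcher. The linear trust ramp up to 50 IP is an assumption; the page only says that trust grows with the sample. The function names are hypothetical.

```python
def prior_baseline(prior_year_stat, league_avg, regression=0.40):
    """Pull the prior-year stat 40% of the way toward league average."""
    return prior_year_stat + regression * (league_avg - prior_year_stat)

def early_season_rating(current_stat, ip, prior_year_stat, league_avg,
                        full_trust_ip=50.0):
    """Blend current-season data with the regressed prior.

    Assumption: trust in current data rises linearly with IP and caps at 1.0.
    """
    trust = min(ip / full_trust_ip, 1.0)
    baseline = prior_baseline(prior_year_stat, league_avg)
    return trust * current_stat + (1.0 - trust) * baseline

# A pitcher with a 2.00 FIP through 20 IP, a 4.00 FIP last year, league at 4.20.
april_rating = early_season_rating(2.00, ip=20, prior_year_stat=4.00,
                                   league_avg=4.20)
```

In this example the hot 2.00 start is rated at about 3.25, since only 40% of the weight sits on 20 innings of current data.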

League Baselines

These are the league-wide averages that anchor the model. Every team and player is rated relative to them. A pitcher with a 3.50 FIP in a 4.40 ERA environment, for example, is suppressing runs by about 20%. These values are updated weekly as the season progresses to reflect the current run environment.

Live configuration: rendered from the running system, always current

Parameter	Value	Why
Runs per game	4.374	The anchor; all projections scale from this
ERA	4.208	Tracks the overall pitching environment
K%	22.0%	Strikeout rate used for K-prop calibration
BABIP	.300	Batting avg on balls in play (luck indicator)
xwOBA	.315	Expected weighted on-base avg from Statcast
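The 3.50 FIP example from the start of this section works out as a simple ratio against the league baseline. This sketch assumes run suppression is measured as one minus that ratio; the function name is hypothetical.

```python
def run_suppression(pitcher_fip, league_era):
    """Share of runs suppressed relative to the league run environment."""
    return 1.0 - pitcher_fip / league_era

in_example_env = run_suppression(3.50, 4.40)    # the 4.40 ERA example
in_current_env = run_suppression(3.50, 4.208)   # league ERA listed above
```

The same 3.50 FIP is worth about 20% suppression in a 4.40 environment but closer to 17% against the current league ERA, which is why the baselines are refreshed as the season progresses.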

Internal observation policy

Fairline records an internal observation whenever the model's fair odds differ from the sportsbook's price by at least 3%, a flat threshold that is the same for every market. Each row carries a one-unit research weight for calibration and closing-line tracking. These observations are not surfaced as bets.

Command-line reference. The standalone model runner also includes a quarter-Kelly staking calculator with per-market EV thresholds (below). These are an offline reference and do not drive any user-facing surface. The internal measurement record uses the flat 3% threshold and one-unit research weight described above.

Live configuration: rendered from the running system, always current

Market	Min EV threshold
Moneyline	2.0%
Run Line	3.0%
Total	2.0%
F5 Moneyline	2.0%
F5 Total	2.0%
Pitcher K Prop	5.0%
Batting Prop	4.0%
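As a rough sketch of how the offline calculator's pieces fit together: American odds are converted to decimal, expected value is computed per unit staked, and a quarter-Kelly fraction is sized from the model probability. Measuring the edge as EV per unit is an assumption, since this page does not define it precisely. All names are hypothetical.

```python
EV_THRESHOLDS = {
    "moneyline": 0.02, "run_line": 0.03, "total": 0.02,
    "f5_moneyline": 0.02, "f5_total": 0.02,
    "pitcher_k_prop": 0.05, "batting_prop": 0.04,
}

def american_to_decimal(odds):
    """-110 -> 1.909..., +125 -> 2.25 (total return per unit staked)."""
    return 1.0 + (odds / 100.0 if odds > 0 else 100.0 / abs(odds))

def implied_prob(odds):
    """Sportsbook's implied probability, vig included."""
    return 1.0 / american_to_decimal(odds)

def expected_value(model_prob, odds):
    """Expected profit per unit staked at the posted price."""
    return model_prob * american_to_decimal(odds) - 1.0

def quarter_kelly(model_prob, odds):
    """One quarter of the Kelly fraction, floored at zero."""
    b = american_to_decimal(odds) - 1.0          # net payout per unit
    full_kelly = (model_prob * b - (1.0 - model_prob)) / b
    return max(0.0, 0.25 * full_kelly)

def clears_threshold(market, model_prob, odds):
    return expected_value(model_prob, odds) >= EV_THRESHOLDS[market]

# Hypothetical: the model gives 58% on a side priced at -110.
ev = expected_value(0.58, -110)
stake_fraction = quarter_kelly(0.58, -110)
```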

Markets Explained

The model produces fair odds for every major MLB betting market. Here's what each one is and how the model handles it.

Moneyline

BOS -145 / TOR +125

Straight pick on who wins the game, extra innings included. The most liquid and efficiently priced MLB market. The model typically sees its smallest edges here, and the EV threshold is correspondingly lower (2% min) because the prices are clean.

Run Line (-1.5)

BOS -1.5 +135 / TOR +1.5 -155

A 1.5-run spread: the favorite has to win by 2+, the underdog can lose by 1 and still cover. Higher volatility than moneyline because it forces margin-of-victory estimation, so the offline staking reference uses a larger edge threshold (3%).

Game Total (Over/Under)

O 8.5 -110 / U 8.5 -110

Combined runs from both teams. Driven by starting pitchers, park factors, weather (wind + temperature), and the umpire's strike zone. Half-lines (8.5) are preferred over whole lines (9) because they eliminate pushes.

Team Totals

BOS O 4.5 -115 / BOS U 4.5 -105

Over/under on a single team's runs. Useful when you have a strong read on one side's offense vs the opposing pitcher but no view on the game total. Noisier than game totals in relative terms, because a single team's run count does not get the averaging effect of summing both sides.

First 5 Innings (F5)

F5 ML BOS -130 / F5 O 4.5 -110

Isolates the starting-pitcher matchup by ignoring the bullpens. The model's pitcher ratings are its sharpest input, so F5 markets often surface the clearest edges, especially when one team's bullpen is shaky while the starter is sharp.

Pitcher Strikeout Props

Kershaw O 6.5 K -120

Over/under on a starter's total strikeouts. The expected K count comes from a Bayesian Beta posterior on pitcher K% combined with Log5 matchup, times-through-the-order, umpire, and park multipliers. The over/under probabilities at each line are computed from a Negative Binomial distribution (r=20) calibrated to MLB historical K count fits.

Pitcher Strikeout Props

The expected K count comes from a Bayesian Beta posterior over pitcher K% (prior centered on the previous-season K% with a Stuff+ adjustment, updated with current-season K/BF counts), combined with Log5 matchup against the lineup, a times-through-the-order penalty, and umpire/park K multipliers. When confirmed lineups are available, a 100,000-trial Monte Carlo simulation produces per-batter prop distributions (hits, total bases, HR) walking through the 9-man lineup; the K count mean from that pipeline is then used as the location parameter of the Negative Binomial that produces over/under probabilities for each posted line.

The Negative Binomial uses dispersion r=20, a much higher value than the r = 3.74 the totals model uses for run scoring, because strikeout counts are far less bursty than runs. Real MLB K count distributions sit at Var/Mean ≈ 1.25, slightly more dispersed than Poisson but tighter than runs. The shape is calibrated to ~40 years of historical K-count data.
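To illustrate the last step, here is a minimal sketch of pricing a strikeout line from a Negative Binomial with r = 20. For a mean μ, Var/Mean = 1 + μ/r, so an expected 5 strikeouts gives 1.25. The sketch handles half-point lines only, and the function names and the 6.0 projection are hypothetical.

```python
import math

def nb_pmf(k: int, mean: float, r: float) -> float:
    """P(X = k) for a Negative Binomial with the given mean and dispersion r."""
    p = r / (r + mean)
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)

def prob_over(line: float, mean: float, r: float = 20.0) -> float:
    """P(strikeouts > line) for a half-point line such as 6.5.

    Whole-number lines would need separate push handling, omitted here.
    """
    highest_under = math.floor(line)  # 6.5 -> counts 0..6 land under
    p_under = sum(nb_pmf(k, mean, r) for k in range(highest_under + 1))
    return 1.0 - p_under

# Hypothetical projection: 6.0 expected strikeouts against a 6.5 line.
p_over = prob_over(6.5, mean=6.0)
p_under = 1.0 - p_over
```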

Live configuration: rendered from the running system, always current

Parameter	Value	Why
Engine	Monte Carlo	100,000 trials per pitcher
Beta Prior	100 BF	BF-equivalent confidence in Steamer/prior baseline (K% stabilizes around 83 BF)
Stuff+ Adj	15.0%	Multiplicative shift to Beta prior center; ~3% relative at Stuff+ 120
Matchup	Log5	Multiplicative pitcher × batter K% combination
TTO Penalty	1.05× / 1× / 0.92×	K rate multiplier per pass through the order
Max Baseline K%	35.0%	Hard ceiling on any K% fed into the prior (no real pitcher above ~37% true talent)
Min Observed BF	40 BF	Value detection requires this much season sample before flagging a K prop
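The Beta update and the Log5 matchup can be sketched roughly as below. The Stuff+ shift is written so that Stuff+ 120 moves the prior center by about 3%, matching the table. Applying the 35% ceiling after the Stuff+ shift and using the posterior mean as the point estimate are assumptions, and the names are hypothetical.

```python
LEAGUE_K_RATE = 0.22   # league K% from the baselines on this page

def posterior_k_rate(prior_k, stuff_plus, strikeouts, batters_faced,
                     prior_bf=100.0, stuff_adj=0.15, max_k=0.35):
    """Posterior mean K% from a Beta prior worth `prior_bf` batters faced."""
    center = prior_k * (1.0 + stuff_adj * (stuff_plus - 100.0) / 100.0)
    center = min(center, max_k)                  # hard ceiling on the prior
    alpha = center * prior_bf + strikeouts
    beta = (1.0 - center) * prior_bf + (batters_faced - strikeouts)
    return alpha / (alpha + beta)

def log5(pitcher_k, batter_k, league_k=LEAGUE_K_RATE):
    """Log5 (odds-ratio) combination of pitcher and batter strikeout rates."""
    num = pitcher_k * batter_k / league_k
    den = num + (1.0 - pitcher_k) * (1.0 - batter_k) / (1.0 - league_k)
    return num / den

# Hypothetical pitcher: 25% K last year, Stuff+ 120, 30 K in 100 BF this year.
pitcher_k = posterior_k_rate(0.25, 120, strikeouts=30, batters_faced=100)
vs_high_k_batter = log5(pitcher_k, batter_k=0.28)
```

Against a league-average batter, Log5 returns the pitcher's own rate unchanged, which is a quick sanity check on the formula.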

Model Track Record

MLB is the most mature of the four models and the only one with a published full-season backtest. The numbers below come from replaying the pipeline across the entire 2025 regular season (2,421 games) using walk-forward data, meaning each game was projected with only the information that was available beforehand, with no peeking at results. All bets were graded at the prices that would have been available on a standard -110 market.

Parameter	Value	Why
Games backtested	2,421	Full 2025 MLB regular season, walk-forward
Overall accuracy	55.8%	Breakeven at standard -110 pricing is 52.4%
Strong picks (>60% / <40%)	67.0%	When the model is confident, it's right more often
Brier skill score	+0.0261	Positive = better than coin flip; measures calibration

The Brier skill score is arguably the most important number here. It measures how well the model's probabilities are calibrated: whether a 60% forecast wins 60% of the time, whether a 30% forecast wins 30% of the time, and so on across the whole distribution. A model can pick winners at a high rate while still losing money if its probabilities are miscalibrated, and the Brier score penalizes that directly. Positive Brier skill plus above-breakeven accuracy across 2,400+ games is the statistical fingerprint of a working model.
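For reference, the Brier skill score against a coin-flip baseline and the -110 breakeven rate can be computed as in this sketch. The four forecasts are made-up illustration data, not results from the backtest, and the function names are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes, reference_prob=0.5):
    """1 - BS_model / BS_reference; positive means better than the reference."""
    reference = brier_score([reference_prob] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference

def breakeven_prob(american_odds):
    """Win rate needed to break even at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)

# Made-up forecasts and results, for illustration only.
forecasts = [0.6, 0.4, 0.7, 0.3]
results = [1, 0, 1, 1]
skill = brier_skill_score(forecasts, results)
needed = breakeven_prob(-110)
```

A coin-flip forecast always scores 0.25, so any model Brier score below 0.25 produces a positive skill score.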

When to Trust This Model (and When Not To)

Every model has soft spots. Being honest about MLB's keeps you from over-betting situations the math doesn't actually cover well.

Trust it most

Games after mid-April, once starter samples are ~30+ IP
Confirmed starting pitchers (not "TBD")
Outdoor games with ingested weather data
Matchups between teams at least 20 games into the season
F5 (first 5 innings) markets, where pitcher signal is strongest

Trust it least

Opening Day through mid-April (small pitcher samples)
Bullpen or opener games (the pitcher rating assumes a true starter)
Spring training, exhibition, and international-series games (not supported)
Last-second lineup changes that the model hasn't re-ingested yet
Extremely rainy forecasts that could become rainouts

The parameter values on this page are rendered from the running system and refresh periodically; when a weight or threshold changes, this page reflects it automatically.