Methodology & Data Sources

Methodology

The Hedonic Factor Model

At each point in time we run a cross-sectional OLS regression of log(price/sqm) on property characteristics using a rolling 3-month window. The fitted coefficients (betas) tell us the market price of each characteristic in that window.

For each rolling window t and each transaction i, the regression is:

      \log(p_{i,t} / a_{i,t}) = \alpha_t + \sum_k \beta_{k,t} \, x_{k,i,t} + \sum_z \gamma_{z,t} \, \mathbf{1}\{\text{postcode}_i = z\} + \varepsilon_{i,t}
    

where p is the transacted price, a the floor area in square metres, x_k the k-th hedonic characteristic (continuous or dummy), and the postcode indicator is a control rather than a return factor. The intercept α_t is the city-wide quality-adjusted log price level — the Baseline Market factor.
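A minimal sketch of this regression for a single window, assuming NumPy is available. The function name fit_window and its argument layout are hypothetical, not taken from the production pipeline. The sketch drops one postcode as the reference level to avoid collinearity with the intercept; how the postcode effects are actually normalised is not stated here, and the interpretation of α_t as a city-wide level depends on that choice.

```python
import numpy as np

def fit_window(price, area, X, postcode):
    """OLS of log(price/area) on characteristics plus postcode dummies.

    X is an (n, k) array of hedonic characteristics for one rolling window.
    Returns (alpha, betas, gammas); gammas maps postcode -> fixed effect,
    measured relative to the first postcode in sorted order (an assumption).
    """
    y = np.log(np.asarray(price, dtype=float) / np.asarray(area, dtype=float))
    X = np.asarray(X, dtype=float)
    codes = sorted(set(postcode))
    if len(codes) > 1:
        # One 0/1 column per postcode except the reference level.
        dummies = np.column_stack(
            [np.array([pc == z for pc in postcode], dtype=float) for z in codes[1:]]
        )
    else:
        dummies = np.empty((len(y), 0))
    design = np.column_stack([np.ones(len(y)), X, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = X.shape[1]
    return coef[0], coef[1:1 + k], dict(zip(codes[1:], coef[1 + k:]))
```

Calling fit_window once per rolling window and storing the returned betas gives the coefficient time series that the factor returns are built from. Standard errors and p-values are not computed in this sketch.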

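Tracking β_{k,t} · x̄_{k,t} (the coefficient times the window mean of characteristic k) over time gives each factor's contribution to the log price level. Its first differences are the period-on-period factor returns, and their running sum is the cumulative return:

      \Delta(\beta_{k,t} \cdot \bar{x}_{k,t}) = \text{period return attributable to factor } k

      \mathrm{cumsum}(\Delta) = \text{cumulative factor return}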

This captures both how the market reprices a characteristic (Δβ) and compositional shifts in what's being sold (ΔX). Two implementation notes that often trip people up:

Factors do not add up to a total return. Factor contributions are additive in log space but multiplicative in level space, so summing percentage returns across factors has no economic meaning. Quote each factor's return independently.
Period returns are not simple differences of cumulative percentages. Use (1 + latest/100) / (1 + prev/100) − 1, not latest − prev.
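The two notes above can be illustrated with a short sketch. The function names are hypothetical, and converting a log return to a percentage via exp(r) − 1 is an assumption about how the cumulative percentages are produced.

```python
import math

def factor_return_series(betas, mean_x):
    """Period (log) returns and their running sum for one factor.

    betas[t] and mean_x[t] are the coefficient and the window mean of the
    characteristic in window t.
    """
    contrib = [b * x for b, x in zip(betas, mean_x)]
    period = [cur - prev for prev, cur in zip(contrib, contrib[1:])]
    cumulative, total = [], 0.0
    for d in period:
        total += d
        cumulative.append(total)
    return period, cumulative

def log_to_pct(log_return):
    """Convert a log return to a percentage change in level (assumed convention)."""
    return (math.exp(log_return) - 1.0) * 100.0

def period_pct_from_cumulative(latest_pct, prev_pct):
    """Period return in percent from two cumulative percentages."""
    return ((1.0 + latest_pct / 100.0) / (1.0 + prev_pct / 100.0) - 1.0) * 100.0
```

For example, cumulative returns of 10% and then 21% imply a period return of about 10% (1.21 / 1.10 − 1), not the 11 points that latest − prev would suggest.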

Repricing vs composition

The change in any factor's monthly contribution to log price can be exactly decomposed into a pure-repricing component and a composition-shift component:

      \Delta(\beta_{k,t} \cdot \bar{x}_{k,t}) = \bar{x}_{k,t} \cdot \Delta\beta_{k,t} + \beta_{k,t-1} \cdot \Delta\bar{x}_{k,t}
    

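The first term, pure repricing at the current basket, is the factor return we publish. The second term, the basket-weight shift valued at last month's price, is the compositional drift, which we route into the Baseline Market column. A small reconciliation term (the cross-product Δβ · Δx̄) ensures the per-factor contributions plus the Baseline plus the cross-term always equal the model's predicted change in log price. Note that this cross-term is a separate quantity only when repricing is valued at the previous basket x̄_{k,t−1}; valued at the current basket as written above, it is already contained in the repricing term, because x̄_{k,t} · Δβ_{k,t} = x̄_{k,t−1} · Δβ_{k,t} + Δβ_{k,t} · Δx̄_{k,t}.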
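A quick numeric check of the decomposition, as a sketch with a hypothetical function name. It computes the current-basket split given in the equation and, for comparison, the previous-basket variant in which the cross-product appears explicitly. The previous-basket variant is included only for illustration.

```python
def decompose(beta_prev, beta_now, xbar_prev, xbar_now):
    """Split the change in beta * xbar into repricing and composition parts."""
    d_beta = beta_now - beta_prev
    d_xbar = xbar_now - xbar_prev
    return {
        "total": beta_now * xbar_now - beta_prev * xbar_prev,
        # Current-basket form: these two terms sum to the total exactly.
        "repricing": xbar_now * d_beta,
        "composition": beta_prev * d_xbar,
        # Previous-basket form: needs the cross-product to close the identity.
        "repricing_prev_basket": xbar_prev * d_beta,
        "cross": d_beta * d_xbar,
    }

parts = decompose(beta_prev=0.10, beta_now=0.12, xbar_prev=2.0, xbar_now=2.1)
```

In both forms the pieces sum to the same total change; the only difference is whether the cross-product is reported on its own line or folded into the repricing term.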

Why a rolling 3-month window

Sample size. A single calendar month doesn't give enough transactions per postcode × type cell to fit a stable regression. Three months roughly triples the cross-sectional sample without forcing us to look at quarterly data.
Seasonality. Monthly volumes are seasonal, but the 3-month rolling cut largely smooths out the dip-and-bounce pattern.
Stability. Coefficients on rarer factor cells (EPC A/B, new build, certain age bands) move violently with single-month windows. Three months keeps the marginal-buyer signal visible without exploding the variance.

Factor Selection

Factors are selected iteratively using forward selection. At each step, candidate factors are evaluated over all rolling windows. A factor is accepted if:

Median p-value across all windows < 0.10
Significant (p < 0.10) in ≥ 50% of windows
Return time-series correlation with already-accepted factors < 0.50

This ensures each factor adds independent, stable information to the model.
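The three acceptance criteria can be sketched as a single check. The function names are hypothetical. Two details are assumptions rather than stated rules: the correlation threshold is applied to the absolute Pearson correlation, and it must hold against every already-accepted factor. How candidates are ordered within the forward selection is not specified, so the sketch covers only the accept/reject test.

```python
from statistics import median

def pearson(a, b):
    """Pearson correlation of two equal-length series with non-zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def accept_factor(p_values, returns, accepted_returns,
                  alpha=0.10, min_share=0.50, max_corr=0.50):
    """Apply the three criteria to one candidate factor.

    p_values: the candidate's p-value in each rolling window.
    returns: the candidate's return time series.
    accepted_returns: return series of the factors already in the model.
    """
    if median(p_values) >= alpha:
        return False
    share_significant = sum(p < alpha for p in p_values) / len(p_values)
    if share_significant < min_share:
        return False
    # Assumption: compare absolute correlation against the threshold.
    return all(abs(pearson(returns, r)) < max_corr for r in accepted_returns)
```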

Per-city feature engineering

The set of hedonic characteristics differs by city because the underlying datasets differ. Where multiple encodings are plausible we prefer the simplest one that produces an interpretable coefficient.

London. Floor area in sqm (Land Registry × EPC join). EPC band collapsed to a three-level scale {A/B = +1, C–F = 0, G = −1}. Construction-period band centred on mid-century (U-shaped premium for pre-1900 and modern stock). Freehold/leasehold encoded as ±1. House vs flat as a dummy. Number of habitable rooms entered as a centred-squared deviation.
New York. Building-scale factors only (NYC DOF transactions don't carry unit-level square footage): log of residential units, high-rise dummy, elevator-building dummy, building age, single-family-house dummy. Construction-period band on the same U-shape as London.
Paris. Log floor area, GES greenhouse-gas score on the published linear scale, age-band squared, and sqm-per-room (a unit-quality proxy that turns out to be a meaningful Paris factor).
Singapore. Log building age, floor-deviation squared and age-deviation squared (each centred on the typical-HDB-flat value), low-floor / high-floor dummies, room density.
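To make the encodings concrete, here is a minimal sketch of the London ones described above. The function name, the sign convention for tenure (freehold = +1) and the centring value of 4 habitable rooms are assumptions for illustration; only the EPC collapse and the general shape of each encoding come from the description.

```python
# EPC band collapsed to three levels: A/B = +1, C-F = 0, G = -1.
EPC_SCORE = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0, "F": 0, "G": -1}

def encode_london(epc_band, tenure, is_house, rooms, typical_rooms=4.0):
    """Encode one London transaction's characteristics.

    typical_rooms is a hypothetical centring value, not the published one.
    """
    return {
        "epc": EPC_SCORE[epc_band.upper()],
        # Assumed sign convention: freehold = +1, leasehold = -1.
        "tenure": 1 if tenure == "freehold" else -1,
        "house": 1 if is_house else 0,
        # Centred-squared deviation: penalises unusually small or large homes alike.
        "rooms_dev_sq": (rooms - typical_rooms) ** 2,
    }
```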

Quality Controls

Top 0.5% by price/sqm removed each window (data errors and ultra-prime outliers)
Observations with z-score < −5 on log(price/sqm) removed (suspiciously cheap transactions)
Minimum 50 transactions per window; sparser windows are skipped
Non-linear encodings (U-shapes, collapsings) impose economic priors to prevent fitting noise
Standard errors and p-values are stored alongside the coefficients and rendered as the red-shaded band on the per-factor charts, so insignificant periods remain visible rather than being dropped from the time series
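The first three filters can be sketched per window as follows. The function name is hypothetical, and several details are assumptions: the top tail is trimmed before the z-score is computed, the number of trimmed rows is rounded up, and the minimum-count check runs after both filters.

```python
import math
from statistics import mean, pstdev

def clean_window(ppsqm, top_share=0.005, z_floor=-5.0, min_obs=50):
    """Return the kept price/sqm values, or None if the window is skipped."""
    values = sorted(ppsqm)
    # Drop the top 0.5% by price/sqm (rounded up, an assumption).
    n_drop = math.ceil(len(values) * top_share)
    kept = values[: len(values) - n_drop]
    if not kept:
        return None
    # Drop suspiciously cheap rows: z-score on log(price/sqm) below the floor.
    logs = [math.log(v) for v in kept]
    mu, sigma = mean(logs), pstdev(logs)
    if sigma > 0:
        kept = [v for v, lv in zip(kept, logs) if (lv - mu) / sigma >= z_floor]
    # Skip windows that are too sparse to fit.
    return kept if len(kept) >= min_obs else None
```

A window for which clean_window returns None would simply be left out of the coefficient time series.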

Limitations

The model is unconditional on macro variables (rates, GDP, etc.). We measure the moves; we don't attribute them to a specific cause beyond what the hedonic decomposition gives us.
The most recent 2–3 months of each series should be treated as provisional — the rolling window hasn't fully updated and late-arriving registrations will revise figures slightly.
Outside the four cities we currently cover, the same methodology can be applied wherever a transaction-level dataset with property characteristics is publicly available.

References

The hedonic approach to residential property pricing goes back to Rosen (1974). The classic survey is Sirmans, Macpherson and Zietz (2005), The Composition of Hedonic Pricing Models, Journal of Real Estate Literature 13(1).