Methodology & Data Sources

How CityDataLab decomposes real estate prices — the data, the factor model, and the thinking behind it.

Methodology

The Hedonic Factor Model

At each point in time we run a cross-sectional OLS regression of log(price/sqm) on property characteristics using a rolling 3-month window. The fitted coefficients (betas) tell us the market price of each characteristic in that window.

For each rolling window t and each transaction i, the regression is:

$$\log\left(\frac{p_{i,t}}{a_{i,t}}\right) = \alpha_t + \sum_k \beta_{k,t}\, x_{k,i,t} + \sum_z \gamma_{z,t}\, \mathbf{1}\{\text{postcode}_i = z\} + \varepsilon_{i,t}$$

where $p$ is the transacted price, $a$ the floor area in square metres, $x_k$ the $k$-th hedonic characteristic (continuous or dummy), and the postcode indicator is a control rather than a return factor. The intercept $\alpha_t$ is the city-wide quality-adjusted log price level, which we call the Baseline Market factor.
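A minimal sketch of the per-window regression may make the setup concrete. It assumes NumPy, a trailing window of three calendar months indexed by an integer month counter, and postcode fixed effects identified by dropping the first postcode level; the function names `fit_window` and `rolling_fits` are hypothetical, and none of these choices is confirmed as the production implementation. Note that with a dropped reference level the intercept is the level for that reference postcode, so a different constraint (for example sum-to-zero effects) would be needed for it to read as a city-wide level.

```python
import numpy as np

def fit_window(log_ppsqm, X, postcodes):
    """OLS for one window: intercept + hedonic betas + postcode dummies."""
    y = np.asarray(log_ppsqm, dtype=float)
    X = np.asarray(X, dtype=float)
    # Postcode fixed effects; the first level is dropped (assumption) to avoid
    # perfect collinearity with the intercept.
    levels, codes = np.unique(np.asarray(postcodes), return_inverse=True)
    dummies = np.eye(len(levels))[codes][:, 1:]
    design = np.column_stack([np.ones(len(y)), X, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = X.shape[1]
    return {
        "alpha": coef[0],            # intercept
        "beta": coef[1:1 + k],       # market price of each characteristic
        "gamma": coef[1 + k:],       # postcode controls (non-reference levels)
        "x_mean": X.mean(axis=0),    # basket mean of each characteristic
    }

def rolling_fits(month_idx, log_ppsqm, X, postcodes, window=3):
    """Refit the regression on a trailing `window`-month sample for each month."""
    month_idx = np.asarray(month_idx)
    y = np.asarray(log_ppsqm, dtype=float)
    X = np.asarray(X, dtype=float)
    postcodes = np.asarray(postcodes)
    fits = {}
    for t in np.unique(month_idx):
        mask = (month_idx > t - window) & (month_idx <= t)
        fits[int(t)] = fit_window(y[mask], X[mask], postcodes[mask])
    return fits
```

Calling `rolling_fits(month_idx, log_ppsqm, X, postcodes)` returns one dictionary of coefficients per month, which is the input the factor-return calculation needs.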

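Tracking $\beta_{k,t}\,\bar{x}_{k,t}$ over time, where $\bar{x}_{k,t}$ is the window mean of characteristic $k$, gives each factor's contribution to the log price level. Its first differences are the period-on-period factor returns:

$$\Delta(\beta_{k,t}\,\bar{x}_{k,t}) = \text{period return attributable to factor } k$$
$$\sum_{s \le t} \Delta(\beta_{k,s}\,\bar{x}_{k,s}) = \text{cumulative factor return}$$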

This captures both how the market reprices a characteristic (Δβ) and compositional shifts in what's being sold (ΔX). Two implementation notes that often trip people up:

Repricing vs composition

The change in any factor's monthly contribution to log price can be exactly decomposed into a pure-repricing component and a composition-shift component:

$$\Delta(\beta_{k,t}\,\bar{x}_{k,t}) = \bar{x}_{k,t}\,\Delta\beta_{k,t} + \beta_{k,t-1}\,\Delta\bar{x}_{k,t}$$

The first term, pure repricing at the current basket, is the factor return we publish. The second term, the basket-weight shift valued at last month's price, is the compositional drift, which we route into the Baseline Market column. Because repricing is valued at the current basket, the cross-product Δβ·Δx is already contained in the first term; when repricing is instead valued at last month's basket, it appears as a small separate reconciliation term. Either way, the per-factor contributions plus the Baseline (plus the cross-term, where it is reported separately) always equal the model's predicted change in log price.
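The decomposition can be checked numerically. The sketch below assumes NumPy and that the fitted betas and basket means are stacked as arrays of shape (periods, factors); the function name `decompose` is hypothetical. It returns the repricing and composition terms as defined above, plus the cross-product, which shows up as a separate term only when repricing is valued at the previous basket.

```python
import numpy as np

def decompose(beta, x_mean):
    """Split the change in beta * x_mean into repricing and composition parts.

    beta, x_mean: array-likes of shape (T, K), one row per period.
    Returns arrays of shape (T - 1, K).
    """
    beta = np.asarray(beta, dtype=float)
    x_mean = np.asarray(x_mean, dtype=float)
    d_beta = np.diff(beta, axis=0)
    d_x = np.diff(x_mean, axis=0)
    return {
        "total": np.diff(beta * x_mean, axis=0),
        "repricing": x_mean[1:] * d_beta,    # valued at the current basket
        "composition": beta[:-1] * d_x,      # valued at last period's price
        "cross": d_beta * d_x,               # already inside "repricing"
    }

out = decompose([[0.10], [0.12]], [[50.0], [52.0]])
# Identity from the text: total = repricing + composition.
assert np.allclose(out["total"], out["repricing"] + out["composition"])
# Cumulative repricing return per factor.
cumulative = np.cumsum(out["repricing"], axis=0)
```

With a beta moving from 0.10 to 0.12 and a basket mean moving from 50 to 52, the total change is 6.24 - 5.00 = 1.24, of which 52 × 0.02 = 1.04 is repricing and 0.10 × 2 = 0.20 is composition. Valuing repricing at the old basket instead gives 1.00, and the remaining 0.04 is the cross-product 0.02 × 2.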

Why a rolling 3-month window

Factor Selection

Factors are selected iteratively using forward selection. At each step, candidate factors are evaluated over all rolling windows. A factor is accepted if:

This ensures each factor adds independent, stable information to the model.
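The general shape of forward selection over rolling windows can be sketched as follows. The acceptance criteria are not listed in this section, so the rule used here (median gain in adjusted R² across windows above a threshold `min_gain`) is a placeholder chosen purely for illustration, not CityDataLab's criteria; the function names and the NumPy-based fit are likewise assumptions.

```python
import numpy as np

def adj_r2(y, X):
    """Adjusted R-squared of an OLS fit of y on an intercept plus columns of X."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    p = design.shape[1] - 1
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def forward_select(windows, candidates, min_gain=0.005):
    """Greedy forward selection evaluated over all windows.

    windows: list of (y, X) pairs, one per rolling window.
    candidates: column indices of X that may enter the model.
    min_gain: hypothetical acceptance threshold (placeholder rule).
    """
    selected, remaining = [], list(candidates)
    while remaining:
        base = [adj_r2(y, X[:, selected]) for y, X in windows]
        gains = {
            c: np.median([adj_r2(y, X[:, selected + [c]]) - b
                          for (y, X), b in zip(windows, base)])
            for c in remaining
        }
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # no remaining candidate clears the placeholder threshold
        selected.append(best)
        remaining.remove(best)
    return selected
```

Evaluating each candidate by a statistic taken across all windows, rather than in a single window, is what favours factors whose contribution is stable over time.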

Per-city feature engineering

The set of hedonic characteristics differs by city because the underlying datasets differ. Where multiple encodings are plausible we prefer the simplest one that produces an interpretable coefficient.

Quality Controls

Limitations

References

The hedonic approach to residential property pricing goes back to Rosen (1974). The classic survey is Sirmans, Macpherson and Zietz (2005), The Composition of Hedonic Pricing Models, Journal of Real Estate Literature 13(1).

Data Sources

City | Data Source | Date Range | Transactions

All transaction data is used in anonymised aggregate form only. No individual property records are stored or displayed.

Factors by City

Contact

Questions about the methodology, data, or cities covered? Send us a message.