- Friday 28 February 2020
- Dr Luca Barbaglia; Joint Research Centre; European Commission
In this paper we empirically investigate loan default behaviour in the European market, using a novel, large data set of more than 20 million residential mortgages observed from 2013 to 2018. We model the occurrence of a default as a function of loan-level information at origination, characteristics of the financial institution originating the loan, the borrower's economic situation, and local economic conditions. We adopt three alternative machine learning techniques for predicting default events, namely penalised logistic regression, gradient boosting and extreme gradient boosting, and carry out the analysis at both regional and country level.
We exploit techniques from the recent literature on interpretable machine learning to identify the most relevant factors affecting default and to capture the non-linear effects of some characteristics on default. We find that the most important variables in explaining default are the interest rate currently applied to the mortgage and the local economic characteristics, while borrower-specific and other features are less relevant. Our results point to substantial geographical heterogeneity in the magnitude of variable importance, indicating the need for regionally tailored European policy.
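As a rough illustration of the kind of workflow described above, the sketch below fits an L2-penalised logistic regression to synthetic data and ranks the features by permutation importance. It is not the authors' implementation: the abstract does not say which penalty or which interpretability technique is used, so both choices are assumptions, and the feature names, coefficients and sample size are invented for the example. Only the Python standard library is used.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_penalised_logit(X, y, lam=0.01, lr=0.5, epochs=300):
    """L2-penalised logistic regression fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # The term lam * wj shrinks the coefficients; the intercept is not penalised.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def log_loss(X, y, w, b):
    # Average negative log-likelihood of the observed defaults.
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def permutation_importance(X, y, w, b, rng):
    """Increase in log-loss when a single feature column is shuffled."""
    base = log_loss(X, y, w, b)
    scores = []
    for j in range(len(X[0])):
        col = [xi[j] for xi in X]
        rng.shuffle(col)
        X_perm = [xi[:j] + [c] + xi[j + 1:] for xi, c in zip(X, col)]
        scores.append(log_loss(X_perm, y, w, b) - base)
    return scores


# Synthetic, standardised features. Names and coefficients are hypothetical
# and are NOT estimates from the mortgage data described in the abstract.
rng = random.Random(0)
names = ["interest_rate", "regional_unemployment", "borrower_income"]
true_coef = [1.5, 1.0, -0.2]
X = [[rng.gauss(0.0, 1.0) for _ in names] for _ in range(1000)]
y = [
    1 if rng.random() < sigmoid(-2.0 + sum(c * x for c, x in zip(true_coef, xi))) else 0
    for xi in X
]

w, b = fit_penalised_logit(X, y)
importance = dict(zip(names, permutation_importance(X, y, w, b, rng)))
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

Permutation importance is model-agnostic, so the same ranking step could in principle be applied to a gradient boosting model; running it separately per region would be one way to look for geographical differences in variable importance.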