Prepayment Analytics

The Loan Portfolio Cash Flow Forecast & Analytics offers in-depth cash flow projections that account for defaults and prepayments, which are essential for financial planning and interest rate risk management. It also supports Investor & Warehouse Securitization Reporting with comprehensive pool analysis, and aids Loan Pricing & Scenario Optimization to maintain a balance between profitability and risk.

Borrowers can repay part or all of an outstanding loan before the due date. These prepayments make the effective maturity of the loan portfolio stochastic, which complicates the refinancing policy of a bank or other financial institution and affects asset and liability cash flow management. Prepayments also give rise to interest rate risk, so an accurate forecast of the prepayment rate can improve the performance of a bank's hedging strategy.

Executive Summary

Given the magnitude of the loan portfolio on any bank's balance sheet, estimating the prepayment rate is crucial. The platform uses two types of models: the optimal prepayment model, which sees prepayment as a consequence of rational behavior (i.e. prepayments are always exercised at an optimal time), and the exogenous model, which also takes into account macroeconomic variables, client specifics, and loan characteristics. The focus here is on the second model type. Specifically, the problem is approached as a classification task carried out with two different ML techniques: Random Forests and artificial neural networks (ANNs). Since prepayments are rare events, the dataset is imbalanced. The imbalance between classes complicates the development of the algorithm, so ad hoc corrections are applied to address it.
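The text does not say which corrections the platform applies to the class imbalance. As one common option, and purely as an illustrative assumption, the sketch below computes inverse-frequency class weights so that the rare prepayment class counts as much as the majority class during training. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical observations: 1 = prepaid, 0 = did not prepay.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each class contributes the same total weight to a weighted loss. Resampling the minority class is another frequently used alternative.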

Prepayment Risk

The possibility to prepay exposes the bank to prepayment risk, which can be divided into two classes:

Interest rate risk: It arises because fixed-rate loans are usually hedged against interest rate changes using Interest Rate Swaps (IRS); when a loan prepays, the swap remains in place without the loan it was hedging.

Liquidity risk: It arises because the liquidity profile of a bank is strongly influenced by the maturity profile of its loans, which in turn depends on prepayments.

Since prepayments are such a significant risk for the bank, it is fundamental to be able to predict the conditional prepayment rate (CPR), which is the portion of the principal of a pool of loans assumed to prepay in each period, in order to better forecast future cash flows.
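As a minimal sketch of how a prepayment rate can be measured from pool balances, the code below computes a monthly prepayment fraction (often called single monthly mortality, SMM) and annualizes it. The annualization formula CPR = 1 - (1 - SMM)^12 is a common market convention and an assumption here, as are the function names and the balance figures.

```python
def single_monthly_mortality(begin_balance, scheduled_principal, prepaid_principal):
    """Fraction of the balance left after scheduled principal that prepaid this month."""
    available = begin_balance - scheduled_principal
    if available <= 0:
        return 0.0
    return prepaid_principal / available


def annualize_smm(smm):
    """Compound a monthly prepayment fraction over 12 months."""
    return 1.0 - (1.0 - smm) ** 12


# Hypothetical pool: 1,000,000 outstanding, 10,000 scheduled, 4,950 prepaid.
smm = single_monthly_mortality(1_000_000, 10_000, 4_950)
cpr = annualize_smm(smm)
print(f"SMM = {smm:.4%}, annualized CPR = {cpr:.4%}")
```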

Different Kinds of Prepayments

Prepayments are grouped into two main categories within a bank's Asset & Liability Management (ALM) department:

Redemption

This is when a customer pays back part of the notional. Redemptions may or may not be scheduled as part of the amortization scheme. If they happen on the loan rate reset date, there is no interest rate risk, since the entire notional would have been repriced anyway. If they do not occur on a reset date, the redemption is early and there is interest rate risk. Early full redemptions can occur due to the death of the client, an insurance pay-out following destruction of the collateral, a move to another house, or voluntarily. A partial early redemption is always voluntary.

Repricing

This is when the interest rate on a loan changes. This could be due to the amortization scheme, the end of the remaining fixed interest rate period, or a change in the client rating. Repricing can be either scheduled or unscheduled. Unscheduled repricing can happen due to a move, salvaging after default, or the death of one of the debtors, in which case the client can choose to reprice or to keep the current client rate, or voluntarily at the request of the client.

Survival Analysis

In this framework the interest lies in the time until an event occurs, here the prepayment event, and in its distribution, which depends on the hazard rate. Survival analysis yields the probability distribution of the duration of the loan, so the time until the occurrence of a given event can be modeled. The link between the survival function and the covariates is expressed through the proportional hazards model.
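The proportional hazards relationship is not written out in the text. A standard form, given here as an assumption about the intended equation, is shown below, where x is the vector of explanatory variables and \beta the vector of coefficients (both symbols are introduced for illustration).

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)
```

When x = 0 the exponential term equals 1, so the hazard reduces to the baseline h_0(t).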

Writing the hazard rate as the product of a baseline hazard h0(t) and a term that depends on the covariates separates the two components, so that the relationship between the survival function and the covariates can be analyzed. h0(t) captures the distribution of the failure time when the explanatory variables are equal to 0 (the base rate). It reflects the prepayment rate and varies with the loan age (seasoning effect). The covariate term captures the effects of the explanatory variables on the hazard rate and incorporates loan-specific and time-specific effects. The survival function is defined as the probability of surviving up to time t. Combining these results gives the cause-specific density for a loan terminated at time t.
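For reference, the standard relations between the hazard rate, the survival function, and the cause-specific density can be written as follows. The index j for the cause of termination (for example prepayment or default) and the notation h_j are assumptions introduced for illustration; h(u | x) denotes the overall hazard rate given covariates x.

```latex
S(t \mid x) = P(T > t \mid x) = \exp\!\left(-\int_0^t h(u \mid x)\,du\right),
\qquad
h(t \mid x) = \sum_j h_j(t \mid x),
\qquad
f_j(t \mid x) = h_j(t \mid x)\,S(t \mid x)
```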

In the competing risk model, the analysis can be divided into two parts: the multinomial logit (MNL) part determines the cause of termination (the "cause of death") and the hazard model determines the overall risk. When h0 is small relative to the effect of the covariates, the competing risk model and the MNL are comparable. Competing risk models have difficulty including time-varying covariates in the analysis. The major drawback of the MNL, however, is its implicit assumption of independence between consecutive observations. In fact, these observations are not independent, since they relate to the same loan.
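To illustrate the MNL part, the sketch below turns linear scores for each possible outcome into probabilities with the multinomial logit (softmax) formula. The outcome names and score values are hypothetical; in practice the scores would come from fitted coefficients applied to loan covariates.

```python
import math


def mnl_probabilities(scores):
    """Multinomial logit: P(outcome k) = exp(score_k) / sum_j exp(score_j)."""
    shift = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - shift) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Hypothetical scores for one loan-month; "continue" is the reference outcome.
scores = {"continue": 0.0, "prepay": -2.0, "default": -4.0}
probs = mnl_probabilities(scores)
print(probs)
```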

A common problem in survival analysis is censored data, a form of the missing data problem. The ideal dataset contains the start and end dates of all facilities in the portfolio, from which the lifetime is determined. If the end date is not available in the dataset, the observation is right censored. Right censoring is a common problem when estimating the survivor and hazard functions. It occurs when data is collected over a finite period of time, so that the event may not be observed for all facilities. Truncation is another form of missing data, of which left truncation is the most common. It occurs when loans have been at risk before entering the study. Truncation is due to a condition other than the event of interest.
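The sketch below shows one way to turn start and end dates into survival records that carry both right censoring and left truncation. The function names, the month-based time unit, and the sample dates are assumptions for illustration.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (days are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def to_survival_record(start, end, study_start, study_end):
    """Return (entry_age, exit_age, event_observed) in months of loan age.

    entry_age > 0 marks a left-truncated loan (already at risk before the study).
    event_observed is False when the loan is right censored at study_end.
    """
    entry_age = max(0, months_between(start, study_start))
    if end is None or end > study_end:
        return entry_age, months_between(start, study_end), False
    return entry_age, months_between(start, end), True


study_start, study_end = date(2020, 1, 1), date(2022, 12, 31)
# Loan opened before the study and still open at the end: truncated and censored.
old_open_loan = to_survival_record(date(2018, 6, 1), None, study_start, study_end)
# Loan opened during the study and terminated after 12 months.
new_closed_loan = to_survival_record(date(2021, 3, 1), date(2022, 3, 1), study_start, study_end)
print(old_open_loan, new_closed_loan)
```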

Classical Survival Models

Nonparametric Models:

The Kaplan-Meier (KM) estimator, also called the product limit estimator, and the Nelson-Aalen (NA) estimator are examples of non-parametric models. The KM estimator estimates the survival function, whereas the NA estimator estimates the cumulative hazard rate function. The advantage of these estimators is that they take censored data into account. The KM estimator can also be stratified by variables, which means splitting the loans into two or more groups based on some criterion, for example the purpose of the loan: mortgage, education, or other. Plotting the KM estimator for each group can give insight into whether one group has a higher survival probability than another.
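A minimal pure-Python sketch of both estimators is given below, assuming each loan is described by a duration and a flag that is True when termination was observed and False when the loan was right censored. The function name and the sample data are hypothetical.

```python
def kaplan_meier_and_nelson_aalen(durations, events):
    """Return [(time, KM survival, NA cumulative hazard)] at each event time."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival, cum_hazard = 1.0, 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all loans that leave the risk set at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk   # KM product-limit step
            cum_hazard += n_events / at_risk       # NA increment
            curve.append((t, survival, cum_hazard))
        at_risk -= n_leaving  # censored loans leave without an event
    return curve


# Hypothetical durations in months; False = right censored.
durations = [6, 12, 12, 18, 24]
events = [True, True, False, True, False]
curve = kaplan_meier_and_nelson_aalen(durations, events)
for t, s, h in curve:
    print(t, round(s, 4), round(h, 4))
```

For a stratified analysis, the same function can be called once per group (for example per loan purpose) and the resulting curves compared.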

Parametric Models:

The most common parametric models used in survival analysis are the Exponential, Weibull, and Log-logistic distribution models. Even though the KM estimator is a very useful tool for estimating the survival function, fitting a parametric model is the solution when the data needs to be modeled in more detail. These parametric models are used to estimate the survival curves. The advantage of fitting a parametric distribution to the data is that the survival and density functions are fully specified. Using these estimates, it is easier to compute the quantiles of the different distributions and to test for differences between parameters.
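As an illustration of the simplest case, the sketch below fits an Exponential model to right-censored durations. For this distribution the maximum likelihood estimate of the constant hazard is the number of observed events divided by the total time at risk, and quantiles follow in closed form. Function names and data are hypothetical.

```python
import math


def fit_exponential_rate(durations, events):
    """MLE of the constant hazard under right censoring: events / total exposure."""
    total_exposure = sum(durations)
    n_events = sum(1 for e in events if e)
    return n_events / total_exposure


def exponential_survival(t, rate):
    """S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


def exponential_quantile(p, rate):
    """Time by which a fraction p of loans is expected to have terminated."""
    return -math.log(1.0 - p) / rate


# Hypothetical durations in months; False = right censored.
durations = [6, 12, 12, 18, 24]
events = [True, True, False, True, False]
rate = fit_exponential_rate(durations, events)
median = exponential_quantile(0.5, rate)
print(f"monthly hazard = {rate:.4f}, median lifetime = {median:.2f} months")
```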

The Accelerated Failure Time (AFT) Model:

This is a parametric model in which a covariate acts by multiplying the predicted event time by some constant. With this model, the direct effect of the explanatory variables on the survival time is measured. This makes the estimated parameters easy to interpret, because each parameter measures the effect of the corresponding covariate on the mean survival time. These models have two main advantages: they are very easily interpreted, and they are more robust to omitted covariates and less affected by the choice of probability distribution than the proportional hazards model.
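The multiplicative effect on time can be sketched as follows, assuming a Weibull baseline and the usual AFT form S(t | x) = S0(t / exp(beta . x)). The baseline choice, the coefficient values, and the function names are assumptions for illustration only.

```python
import math


def acceleration_factor(x, beta):
    """exp(beta . x): the constant by which covariates scale the event time."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


def aft_survival(t, x, beta, scale, shape):
    """Weibull baseline evaluated on the rescaled time axis t / exp(beta . x)."""
    rescaled = t / acceleration_factor(x, beta)
    return math.exp(-((rescaled / scale) ** shape))


def aft_median(x, beta, scale, shape):
    """Baseline Weibull median multiplied by the acceleration factor."""
    baseline_median = scale * math.log(2.0) ** (1.0 / shape)
    return acceleration_factor(x, beta) * baseline_median


# Hypothetical coefficients and one loan's covariates.
beta = [0.3, -0.5]
x = [1.0, 0.4]
scale, shape = 60.0, 1.5
baseline = aft_median([0.0, 0.0], beta, scale, shape)
loan_median = aft_median(x, beta, scale, shape)
print(f"baseline median = {baseline:.2f}, loan median = {loan_median:.2f}")
```

A positive coefficient lengthens the expected lifetime and a negative one shortens it, which is what makes the parameters straightforward to read.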

The Proportional Hazards (PH) Model:

The proportional hazards model, introduced under Survival Analysis, expresses the hazard rate as a baseline hazard h0(t), which varies with the loan age, multiplied by a term that captures the effect of the explanatory variables.

Efficiency of the Adjustments

Machine Learning Applications

The graphs above show how the models are adjusted to obtain better performance and how the ML algorithms in general achieve a superior level of accuracy, precision, and recall. To this end, logistic regression (LR), artificial neural networks (ANNs), k-nearest neighbors (KNN), and a random forest are applied to predict prepayment risk.
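The three metrics can be computed from predicted and actual labels as in the sketch below. The example uses hypothetical labels with 5% prepayments and a trivial model that never predicts prepayment, to show why accuracy alone is misleading on imbalanced data.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) with prepayment (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical data: 95 non-prepayments, 5 prepayments.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100
accuracy, precision, recall = classification_metrics(y_true, always_negative)
print(accuracy, precision, recall)
```

The trivial model scores high on accuracy while finding none of the prepayments, which is why precision and recall are reported alongside it.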