Machine Learning Applications in Lending

As of 2020, financial sector debt had reached $66 trillion, equal to 25% of global debt. As the world gears up to fight the pandemic, banks and financial institutions need to analyze credit risk and price loans precisely. Financial institutions are under growing pressure to adopt sophisticated machine learning (ML) techniques, a subfield of artificial intelligence (AI), to tackle credit analytics and prepayment behavior by processing large amounts of data. In this article, we aim to give the reader a basic introduction to the key ML concepts and techniques, to explain how such approaches differ from the more traditional statistical analysis approach, and to illustrate the theory behind the platform. We also aim to demonstrate how we apply ML algorithms to build predictive loan default and prepayment models.

Executive Summary

ML is about finding a model that fits the training data well and generalizes to held-out test data, according to some optimization criterion. This requires an outcome variable, which we would like to predict. Every ML algorithm is given a large data set on which to train. The Vector ML Platform applies ML techniques to predict the credit risk, prepayment, and future cash flow of a loan portfolio.
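To make the idea of fitting by optimization concrete, here is a minimal sketch, not the platform's implementation. It fits a one-feature logistic regression by gradient descent on a training set and then checks the fit on separate test data. The feature values, labels, learning rate, and step count are all hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Average logistic loss of the model p = sigmoid(w * x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def fit(xs, ys, lr=0.5, steps=200):
    """Minimize the logistic loss by plain gradient descent."""
    w = b = 0.0
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        b -= lr * sum(errors) / len(xs)
    return w, b

# Hypothetical standardized risk feature; outcome variable 1 = default.
train_x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [-1.2, -0.3, 0.4, 1.7]
test_y = [0, 0, 1, 1]

w, b = fit(train_x, train_y)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in test_x]
test_accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
```

The model sees only the training rows while it is being fitted. The test rows are used afterwards to check that the fitted model generalizes.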

Supervised Learning Models

Decision Tree

The decision tree methodology is one of the credit classification techniques used by the platform. A decision tree is organized as a tree-like graph, with decision branches and with possible results as leaves. The method builds the tree by maximizing a chosen performance measure, which is related to the target credit or prepayment variable. Although its output is easy to understand, the decision tree approach has a somewhat limited capacity to generalize and to handle a large number of variables. We used loan information from the Lending Club data set to predict both credit default risk and prepayment behavior using decision tree learning and logistic regression.
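As an illustration of how a single decision branch can be chosen, the sketch below searches one numeric variable for the threshold that gives the purest split. It assumes Gini impurity as the measure; the article does not say which measure the platform uses, and the data is a hypothetical toy sample rather than Lending Club records.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    best = (None, float("inf"))
    n = len(values)
    # Skip the largest value: splitting there leaves the right branch empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: debt-to-income ratio (%) and outcome (1 = default).
dti = [5, 8, 12, 15, 22, 27, 31, 35]
default = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_split(dti, default)
```

A full tree repeats this search on each resulting branch until a stopping rule is met, such as a maximum depth or a minimum number of loans per leaf.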

We found that the decision tree technique is easy to understand and implement. It generated good results, with an overall default precision of 87%, compared with 83% for logistic regression.
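For reference, the sketch below shows how precision is computed for the default class, assuming the standard definition (true positives divided by all predicted positives). The article does not define "overall default precision" further, and the labels here are hypothetical.

```python
def precision(y_true, y_pred, positive=1):
    """Share of loans predicted as `positive` that actually were `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical outcomes (1 = default) and model predictions for eight loans.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
default_precision = precision(actual, predicted)
```

In this toy sample, four loans are flagged as defaults and three of them did default.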

Ensemble Methods and Random Forest

A random forest is an ensemble learning method for classification, regression, and other tasks. It operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the individual trees' predictions (classification) or their mean prediction (regression). Bagging, boosting, and random forests are the most common ensemble methods. Random forests, as the name indicates, are based on the decision tree method described above. The algorithm randomly produces multiple trees, which are then combined to give a single prediction. Combining a large number of trees improves prediction accuracy but often makes the results harder to interpret. While the relevance of each explanatory variable can be assessed, statistical inference validation is far less efficient for random forests than for single decision trees.
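The combination step described above can be sketched as follows. The code shows bootstrap sampling (one source of randomness between trees) and the two aggregation rules, mode and mean. The "trees" are hypothetical hand-written rules standing in for trained decision trees, and the field names `dti`, `fico`, and `revol_util` are assumptions made for illustration.

```python
import random
from statistics import mean, mode

def bootstrap(rows, rng):
    """Draw a bootstrap sample: len(rows) rows sampled with replacement."""
    return rng.choices(rows, k=len(rows))

def forest_classify(trees, x):
    """Classification: the forest outputs the mode of the individual votes."""
    return mode([tree(x) for tree in trees])

def forest_regress(trees, x):
    """Regression: the forest outputs the mean of the individual predictions."""
    return mean([tree(x) for tree in trees])

# Hypothetical stand-ins for trained trees (1 = predicted default).
class_trees = [
    lambda loan: 1 if loan["dti"] > 20 else 0,
    lambda loan: 1 if loan["fico"] < 660 else 0,
    lambda loan: 1 if loan["revol_util"] > 80 else 0,
]
# Hypothetical regression trees returning a predicted default rate.
rate_trees = [
    lambda loan: 0.5 if loan["dti"] > 20 else 0.25,
    lambda loan: 0.25 if loan["fico"] >= 660 else 0.75,
    lambda loan: 0.75 if loan["revol_util"] > 80 else 0.125,
]

loan = {"dti": 25, "fico": 700, "revol_util": 85}
sample = bootstrap(list(range(10)), random.Random(0))
predicted_class = forest_classify(class_trees, loan)
predicted_rate = forest_regress(rate_trees, loan)
```

In a real random forest, each tree is trained on its own bootstrap sample and typically considers only a random subset of variables at each split. Using an odd number of trees avoids tied votes in two-class problems.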

Artificial Neural Networks (ANN)

Artificial Neural Networks (ANNs) are learning techniques developed to imitate the way in which the human brain behaves. ANNs can modify their internal parameters and adapt themselves in order to reach certain results. ANNs are composed of neurons (nodes) which are interconnected in a pre-specified way, depending on the type of neural network. These connections carry weights that represent the intensity and the amount of information used in the learning process. Given their lack of transparency in model building, ANNs are considered to be “black boxes”, but once a proper architecture is chosen, they nevertheless provide good results and generalize well, provided the learning process is stopped in time (before the network overfits the training data).

The Multilayer Perceptron (MLP), which uses the “back-propagation” learning algorithm, is one of the most common types of neural networks used in credit risk and prepayment modeling. An MLP contains an input layer, one or more hidden layers, and an output layer. It operates in a feed-forward manner, where each neuron in a layer is connected to the neurons in the next layer. The output of each neuron in the hidden and output layers is generated by applying a transfer function to the sum of the bias and the weighted inputs. The most common transfer functions are sigmoid, hyperbolic tangent, and linear. The weights of the hidden and output layers are initialized randomly and then iteratively adjusted to minimize the error function.

We used loan data provided by Lending Club to develop a credit decision tool based on neural networks. Compared to traditional methods, ANNs proved to have high predictive power, generating good results with reduced evaluation time and costs. The results indicated the high capacity of the model to discriminate between “good” and “bad” loans. The overall default precision using ANNs was 89%.
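The feed-forward computation of an MLP can be sketched in a few lines. This is a minimal illustration with one hidden layer, a hyperbolic tangent transfer function in the hidden layer, and a sigmoid output. The layer sizes, weights, and input values are hypothetical, and training by back-propagation is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases, transfer):
    """Fully connected layer: transfer(bias + sum of weighted inputs) per neuron."""
    return [
        transfer(b + sum(w * x for w, x in zip(ws, inputs)))
        for ws, b in zip(weights, biases)
    ]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Feed the inputs through one hidden layer and a single output neuron."""
    hidden = layer(x, hidden_w, hidden_b, math.tanh)
    return layer(hidden, out_w, out_b, sigmoid)[0]

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output.
hidden_w = [[0.4, -0.6, 0.1], [-0.3, 0.8, 0.5]]
hidden_b = [0.0, -0.1]
out_w = [[1.2, -0.7]]
out_b = [0.05]

features = [0.5, -1.0, 0.25]  # hypothetical standardized loan features
score = mlp_forward(features, hidden_w, hidden_b, out_w, out_b)

# With all weights and biases at zero, the output is sigmoid(0) = 0.5.
zero_score = mlp_forward(features, [[0.0] * 3] * 2, [0.0] * 2, [[0.0] * 2], [0.0])
```

Because the output neuron uses a sigmoid, the score lies strictly between 0 and 1 and can be read as a default score once the weights have been trained.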