Credit scoring – Art or science?

30 July 2020

Financial services companies offering credit need to assess the risk they are taking when granting a credit.

This mainly consists of determining the probability that the borrower will not repay the credit, and the amount of money that will be lost in that case. These are usually expressed as, respectively, the Probability of Default (PD) and the Loss Given Default (LGD), i.e. the share of the exposure that is lost when the borrower defaults. The LGD is the complement of the Recovery Rate (LGD = 1 − Recovery Rate).
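To make the two parameters concrete: LGD is one minus the recovery rate, and multiplying PD, LGD and the exposure at default (EAD, the amount outstanding when the borrower defaults, a standard parameter not otherwise discussed in this article) gives the expected loss of a credit. A minimal Python sketch with illustrative numbers:

```python
def loss_given_default(recovery_rate: float) -> float:
    """LGD is the share of the exposure that is lost, i.e. 1 - recovery rate."""
    if not 0.0 <= recovery_rate <= 1.0:
        raise ValueError("recovery rate must be between 0 and 1")
    return 1.0 - recovery_rate


def expected_loss(pd: float, recovery_rate: float, exposure: float) -> float:
    """Expected loss = PD x LGD x exposure at default (EAD)."""
    if not 0.0 <= pd <= 1.0:
        raise ValueError("PD must be between 0 and 1")
    return pd * loss_given_default(recovery_rate) * exposure


# Illustrative numbers only: 2% PD, 60% recovery rate, 100,000 outstanding
print(expected_loss(pd=0.02, recovery_rate=0.60, exposure=100_000))
```

With these inputs the expected loss is about 800, i.e. 0.8% of the exposure.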

For both the PD and the LGD (and often also for the product PD × LGD), financial services companies need to set thresholds, i.e. the maximum percentages up to which they are willing to accept the credit. These thresholds depend on the risk appetite and business strategy of the financial services company. A higher PD and/or LGD threshold means the institution also has to hold larger financial buffers, because of the higher chance of losing money. Of course, most institutions will not have just a single threshold, but multiple thresholds (different thresholds per product and customer segment), in order to implement a more fine-grained business strategy.
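A minimal sketch of such a threshold policy, assuming the institution caps the expected loss rate PD × LGD per combination of product and customer segment. The product names, segments and threshold values below are hypothetical placeholders, not recommendations.

```python
# Hypothetical maximum PD x LGD per (product, customer segment)
THRESHOLDS = {
    ("mortgage", "retail"): 0.010,
    ("consumer_loan", "retail"): 0.030,
    ("consumer_loan", "self_employed"): 0.020,
}


def accept_credit(product: str, segment: str, pd: float, lgd: float) -> bool:
    """Accept the credit only if PD x LGD stays within the segment's threshold."""
    try:
        threshold = THRESHOLDS[(product, segment)]
    except KeyError:
        # No policy defined for this combination: signal it so it can be handled manually
        raise ValueError(f"no threshold defined for {product!r}/{segment!r}") from None
    return pd * lgd <= threshold


print(accept_credit("consumer_loan", "retail", pd=0.05, lgd=0.45))         # 0.0225 <= 0.030 -> True
print(accept_credit("consumer_loan", "self_employed", pd=0.05, lgd=0.45))  # 0.0225 > 0.020 -> False
```

The same applicant can thus be accepted in one segment and refused in another, which is exactly the fine-grained strategy described above.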

On the other side, there is the “art” of determining the PD and LGD as accurately as possible. Both are predictions of the future, and no person or machine can predict the future without errors. Banks and other credit institutions therefore use complex models (combining rule engines, AI models and as much input data as possible, such as personal/company data, financial data, collateral data, etc.) to assess these percentages as well as possible, based on insights obtained from historical data.

The better these models, the more money the institution can make, as there are fewer false positives and false negatives, i.e.:

  • When the PD/LGD is underestimated, the institution is at risk and will lose money because of too many defaulting credits
  • When the PD/LGD is overestimated, the institution will lose money because of too many missed opportunities (opportunity cost)
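Both failure modes can be made tangible by back-testing a decision rule on historical credits: accepted credits that default cost money directly, while rejected credits that would have been repaid represent forgone income. The sketch below assumes a single acceptance cut-off on the predicted PD, a flat loss per default and a flat margin per good credit; the class name, parameters and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HistoricalCredit:
    predicted_pd: float  # PD the model assigned at application time
    defaulted: bool      # observed outcome


def decision_cost(credits, pd_cutoff, loss_per_default, margin_per_good_credit):
    """Return (loss from accepted defaults, opportunity cost of rejected good credits)."""
    default_loss = 0.0
    opportunity_cost = 0.0
    for credit in credits:
        accepted = credit.predicted_pd <= pd_cutoff
        if accepted and credit.defaulted:
            # Risk was underestimated: the credit was granted and then defaulted
            default_loss += loss_per_default
        elif not accepted and not credit.defaulted:
            # Risk was overestimated: a credit that would have been repaid was refused
            opportunity_cost += margin_per_good_credit
    return default_loss, opportunity_cost


# Hypothetical back-test data
history = [
    HistoricalCredit(0.02, False),
    HistoricalCredit(0.04, True),
    HistoricalCredit(0.12, False),
    HistoricalCredit(0.30, True),
]
print(decision_cost(history, pd_cutoff=0.05, loss_per_default=10_000, margin_per_good_credit=1_500))
```

Moving the cut-off only trades one cost for the other; a more accurate model is what reduces both at the same time.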

The PD/LGD is furthermore helpful to price the credit, i.e. to determine the right interest rate (so-called risk-based pricing). Adjusting the price of the credit to its credit risk allows the institution to further optimize the balance between the financial risk it takes and the financial benefits it earns.
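A common simplified way to express risk-based pricing is to build the interest rate from the funding cost, the expected loss rate PD × LGD, operating costs and a target margin. The sketch below follows that simplified additive decomposition (it ignores, for example, the cost of holding capital buffers and the timing of losses), and the rate components are hypothetical.

```python
def risk_based_rate(pd, lgd, funding_cost, operating_cost, target_margin):
    """Annual rate = funding cost + expected loss rate (PD x LGD) + operating cost + margin.

    Simplified additive model: all inputs are annual rates expressed as fractions.
    """
    expected_loss_rate = pd * lgd
    return funding_cost + expected_loss_rate + operating_cost + target_margin


# Two hypothetical borrowers with the same cost structure but a different PD
low_risk = risk_based_rate(pd=0.01, lgd=0.40, funding_cost=0.015, operating_cost=0.01, target_margin=0.01)
high_risk = risk_based_rate(pd=0.06, lgd=0.40, funding_cost=0.015, operating_cost=0.01, target_margin=0.01)
print(f"{low_risk:.2%} vs {high_risk:.2%}")
```

The riskier borrower pays a higher rate that compensates for the extra expected loss, instead of being refused outright.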

Thanks to the rise of new technologies and the Fintech movement, credit risk scoring models have evolved considerably in recent years. In particular, the rise of AI and the use of alternative data sources create exciting new business opportunities: offering loans to people (the so-called unbanked and underbanked) and businesses who were refused by traditional credit scoring systems.

However, a big difference remains between the credit scoring models for consumer loans and those for business loans.

Most consumer lending has already been highly automated, so that many credits can be decided and granted almost fully STP (straight-through processing).

Business loans on the other hand have much more inherent complexity, as businesses can be very diverse and very complex (with multiple subsidiaries, complex shareholder structures, etc.). As a result, the analysis and decision processes for these loans remain highly specific and manual.

For consumer loans, financial institutions usually ask the client to provide the following data:

  • Personal data, like name, civil status, number of children, address, phone number, etc.
  • Professional data, like type of employment, employer name and address, employer sector, contract duration, etc.
  • Financial data, to get insight into all revenues and expenses of the customer, as well as their assets and liabilities
  • Information about the purpose of the credit, i.e. what the money will be used for
  • Information about the collateral for the credit, i.e. all details of the collateral provided

Financial institutions enrich this data with other public and private data they have about the customer, such as credit history (i.e. any past credits that defaulted, the number of credits the customer already has, and the repayment track record of all past credits), account transaction history (cross-bank via PSD2), etc.

Next, a number of ratios, such as AVI (Available Income) and LTV (Loan-to-Value ratio), are calculated.
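The exact definitions of these ratios differ between institutions and jurisdictions. A minimal sketch, assuming LTV is the loan amount divided by the appraised collateral value, and that available income is the monthly net income left after living expenses, existing debt repayments and the new loan's instalment (computed with the standard annuity formula). The function names and amounts are hypothetical.

```python
def loan_to_value(loan_amount: float, collateral_value: float) -> float:
    """LTV = loan amount / value of the collateral (e.g. the property)."""
    return loan_amount / collateral_value


def monthly_instalment(principal: float, annual_rate: float, years: int) -> float:
    """Constant (annuity) monthly repayment for a fully amortising loan."""
    r = annual_rate / 12
    n = years * 12
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


def available_income(net_income, living_expenses, existing_debt_payments, new_instalment):
    """Assumed AVI definition: monthly income left after expenses, existing debts and the new instalment."""
    return net_income - living_expenses - existing_debt_payments - new_instalment


instalment = monthly_instalment(200_000, annual_rate=0.03, years=20)
print(round(loan_to_value(200_000, 250_000), 2))  # 0.8
print(round(available_income(4_000, 1_500, 300, instalment), 2))
```

A negative or very low available income, or an LTV above the institution's limit, would typically be a strong signal against the credit, whatever the model predicts.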

All this info is then fed into the risk scoring model, which tries to predict the PD and LGD.
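Many traditional scoring models are logistic regressions (scorecards), in which each input shifts the log-odds of default. The sketch below assumes such a model with made-up coefficients and variable names purely for illustration; real coefficients are estimated from the institution's historical credits. Because the model is linear in its inputs, each variable's contribution to the score can be reported, which is the kind of explainability that the discussion of AI models below refers to.

```python
import math

# Hypothetical, made-up coefficients; real ones are fitted on historical defaults
INTERCEPT = -5.0
COEFFICIENTS = {
    "ltv": 2.0,                   # higher loan-to-value -> higher risk
    "debt_to_income": 3.0,        # share of income already used for debt repayments
    "past_defaults": 1.2,         # number of previously defaulted credits
    "years_with_employer": -0.1,  # stable employment lowers risk
}


def predict_pd(applicant: dict) -> tuple:
    """Return the predicted PD and each variable's contribution to the log-odds."""
    contributions = {name: coef * applicant[name] for name, coef in COEFFICIENTS.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    pd = 1.0 / (1.0 + math.exp(-log_odds))  # logistic function maps log-odds to a probability
    return pd, contributions


pd, contributions = predict_pd(
    {"ltv": 0.8, "debt_to_income": 0.35, "past_defaults": 0, "years_with_employer": 6}
)
print(f"PD = {pd:.2%}")
# List the drivers of the score, largest absolute contribution first
for name, value in sorted(contributions.items(), key=lambda item: -abs(item[1])):
    print(f"{name:>20}: {value:+.2f}")
```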

These models are evolving rapidly to give better, more accurate results to a larger group of customers (i.e. not only for traditional customers, but also for smaller niche segments):

  • The use of AI to improve credit risk models: just as risk analysts try to identify correlations between the data provided by customers and the full data sets of historical credits that defaulted, AI tries to model these correlations. The strength of AI, however, is that it can do this in a much more fine-grained way (also taking very small correlations into account), in a more automated way (allowing the model to be updated continuously based on new situations, trends and historical data) and much faster. Thanks to the large training data sets that banks have accumulated over the years, which are becoming increasingly well structured and of good quality, these AI models can be trained very well.
    However, it is important to understand that AI is no miracle solution: it is still based on the correlations found in the historical data (used as training data sets), meaning that an AI model cannot predict rapidly changing trends either. Furthermore, AI has the big disadvantage that financial institutions lose much of the explainability of and control over the model. This makes it difficult to explain to internal employees, customers and regulators why the model arrives at a specific PD/LGD value. For example, it can be very difficult to prevent an AI model from discriminating based on race or sex, as simply leaving these attributes out of the input is often not enough: other input variables can act as proxies for them.
  • The use of non-traditional data sets: where traditional models are based on the input data described above, a number of Fintechs (such as Uulala, Koyo, Lenddo, FriendlyScore, ZestFinance, CreditLadder…) have come up with more innovative ways to score loans. These improve on traditional scoring models, which tend to work very poorly (due to lack of data) or very unfavorably for specific customer segments, such as freelancers, gig-economy workers and immigrants.

    These Fintechs score borrowers based on new data sets, such as social media data, telephone record data, shopping data, bank transaction data (collected via PSD2), etc. Based on this data, the risk scoring models try to model the behavior of the borrower and predict the credit risk associated with that person.
    While this innovation is excellent news for the customer segments rejected by traditional models, it does raise some important questions: about data security, about data privacy (i.e. lower-income persons having to give up privacy to get a loan), but also about the accuracy of these models, given the lack of large historical data sets.

  • The use of APIs and tools for easier and faster valuation of underlying assets or collateral. These tools make it possible to estimate asset value, asset quality (i.e. the risk of the asset dropping in value) and asset liquidity (i.e. how easily the asset can be liquidated upon default). Various platforms provide API services to estimate these parameters for different asset types.
    In this context, it is interesting to have a look at Capilever’s LABL (Liquid Asset Based Lending) product, which performs these calculations fully automatically for a Lombard loan. Furthermore, Capilever’s NLPT (Non-Liquid Position Tool) streamlines the inventory and (re)valuation of a customer’s assets (which can potentially be used as collateral).
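One common way to translate asset value, quality and liquidity into an LGD estimate is to apply a haircut to each collateral item's market value: the more volatile or illiquid the asset, the larger the haircut. The sketch below assumes hypothetical haircut percentages per asset class and ignores recovery costs; it illustrates the principle only and is not how LABL or any other specific product calculates LGD.

```python
from typing import List, Tuple

# Hypothetical haircuts per asset class: the share of market value assumed
# to be lost when the asset has to be sold after a default
HAIRCUTS = {
    "cash": 0.00,
    "government_bond": 0.05,
    "listed_equity": 0.30,
    "real_estate": 0.25,
    "private_equity": 0.60,
}


def collateral_lgd(exposure: float, collateral: List[Tuple[str, float]]) -> float:
    """Estimate LGD as the share of the exposure not covered by the haircut collateral value."""
    recoverable = sum(value * (1 - HAIRCUTS[asset_class]) for asset_class, value in collateral)
    uncovered = max(exposure - recoverable, 0.0)
    return uncovered / exposure


# Hypothetical collateral for a 100,000 loan
portfolio = [("government_bond", 40_000), ("listed_equity", 50_000)]
print(f"LGD = {collateral_lgd(100_000, portfolio):.1%}")
```

Re-running the same calculation with fresh market values is what makes a continuous reassessment of the LGD possible.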

Business loans also offer a lot of room for change. As indicated above, the analysis and decision process for these loans is still very manual. However, we see more Fintechs providing innovative offerings that automate these processes for specific niche credits. A good example is invoice financing (closely related to invoice factoring), where unpaid invoices are pre-financed by a credit institution. This product is very well structured, and scoring can be done quite easily by analyzing the historical invoice payments.
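As an illustration of why this product lends itself to automated scoring, a few simple indicators can be derived directly from the history of invoice payments. The sketch below computes the share of invoices paid on time, the average delay of late payments and the number of overdue unpaid invoices; the data structure, function name and choice of indicators are assumptions for illustration, not a complete scoring model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Invoice:
    due_date: date
    paid_date: Optional[date]  # None if the invoice is still unpaid


def payment_indicators(invoices: List[Invoice], as_of: date) -> dict:
    """Hypothetical indicators: on-time ratio, average days late, overdue unpaid count."""
    paid = [inv for inv in invoices if inv.paid_date is not None]
    on_time = [inv for inv in paid if inv.paid_date <= inv.due_date]
    late_days = [(inv.paid_date - inv.due_date).days for inv in paid if inv.paid_date > inv.due_date]
    overdue_unpaid = [inv for inv in invoices if inv.paid_date is None and inv.due_date < as_of]
    return {
        "on_time_ratio": len(on_time) / len(paid) if paid else None,
        "avg_days_late": sum(late_days) / len(late_days) if late_days else 0.0,
        "overdue_unpaid": len(overdue_unpaid),
    }


# Hypothetical payment history
history = [
    Invoice(date(2020, 3, 31), date(2020, 3, 28)),
    Invoice(date(2020, 4, 30), date(2020, 5, 15)),
    Invoice(date(2020, 5, 31), date(2020, 5, 31)),
    Invoice(date(2020, 6, 30), None),
]
print(payment_indicators(history, as_of=date(2020, 7, 30)))
```

Here two of the three paid invoices were paid on time, the late one was paid 15 days late, and one invoice is overdue and still unpaid.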
Unfortunately, a large part of business lending is still very manual. The big challenge in the coming years will therefore be to increase its STP rate, by:

  • Providing input data of higher quality, in a more structured way and faster. For example, easy integration with ERP and accounting platforms gives faster access to more structured company data (compared to annual reports)
  • Having more flexible models, which can cope with more diverse situations and even with unstructured data (typically via OCR and Natural Language Processing, which make it possible to structure this data and identify patterns).

Apart from making the risk scoring process more STP, more accurate and more tailored to different customer segments, there are two other aspects where financial institutions can make a difference:

  • Helping customers to improve their credit score themselves. Instead of just providing a score (often only the final result), banks should provide tools and advice on how customers can improve their credit score. This can be done by improving their solvency, liquidity and trustworthiness, but also by providing additional insight into their financial data and offering additional collateral. For more details on this concept, we refer to the product brochure of Capilever’s CPRA product.
  • Continuous reassessment or recalculation of the risk score during the life cycle of the credit. This means reassessing the PD when changes occur in the personal, professional or financial situation of the borrower(s), but also reassessing the LGD by reviewing the value, quality and liquidity of the collateral.
    Of course, in order to be profitable, this should be fully automated. When well implemented, it can help banks to better monitor and manage their risks (e.g. by lowering thresholds for future loans, increasing or decreasing buffers, asking customers to provide additional collateral, notifying customers of the identified risk, etc.). The LABL solution of Capilever (see the LABL product brochure) is a very good example of this: the LGD is continuously reassessed and corrective measures are requested from the customer (as part of the credit contract).
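For a collateral-backed credit, automated reassessment can be as simple as revaluing the collateral periodically, recomputing the LTV and, when a limit is breached, calculating how much additional collateral (or repayment) would restore it. The same calculation tells a customer exactly what they can do to improve their position. A minimal sketch, assuming a hypothetical maximum LTV of 70% and a single revaluation per run; it illustrates the principle only and does not describe how LABL implements it.

```python
def reassess(loan_balance: float, collateral_value: float, max_ltv: float = 0.70) -> dict:
    """Recompute LTV after a revaluation and size the corrective action if needed.

    max_ltv is a hypothetical policy limit.
    """
    ltv = loan_balance / collateral_value
    if ltv <= max_ltv:
        return {"ltv": ltv, "action": "none"}
    # Extra collateral needed so that loan_balance / (value + extra) == max_ltv
    extra_collateral = loan_balance / max_ltv - collateral_value
    # Alternatively, repay enough so that (balance - repayment) / value == max_ltv
    repayment = loan_balance - max_ltv * collateral_value
    return {
        "ltv": ltv,
        "action": "request_correction",
        "extra_collateral": extra_collateral,
        "or_repayment": repayment,
    }


# A 100,000 loan whose collateral is revalued from 150,000 down to 120,000
print(reassess(100_000, 150_000))
print(reassess(100_000, 120_000))
```

After the drop in value, the LTV rises to about 83%, and the customer could restore the 70% limit by adding roughly 22,857 of collateral or repaying 16,000.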

As this article shows, there are many interesting innovations in the field of credit risk scoring. In the current turbulent economic times, however, we see that many new, innovative risk scoring models are leading to much higher default rates than predicted, which has put a number of Fintechs in difficulty. Time (and historical data) will tell which models provide the best fit.