Sunteți pe pagina 1din 25

Available online at www.sciencedirect.

com

International Review of Financial Analysis 16 (2007) 471 – 495

A credit scoring model for Vietnam's retail banking market


Thi Huyen Thanh Dinh a,1 , Stefanie Kleimeier a,⁎
a
Limburg Institute of Financial Economics (LIFE), Faculty of Economics and Business Administration (FdEWB),
Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands
Available online 22 June 2007

Abstract

As banking markets in developing countries are maturing, banks face competition not only from other
domestic banks but also from sophisticated foreign banks. Given the substantial growth of consumer credit
and increased regulatory attention to risk management, the development of a well-functioning credit
assessment framework is essential. As part of such a framework, we propose a credit scoring model for
Vietnamese retail loans. First, we show how to identify those borrower characteristics that should be part of
a credit scoring model. Second, we illustrate how such a model can be calibrated to achieve the strategic
objectives of the bank. Finally, we assess the use of credit scoring models in the context of transactional
versus relationship lending.
© 2007 Elsevier Inc. All rights reserved.

JEL classification: G21; G28; O16


Keywords: Retail banking; Emerging markets; Credit scoring; Default risk

1. Introduction

Are retail loans special? Advocates of relationship lending tend to answer this question with
‘yes’. Since retail borrowers are typically individuals or entrepreneurs, information about them is
not readily available but needs to be obtained through close contact over the life of a bank–client
relationship. Relationship lending is thus based on soft, proprietary information. Lending
decisions are not only determined by exact characteristics such as the borrower's income or
collateral but to a large extent also by qualitative information such as the borrower's character,
reputation, or standing in the community. As relationships are expensive to maintain and as

⁎ Corresponding author. Tel.: +31 43 3883733; fax: +31 43 3884875.


E-mail addresses: k.dinh@delagelanden.com (T.H.T. Dinh), s.kleimeier@finance.unimaas.nl (S. Kleimeier).
1
Thi Huyen Thanh Dinh worked on this study while being a master student at the Maastricht University. Since then she
has moved to De Lage Landen International B.V., Vestdijk 51, 5611 CA, Eindhoven, The Netherlands. Tel.: +31 40
2339174; fax: +31 40 2338626.

1057-5219/$ - see front matter © 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.irfa.2007.06.001
472 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

relationship lending might not always be profit maximizing, banks have introduced transactional
lending tools where lending decisions are based on quantifiable borrower characteristics
supported for example by a credit scoring model (CSM). In 1996, already 97% of all US banks
used credit scores for credit card loan applications and 70% for small business loans. CSMs have
also spread globally to banking markets in other developed countries. By now, the most widely-
used CSMs are those of Fair Isaac and Co. Inc. who developed models for small business trade
credit in 1998, small business credit in 1999, and finally personal credit in 2001.2 In contrast, in
the retail banking markets of developing countries lending practices tend to be poor and banks,
i.e. small banks or micro-financiers lack the necessary data on borrower characteristics and their
credit histories to design reliable CSMs. It is therefore not surprising that only a few of the largest
banks in developing countries use CSMs. However, as banking markets in developing countries
are maturing, banks face competition not only from other domestic banks but also from
sophisticated foreign banks. Given the substantial growth of retail credit and increased regulatory
attention to risk management, the development of a well-functioning credit assessment frame-
work is essential.
In essence, a CSM provides an estimate of a borrower's credit risk – i.e. the likelihood that the
borrower will repay the loan as promised – based on a number of quantifiable borrower
characteristics. Given its widespread use in developed countries, the benefits of credit scoring as
well as the methodological issues surrounding credit scoring are well established in the academic
literature.3 Evidence regarding CSMs for retail banking markets in developing countries,
however, is extremely limited as only two studies exist: Viganó (1993) for Burkina Faso and
Schreiner (2004) for Bolivia. Both studies analyze small business loans. In contrast to business
loans in developed countries, reliable up-to-date financial data is often lacking and the boundary
between the private and business property of an entrepreneur is often vague. Thus, CSMs for
these loans need to rely – just like CSMs for consumer loans – on the personal information about
the entrepreneur.4 Schreiner's (2004) study is practitioner-oriented and discusses the benefits of
credit scoring as well as its implementation in general. Viganó (1993) specifically addresses the
problem of selecting the relevant borrower characteristics for the CSM. Despite being considered
“the best scorecard for microfinance in the literature” (Schreiner, 2004), Viganó's (1993) analysis
uses very limited samples containing 31 to 100 loans. In contrast, our study is based on the
complete retail loan portfolio of a Vietnamese commercial bank. For our CSM, we can utilize data
on more than 56,000 loans including in addition to small business loans also consumer loans,
mortgages and credit card loans. Our study is thus the first to provide a CSM that is applicable to
the overall retail lending activities of a bank in a developing country.
In particular, we identify which borrower characteristics a bank needs to collect and how these
can be combined into a scoring model. Though we start with a similar set of borrower
characteristics as many developed-country retail CSMs, two of our four most important
characteristics are unique to Vietnam – gender and loan duration – while commonly included
characteristics related to income and employment are not part of our CSM. In this sense we cannot
fully agree with Allen, DeLong, and Saunders (2004) who conclude for small business loans that

2
See Mester (1997) and Allen et al. (2004).
3
See for example Mester (1997) and Blöchlinger and Leippold (2006) for the former and Altman (1968), Hand and
Henley (1997), Altman and Saunders (1998) and Thomas (2000) for the latter literature.
4
In this sense, the small business loans considered in this study are different from those discussed by Altman and
Narayanan (1997) and as summarized in Table 2 of Allen et al. (2004). Even though these studies consider business loans
to developed as well as developing countries, they focus on the prediction of bankruptcy based on financial data of the
business alone.
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 473

“what is striking is not so much the models' differences across countries of diverse size and in
various stages of development but rather their similarities.” Due to the detailed nature and large
size of our sample, we can furthermore illustrate how CSMs can be calibrated to achieve the
bank's strategic objectives. Though this issue has been investigated for banks in developed
countries, our study is the first to analyze this for a developing-country bank. Firstly, we find that
our proposed CSM leads to an increase in profits, indicating that our bank's non-performing loan
target is adequately selected. Secondly, we provide a simple rule for risk-based loan pricing which
leads to the same profit margin as the currently applied single interest rate for all borrowers.
Thirdly and finally, we present credit scoring as part of a lending-decision-making process that is
built on the principles of transactional lending but leaves room for relationship lending. Here we
propose a more specific CSM with two cut-off points. Such calibration allows banks to define
those loans that are accepted or rejected on the CSM alone as well as those which are further
examined by the credit officer. Overall, the bank benefits from reduced cost of loan assessment
while optimally using its relationship lending tools, i.e. its loan officer's knowledge about the
borrower.
The remainder of the paper is structured as follows: Section 2 presents a general CSM as
currently applied for retail credit and illustrates the modeling steps and decisions that have to be
taken. In Section 3 a model is estimated using the retail loan population of a Vietnamese
commercial bank. Specific applications of the model regarding bank profitability, risk-based loan
pricing, and relationship lending are explored in Section 4. Section 5 concludes.

2. A credit scoring methodology for retail loans

CSMs are commonly structured along the lines of Altman's (1968) Z-score model.5 First, the
CSM must be developed and estimated. Typically, a CSM uses historical loan and borrower data
to identify which borrower characteristics are best able to distinguish between defaulted and non-
defaulted loans. During this developmental stage, decisions about the model's estimation method
have to be made and all potentially relevant borrower characteristics have to be identified and
coded. Finally, the relevant borrower characteristics have to be selected and their impact on
defaults established — in other words, the model has to be estimated. Now the CSM can be
applied to new loan applications for which the probability of default (PD) is not known. Based on
the estimated CSM, a credit score can be calculated for each new loan applicant where a higher
score indicates better expected performance of the borrower and thus a lower PD. This score must
be compared to the CSM's cut-off rate to determine whether the loan application is accepted,
rejected or requires further assessment. Thus, as part of the CSM's developmental stage,
calibration is also necessary such that an optimal cut-off rate, which is in line with the lender's
objectives is determined.

2.1. The estimation method

The development of a CSM starts with the decision about the basic form of the model, i.e. its
estimation method via decision trees, linear probability models, logit or probit regression models, or

5
For an overview of the bankruptcy prediction literature since Altman's original study see Byström, Worasinchai, and
Chongsithipol (2005) or Philosophov and Philosophov (2002). The former paper is also the only one to investigate the
issue for a developing country. See Laitinen (1999) for a discussion of the related literature on the prediction of credit risk
ratings.
474
Table 1
Variables commonly used in retail credit scoring models
Commonly used variables in Variables included in the credit scoring model
industrialized countries (Crook et al., 1992)
Crook et al. (1992) — UK Schreiner (2004) — Bolivia Viganó (1993) — Burkina Faso

T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495


Postcode Postcode (1) Date of disbursement Customer's personal characteristics
(age, sex, religion, martial status, education,
employment sector and place, etc)
Age Employment status (2) Amount disbursed
Number of children Years at bank (3) Type of guarantee
Number of other dependants Current account (4) Branch Data on the enterprise (type, professional skills,
number of employees, productivity, profitability, etc)
Whether an applicants has a home phone Spouse's income (5) Loan officer
Spouse's income Residential status (6) Gender of the borrower
Employment status Phone (7) Sector of the firm Profitability (main and secondary revenue, revenue stability)
Employment category Years at present employment (8) Number of spells or arrears
Years at present employment Deposit account (9) Length of the longest Amount and composition of assets
spell of arrears (total assets including money and deposits)
Income Value of home (10)
Residential status Outgoings (11) Financial situation
(initial and current amount of loans received,
defaults, loans granted)
Years at present address Number of children (12)
Estimated value of home Applicant’s income (NI)
Mortgage balance outstanding Mortgage balance outstanding (NI) Investment plans
(presence of investment plan, other sources of finance)
Years at bank Charge card (NI)
Whether a current account is held
Whether a deposit account is held Customer's relationship with bank
(past loans with bank, savings account with bank, etc)
Whether a loan account is held
Whether a check guarantee card is held
Whether a major credit card is held Bank's control of credit risk
(loan destination, disbursement form, method of repayment,
loan amount and maturity, collateral,
contractual conditions on interest rate, etc)
Whether a charge card is held
Whether a store card is held
Whether a building society card is held
Value of outgoings
When available, a rank indicating the importance of the variable in the credit scoring model is given in brackets. (NI) indicates that a variable was considered but finally not included.
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 475

multiple discriminant analyses. Based on findings by Boyle, Crook, Hamilton, and Thomas (1992),
Desai, Crook, and Overstreet (1996, 1997), Henley (1995), Srinivasan and Kim (1987), and Yobas,
Crook, and Ross (2000) as reported by Thomas (2000), we opt for the logistic regression method.
Thus, we first collect a sample of existing loans and observe their borrower characteristics and the
ex-post default status. A borrower's ex-ante PD is unobservable. The logistic regression technique
overcomes this problem by directly estimating this probability and has therefore been the
methodology of choice for retail credits. This technique assumes the existence of a continuous
variable Zj which is defined as the probability that loan-applicant j defaults and can be modeled as a
linear function of a set of variables x which describe the applicant:

Zj ¼ W Vx ¼ w1 xj1 þ w2 xj2 þ N þ wk xjk ð1Þ

where wk is the coefficient of the kth variable and xjk is the value of variable k for applicant j. Zj is
known as Z-Score of the jth applicant. Zj is ex-ante unobservable and default can only be defined ex-
post as a 0–1 dummy. Eq. (2) thus derives the default probability π using an iterative maximum
likelihood estimation method. Here, larger values of π reflect a higher PD.
1
pj ¼ V ð2Þ
1 þ eðW xÞ

2.2. Variable identification

Next, a choice has to be made among the variables x that should initially be considered for Eqs.
(1) and (2). There is no overall consensus regarding the number or type of initial variables. Some
CSMs such as First Data Resources' model for credit cards lenders (Mester, 1997) or Caisse
Nationale de Crédit Agricole du Burkina Faso's model for rural microfinance (Viganó, 1993) start
with as many as 50 variables. Table 1 presents a list of the more commonly-used variables for
retail CSMs. In CSMs for corporate loans, the variables are relatively similar across countries and
limited to financial statement data (Allen et al., 2004). For retail loans, direct measures of the
financial strength of the borrower such as income or value of the home are included. Given the
limited availability of such proxies for individuals, proxies that measure the financial strength of a
borrower indirectly (education, household size, years at employer, or postal code) also need to be
included. There seem to be only a few variables that are typical for developing countries such as
gender, religion, branch, and loan officer. This can be explained by the fact that banks in most
developed countries are prohibited by law to use variables such as gender, whereas banks in
developing countries are not. Branch and loan officer variables can reflect branch policy, bonuses,
experience, or training levels. Their inclusion in the CSM for Bolivia might indicate that
differences across branches and loan officers are more pronounced in less-developed banking
markets. Except for these few variables, however, it seems that a retail CSM for a developing
country can generally start with the same set of variables as a retail CSM for a developed country.
Note that already at this early stage, a certain selection of variables has to be made. Even banks
that do not use a CSM are collecting information from loan applicants during the lending-decision
making process. This information can serve as a starting point. Practical experience will however
show that not all variables are fully available. Values may be structurally missing (e.g. question
which are asked conditionally on the responses to previous questions). Alternatively, missing
values might be due to a deficient information system in some branches or due to customers who
themselves were not willing to completely fill in the application forms. Excluding all variables
476 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

with missing values might substantially reduce sample size and lead to a loss of valuable
information whereas including variables with only a few valid observations might lead to
unreliable results. In order to balance these two problems, we first identify those variables which
show a substantial number missing values. Second, we rely on the loan officers' expert
knowledge and feeling for the data and characteristics to identify which of these variables can and
which cannot be excluded from the model (see Henley & Hand, 1997).

2.3. Variable coding

After the choice of the initial set of independent variables has been made, qualitative as well as
quantitative variables need to be coded. The coding of quantitative variables is required when the
relationship between the variable and default is not linear. As Thomas (2000) argues, instead of
trying to map such a relationship as a straight line, borrowers can be grouped into a number of
categories such that a categorical variable is created. The later approach is more commonly used
in credit scoring mainly because it can also be applied to qualitative variables and we thus adopt
this approach for all our variables. To complete the coding of the variables, a value has to be
attached to each category. Boyle et al. (1992) show that variables can be coded based on the
distribution of defaulted and non-defaulted loans in the sample. If a variable has m categories, let
gi be the number of good (non-defaulted) loans who belong to the ith category and bi the number
of bad (defaulted) loans that belong to the ith category. G and B are the total number of good and
bad loans in the whole sample, respectively, such that
X
m X
m
G¼ gi and B¼ bi ð3Þ
i¼1 i¼1

Instead of using a simple coding rule6, estimates of probability odds of the good and bad
loans in the ith category are commonly used. We follow Crook et al. (1992) and code our
variables as7
lnðgi =bi Þ þ lnðB=GÞ ð4Þ

For qualitative variables for which the number of possible categories is very large, coding all
categories becomes infeasible. For such cases, we follow Thomas' (2000) and Boyle et al.'s
(1992) recommendation and aggregate values of similar PD measured as bi / (gi + bi).

2.4. Variable selection and estimation of the CSM

Once the variables have been coded, Eq. (2) can be estimated. As is evident from retail CSMs
in developed countries, the model can initially contain a large number of variables. Whereas this
might be statistically feasible for large samples, there are practical considerations. Too many
questions in an application form deter loan applicants, who will not answer all questions or apply
for a loan elsewhere. Efficiency and applicability thus stipulate that the number of variables is
reduced. It is furthermore critical to determine not only how many but also which variables to

6
A simple rule assigns, for example, the value of 1 to the category with the lowest PD, the value of 2 to the category
with the second highest PD, etc.
7
In our sample ln(B/G) is constant. In practice, however, when banks are updating their CSM at regular intervals, this
value will change with every update.
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 477

incorporate in the model. We therefore apply a forward as well as a backward stepwise method
which sequentially adds or withdraws variables to maximize the model's predictive accuracy
(see Henley & Hand, 1997). Now, the final version of the CSM can be determined and the
coefficients wj of Eq. (2) can be estimated.

2.5. Calibration

Finally, the CSM should be tested for its predictive accuracy. This is best done out-of-sample
as the model predicts over-accurately in-sample. First, the PD for each loan in the calibration-
sample is estimated. This PD is then compared to a cut-off value to establish whether the loan
applicant will be a good (non-defaulting) or bad (defaulting) borrower. Initially, a cut-off value of
50% can be chosen such that an applicant who's estimated PD is greater (smaller) than 50% will
be classified as a bad (good) loan. This classification is then compared to the observed event of
default to establish the accuracy of the model. A classification table as shown in Table 2 is
commonly employed at this stage. Gg represents the number of correctly classified good loans
whereas Gb represents the number of good loans that are incorrectly classified as bad loans.
Similarly, Bb represents the number of correctly classified bad loans whereas Bg represents the
number of bad loans that are incorrectly classified as good loans. The percentage of correctly
classified (PCC) loans serves as an accuracy measure. The percentage of correctly classified good
loans (PCCgood) is defined as the proportion of correctly classified good loans to the total number
of observed good loans. Correspondingly, the percentage of correctly classified bad loans
(PCCbad) is defined as proportion of correctly classified bad loans to the total number of observed
bad loans. Finally, the percentage of correctly classified total loans (PCCtotal) is defined as the
number of correctly classified loans relative to the total number of loans. Though simple to use,
PCC may not always be an appropriate accuracy measure. It implicitly assumes that the costs of
misclassification of bad and good loans are equal. For banks, one classification error may,
however, be much more expensive than another. Another, often violated assumption of PCC is
that the class distribution is constant over time and relatively balanced (Provost, Fawcett, &
Kohavi, 1998). Baesens et al. (2003) use two additional accuracy measures called sensitivity
(SENS) and specificity (SPEC). In contrast to PCCbad, which relates the predicted bad loans to the
total number of observed bad loans, SPEC is defined as proportion of correctly classified bad
loans to the total number of predicted bad loans. Correspondingly, SENS is defined as number
classified good loans relative to the total number of predicted good loans. Banks might want to

Table 2
Predictive accuracy of credit scoring models
Observation Prediction PCC
Non-default Default
Non-default Gg Gb PCCgood = Gg / (Gg + Gb)
Default Bg Bb PCCbad = Bb / (Bb + Bg)
PCCtotal = (Gg + Bb) / (Gg + Gb + Bb + Bg)
SENS Gg / (Gg + Bg)
SPEC Bb / (Bb + Gb)
For the abbreviations Gg, Gb, Bb and Bg the capital letter indicates the actual nature of the loan whereas the subscript
indicates the predicted nature. Based on these four measures, the accuracy of a credit scoring model can be assessed by the
percentage of correctly classified loans (PCC), sensitivity (SENS) and specificity (SPEC).
478 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

minimize both Bg and Gb at the same time. However, reducing Bg comes at the expense of
increasing Gb, and vice versa. Thus banks should take into account its differences in cost arising
from misclassified good versus bad loans. For most banks the cost of Bg will be higher than the
cost Gb and the CSM can be calibrated based on SENS. In Section 4.1. we introduce a more
advanced method that takes these two costs explicitly into account.

3. An application to the Vietnamese retail lending market

In 1987, Vietnam started its transformation to a market economy. Part of this process is the
replacement of the monopoly of state-owned banks by a two-level banking system consisting of a
national central bank on one level and state-owned as well as commercial banks on another level.8
Projects to modernize the inter-bank market, to create an international accounting system, and to
allow outside audits of major Vietnamese banks are ongoing. However, the banking system
continues to suffer from lack of capital, inadequate provisions for possible loan losses, low
profitability, inexperience in capital markets, and the slow pace of institutional reform. With
respect to risk assessment and management, there are numerous difficulties including a lack of
transparency in non-performing loan disclosure. There is for example no uniform definition of a
non-performing loan. It is thus not surprising that non-performing loan rates range from 3% based
on banks' official financial reports to 35–70% based on figures by the World Bank or IMF.9 In
order to improve risk management in light of Basel II, Vietnam's central bank has been reviewing
its risk management regulations. As part of a broader strategy, which also addresses the banks'
business strategy, assets and liability management, and internal audit, all state-owned commercial
banks and joint-stock commercial banks have been asked to develop a comprehensive credit
manual which takes international practices in risk management into account. The development of
a proper CSM is an important building block within this credit manual.
To illustrate the current state of credit assessment in Vietnam, Table 3 summarizes the most
advanced credit scoring system as it is currently used by one of Vietnam's commercial banks. This
bank scores applicants based on 14 variables in a two step procedure. In a first assessment round, a
loan applicant is evaluated based on the nine criteria listed in Panel A. If her score is higher than a
certain threshold, she will be evaluated again based on the criteria listed in Panel B. Otherwise, she is
rejected directly without any further consideration. Based on the total score from both rounds, the
loan applicant will be rated as shown in Panel C. Note that the variables used to score applicants in the
first round overlap with those listed in Table 1. This finding enforces our earlier conclusion that
despite the difference in economic development between Vietnam and industrialized countries, the
same factors are relevant for the lending decision. The variables used in the second round measure
explicitly the customer's relationship with the bank and thus highlight the importance of relationship-
lending aspects in the decision process. Furthermore, the credit assessment system presented in Table
3 is a qualitative method where scores and weights of each variable are not the result of a statistical
approach but based on experience and judgment of the credit officer. More importantly, with this
system, the bank cannot estimate a PD and therefore cannot exploit this estimate in improving its risk
management. Given the high levels of non-performing loans, the development of a more

8
In 2005 when the data for this study was collected, this second level of the Vietnamese banking system contained five
state-owned commercial banks, one social policy bank, 31 foreign bank branches, 40 foreign credit institution
representative offices, five joint-venture commercial banks, 36 domestic joint-stock commercial banks, seven finance
companies, and the Central People's Credit Fund System with 23 branches and 888 local credit funds.
9
See Loi (2006).
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 479

Table 3
An example of credit assessment in Vietnam
Panel A: Variables considered in the first round of credit assessment
Variable Categories
Age 18–25, 26–40, 41–60, N 60 (years)
Education post graduate, graduate, high school, less than high school
Occupation professional, secretary, businessman, pensioner
Total time in employment b 0.5, 0.5–1, 1–5, N 5 (years)
Time in current job b 0.5, 0.5–1, 1–5, N 5 (years)
Residential status owns home, rents, lives with parents, other
Number of dependents 0, 1–3, 3–5, N 5 (people)
Applicant's annual income b 12, 12–36, 36–120, N 120 (million VND)
Family's annual income b 24, 24–72, 72–240, N 240 (million VND)

Panel B: Variables considered in the second round of credit assessment


Variable Categories
Performance history with bank new customer, never delayed, payment delay less than 30 days, payment delay more
(short-term) than 30 days
Performance history with bank new customer, never delayed, delay during 2 recent years, delay earlier than 2 recent
(long-term) years
Total outstanding loan value b 100, 100–500, 500–1000,N 1000 (million VND)
Other services used savings account, credit card, savings account and credit card, none
Average balance in savings account b 20, 20–100, 100–500,N 500 (million VND)
during previous year

Panel C: Loan decision


Applicant's rating Score Loan decision
Aaa ≥ 400 Lend as much as requested by borrower
Aa 351–400 Lend as much as requested by borrower
a 301–350 Lend as much as requested by borrower
Bbb 251–300 Loan amount depends on the type of collateral
Bb 201–250 Loan amount depend on the type of collateral with assessment
b 151–200 Loan application requires further assessment
Ccc 101–150 Reject loan application
Cc 51–100 Reject loan application
c 0–50 Reject loan application
d 0 Reject loan application

sophisticated CSM seems indeed necessary. In the remainder of this section, we develop such a
model based on the retail loan portfolio of another of Vietnam's commercial banks.

3.1. Sample selection and variable identification

To develop a CSM, all retail loans signed between 1992 and 2005 were extracted from
the database of one of Vietnam's commercial banks. This loan population contains still
outstanding as well as repaid mortgages, consumer loans, credit card loans or business loans to
borrowers from all over Vietnam. The bank classifies loans with more than 90 days of payment
delay or at least three consecutive payment delays as defaulted. Defaulted loans account for
4.9% in the total population. Currently, the bank records more than 30 loan characteristics.
480 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

Table 4
Descriptive statistics of the Vietnamese retail loan sample
Panel A: Variables initially considered for the Vietnamese retail credit scoring model
Variable Categories and their default frequencies
Income 0.5 to 1.5 1.6 to 3.5 3.6 to 8.0 more other
(in million VND than 8.0
per month)
7.93% 1.51% 0.29% 0.13% 4.07%
Education post university college high school non- other
graduate graduate graduate graduate high school
graduate
0.30% 1.23% 9.02% 2.40% 3.96% 1.48%
Occupation farmer, entrepreneur, doctor, police, other
housewife, farm owner, professional, military,
student consultant / broker researcher, journalist
lawyer,
teacher
3.37% 0.23% 1.90% 6.07% 0.85%
Employer type self-employed, joint-venture, joint-stock, other
entrepreneur foreign state-owned
0.22% 0.22% 1.06% 3.98%
Time with employer 0 to 2 3 to 5 6 to 10 11 to 20 21 to 35
(in years)
2.31% 2.09% 2.72% 4.44% 3.37%
Age (in years) 18 to 24 25 to 35 36 to 45 46 to 64 more than 65
1.98% 2.94% 3.27% 2.91% 4.66%
Gender male female
3.80% 2.49%
Region north centre south except Ho Chi
Ho Chi Minh Minh City
City
0.32% 26.16% 2.14% 0.20%
Time at present address 0 to 2 3 to 5 6 to 10 11 to 20 21 to 60 other
(in years)
0.71% 0.68% 1.35% 2.95% 4.86% 34.89%
Residential status home owner, home owner, living with rental other
home not used home used parents
as collateral as collateral
3.36% 0.64% 2.94% 2.36% 0.71%
Marital status married single widowed, other
divorced
3.40% 2.19% 2.27% 5.75%
Number of 0 1 2 3 more than 3
dependants
1.06% 3.06% 2.46% 4.33% 6.69%
Home phone yes no
0.92% 8.53%
Mobile phone yes no
0.40% 4.41%
Loan purpose business house collateralized general credit card
credit
5.05% 0.95% 11.96% 1.29% 0.40%
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 481

Table 4 (continued)
Panel A: Variables initially considered for the Vietnamese retail credit scoring model
Variable Categories and their default frequencies
Collateral type real estate mobile asset fixed asset, no collateral
machine
0.41% 7.13% 2.86% 7.30%
Collateral value 0 to 50 51 to 100 101 to 500 501 to more than 1000
(in million VND) 1000
5.57% 4.62% 0.68% 0.24% 0.20%
Loan duration less 13 to 24 25 to 36 37 to 48 more than 48
(in months) than 13
2.37% 1.59% 3.12% 7.35% 2.16%
Time with bank less 13 to 24 25 to 36 more than 36
(in years) than 13
2.38% 5.62% 4.30% 0.95%
Number of loans 1 2 3 more than 3
7.94% 1.49% 0.74% 0.15%
Current account yes no
1.34% 4.22%
Savings account yes no
0.11% 3.54%

Panel B: Loan types


Loan purpose Number of loans Loan size (in millions of VND)
Average Standard deviation
Business loan 19,956 127.3 385.0
Mortgage 6759 243.7 98.9
Collateralized loan 3673 54.2 522.1
General credit 21,798 26.6 34.3
Credit card loan 3851 158.8 208.9

Panel C: Differences in quantitative variables between defaulted and non-defaulted loans


Average characteristic Defaulted loans Non-defaulted loans
Income (in million VND per month) 1.85 8.36
Time with employer (in years) 13.38 11.93
Age (in years) 46.52 43.25
Time at present address (in years) 28.68 18.95
Number of dependants 3.51 2.88
Collateral value (in million VND) 38.24 330.92
Loan duration (in months) 61.91 28.22
Time with bank (in years) 15.15 10.60
Number of loans 1.26 3.66
In Panel A, for each variable the categories are shown in the top rows followed by the default probabilities in the bottom
row in italics. In Panel B, the number of loans describes our full sample of 56,037 loans which can be either repaid or can
still be outstanding. The loan size indicators are based on outstanding loans only.

There are, however, many missing values. Using all 30 variables is not feasible as less than
10% of the applicants have full information for all characteristics. In order to obtain a
sufficiently large sample of loans with complete information we rely on the expert knowledge
of the loan officer and identify 22 relevant variables as listed in Panel A of Table 4 with
482 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

relatively few missing values. 10 This leads to a sample of 56,037 loans. This sample contains
3.3% defaulted loans which is only slightly lower than 4.9% in the total loan population. 11
This could imply that a borrower who sufficiently fills in the loan application form has a lower
PD than someone who does not properly complete the form. This is also consistent with one of
strategies used in practice for coping with missing values as stated by Henley (1995): “a refusal
to answer a particular question may be indicative of greater risk”. Our sample of 56,037 loans
is thus based on 22 variables where the criteria for selecting these variables are based on the
expert knowledge. These criteria can overrule the requirement of no missing values and our
sample thus contains the category ‘other’ which covers missing values.
As can be seen in Panel A of Table 4, the bank distinguishes five borrower groups based on their
loans' purpose: Customers borrowing money to finance their business (business); customers
borrowing money for purchase or maintenance of their house (house); customers borrowing
money to finance mobile assets such as the purchase of a car or motorbike where the asset serves as
collateral (collateralized); customers borrowing money for living expenses or consumption
without collateral (general credit); and customers borrowing money via the bank's credit card
(credit card). Thus, our sample includes consumer as well as business loans. In Vietnam in general
and in our bank in particular, these latter loans are used to finance relatively small and private
businesses. Due to the lack of reliable up-to-date financial data on these borrowers and due to the
fact that the boundary between the private and business property of an entrepreneur is often vague,
these loans can only be assessed based on the personal information of the entrepreneur. In terms of
loan numbers, Panel B of Table 4 shows that business loans and general credits are the most
frequently occurring loan types which account for 36% and 39% of all loans, respectively. The
other three loan types account for only 5% to 10% of all loans, each. In terms of value, however,
housing loans constitute a larger share of the bank's loan portfolio due to their large average size of
243.7 million Vietnamese Dong (VND). Furthermore, the variations in loan size within one loan
type category can be substantial. Whereas the relatively small standard deviation for mortgages
and general credits indicates that these are more homogeneous loans with respect to size,
collateralized loans and business loans are more heterogeneous. Furthermore, default varies by
loan purpose ranging from as little as 0.40% for credit card loans to 11.96% for collateralized loans
(see Panel A). The low default rate for credit card loans is driven by credit rationing as the bank
reserves these loans for their best customers. The high default rate of collateralized loans reflects
that fact that collateral is a signal for high risk and thus correlated with the risk characteristics of the
customer such as age or income. Comparing defaulted and non-defaulted loans based on Panel C of
Table 4 reveals further differences. Most strikingly, defaulted borrowers have lower income,
longer-term loans, and less collateral. They have been the bank's customer for a longer time but
received fewer loans during this period. The negative correlation between income and loan
duration is common in retail lending as a loan of a given size can only be repaid over a longer
period of time when the borrower's income is small. The difference in collateral value can be

10
We thus exclude the following characteristics: Credit line, living expense, frequency of principal payment, frequency
of interest payment, outstanding amount in savings account, outstanding amount in debit account, sub-income and
number of days in arrear.
11
The population PD of 4.9% is in line with the PD reported on banks’ financial statements but rather far below the
alternative figures of 35% to 70% reported by the World Bank. For the purpose of this study we assume that the data
provided to us by the bank accurately records all defaults. As long as any potential mis-reporting is not structurally
related to borrower characteristics (i.e., the variables included in our CSM), this assumption is acceptable as it does not
affect the estimation of the model but would only lead to an over-statement of the predictive accuracy of the model.
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 483

explained by the fact that defaults for mortgages, which have very valuable collateral, is rare
whereas defaults for collateralized loans, where less valuable assets such as machines are posted as
collateral, are much more frequent. When discussing the coding of the variables in Section 3.2. we
will investigate in more detail the relationship between borrower characteristics and default.
Finally, note that because the sample contains only information of loan applicants who were
accepted, it is not representative of the population of future applicants. We also cannot perceive
how those applicants who were rejected would have performed if they had been accepted. A
solution is the out-of-sample calibration of the model described in the previous section. We
therefore divide the 56,037 loans into two sub-samples. The initial sample contains 30,994 loans
of which 1026 (3.3%) are in default and the hold-out sample contains 25,043 applicants of which
798 (3.2%) are in default.

3.2. Variable coding, selection and model estimation

To estimate the CSM, the forward stepwise method is used to select among the 22 initial
variables. This method starts with a model without any independent variables and sequentially
adds variables. At each step, the variable that leads to the greatest improvement in predictive
accuracy – in terms of the highest score statistic conditional upon a significance level of less than
5% – is added. The process continues until no variable with a significance level of less than 5%
can be found. Based on the forward stepwise method, 16 of the initial 22 variables are included in
the CSM. To ensure that the selected variables are the most powerful predictors, the backward
stepwise method is applied as well. Here the starting point is a model that contains all 22 initial
variables. At each step, the method eliminates the weakest variable so that only the strongest
predictors are kept for the final CSM. As expected, the backward stepwise selects the same 16
variables as the forward stepwise method. The six remaining variables are excluded due to one of
two reasons. First, they have insignificant coefficients and do not contribute to the explanation of
the dependent variable's variance. Second, they are eliminated because of their correlation with
included variables. Together with adding the most predictive variables to the model, at every step
of the backward stepwise regression, the tolerance (1 − Ri2) of every excluded variable is
calculated. Those variables with tolerances of less than 80% (i.e. 20% or more of the variance in
the variable is explained by variation in the other variables) are considered for deletion. In the case
of multicollinearity, the inclusion of all variables leads to inferior results in the hold-out sample.
The 22 variables initially selected for inclusion in the CSM overlap with the list of commonly
used variables of Table 1. In line with our general discussion in Section 2.2., we find that our
variables are not unique for Vietnam in particular or even for developing countries in general.
Exceptions are gender and region (similar to Schreiner's (2004) branch variable) which seem to be
exclusively used in developing countries. When categorizing and coding the variables, however, we
take the specific circumstances in Vietnam into account. The categories as well as the associated PD
of all 22 variables can be found in Panel A of Table 4. Outlined below are the details regarding these
categories, the specific circumstances in Vietnam, and the reasons why we expect these borrower
characteristics, i.e. those captured by the finally selected 16 variables, to be relevant.
Regarding education we expect that better educated people have more stable, higher-income
employment and thus a lower PD. We therefore distinguish borrowers by their educational degree
ranging from post-graduate to non-high school graduate. In 2002, however, less than 0.5% of the
Vietnamese working population held a graduate or post-graduate degree and only 13% held an
undergraduate degree (GSO, 2002). The largest share of the working population thus falls into the
lower two educational categories. Our loan sample broadly reflects these demographics but
484 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

contains more highly educated borrowers: 30% of borrowers hold a post-graduate, graduate, or
undergraduate degree, 39% hold a high-school degree, and 31% have less or other education.
Default frequencies do however not consistently decline with increasing education. College
graduates, for example, have the highest default frequency of 9.02%.
Gender can no longer be included in CSMs for many industrialized countries as it is deemed
discriminatory. In contrast, Schreiner (2003) argues that fair discrimination – for example based
on the statistical default rates of men versus women – is acceptable as it is based on quantifiable
data. The alternative – subjective scoring – discriminates equally if not more. Overall, there is
ample evidence that women default less frequently on loans (Schreiner, 2004) but most of this
gender effect disappears when other risk factors that are correlated with gender are accounted
for.12 We thus initially include gender and rely on the variable selection procedure to determine
whether gender remains in the final CSM. In Vietnam there is furthermore a specific relationship
between gender and income which in turn might be predictive of default. The GSO (2004) reports
a higher income for women than for men. For wage earners, the average monthly income per
capita per month is 228 thousand VND if the head of the household is a woman compared to 139
thousand VND if the head of the household is a man. The same holds on aggregate for all income
sources but here the difference between women and men is less pronounced with 589 versus 489
thousand VND, respectively.13
Region represents the area of the country where the borrower lives. As people of similar wealth
tend to live in the same location (a suburb might attract richer residents and the resulting increase
in housing and property prices make this suburb prohibitively expensive for poorer households),
this geographic criterion can indicate a borrower's level of financial wealth. A typical proxy is an
area's postal code but in Vietnam a postal code is not part of the address. Instead, we approximate
the region with the branch where the loan is issued. According to the bank's policy, borrowers can
only obtain loans from their local branches and branch location therefore coincides with the
borrower's residential area. The numerous branches were categorized into four regions of the
north, center, south (excluding Ho Chi Minh City) of the country, and Ho Chi Minh City. Note
that given this coding, region might not only be a proxy for the borrower's wealth but also reflect
the current credit assessment ability of the different branches. In our sample, it appears that
borrowers from Ho Chi Minh City are the best borrowers and those from the center are the worst.
This can be understood in terms of the average income per person which is highest in Ho Chi
Minh City, followed by the north and south, while the center is the poorest area in Vietnam (World
Bank, 2004).
Time at present address represents the number of years that the borrower has been living at his
current address. Crook, Hamilton, and Thomas (1992) find that default risk drops with an increase
in this variable. In this sense, time at present address might be a proxy for the borrower's maturity,
stability, or risk aversion. However, this relationship does not hold in Vietnam where default rates
increase with the time at the present address. In Vietnam, people who acquire financial wealth
tend to seek better living conditions and thus often move to a new home in a better area. Thus,
changing address might be a signal that a borrower's financial wealth is high and/or improving
rapidly. Under these conditions, he is better able to repay his loan.

12
For Bolivian microfinance, Schreiner (2004) shows that women in comparison with men are less risky borrowers as
they are more likely to be traders than manufacturers and to have smaller businesses that require smaller, shorter loans.
13
Note that for this as well as all other binary variables, coding the variable based on Eq. (4) leads to a more precise
measure than simply assigning values of 0 and 1.
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 485

Residential status indicates whether borrowers own their home, rent, or live with their parents.
Ex ante, the relationship between residential status and default is unclear. On the one hand,
residential status can indicate financial wealth in particular in the case of home ownership. On the
other hand, residential status can indicate increased financial pressure on the borrower's income
through insurance fees, taxes, or electricity cost. Crook et al. (1992) find that borrowers who are
most likely to default belong to the “other” category whereas borrowers living with their parents
are least likely to default. We expect that this ranking will be different for our sample since the
reasons of having a certain residential status in Vietnam are dissimilar from those in industrialized
countries. Almost 90% of all borrowers in our sample own their homes. This properly reflects the
Vietnamese population as a whole where 95% of all households own their house. House
ownership is with 98% somewhat higher in rural areas compared to 86% in urban areas (GSO,
1999). If this house is used as collateral, default rates are among the lowest whereas default rates
are highest when this house is not used as collateral. This can be explained by the importance
of owning a house in Vietnamese society, which leads to a strong aversion of many borrowers
to loose their house.
Marital status can matter if it has an effect on the responsibility, reliability, or maturity of
borrowers. In our sample, however, default rates are higher for married than for single borrowers.
This result can be due to the fact that marital status is typically related to number of dependants
which in turn reflects financial pressure on the borrower and her ability to repay a loan. Another
linkage exists between marital status and age. The Committee for Population, Family and
Children (2003) finds that in 2002, Vietnamese women marry mainly in their 20s. Among 20- to
24-year old women only 46% are married compared to 80% to 90% of older women. Thus, the
age group which shows the lowest default rates (18 to 24 years) also has the most unmarried
people. However, as 87% of borrowers are married, it is unclear ex ante how informative marital
status will be in contrast to other variables such as age or number of dependents.
Number of dependants represents the number of people that the borrower has to support. Crook
et al. (1992) separate this variable into two categories, the number of children and the number of
other dependants. In this study, we measure the total number of dependants though this variable
mostly reflects the number children. As the number of dependants increases, so does the pressure
on the borrower's income due to higher expenses such as school fees. In Vietnam as in
industrialized countries, the default rate increases steadily with the number of dependents. For
example, moving from zero to three dependants quadruples the default risk from 1.1% to 4.3%.
Note that our categories from zero to more than three dependents are reflective of the typical
household size in Vietnam. In 2002, 57% of all households had four or less members (thus three
dependants in addition to the borrower). Larger households have commonly five or six members
while very large households are with less than 10% rare in Vietnam (Committee for Population,
Family and Children, 2003).
Home phone measures whether a customer has a home phone or not and can indicate how
easily the bank can keep contact with that borrower. Thus Crook et al. (1992) find that not
having a home phone is associated with a higher default risk. We expect this relationship to be
present in Vietnam – possibly stronger than in developed countries – as the percentage of
households with a home phone was only 27% in 2004. There are however big differences
across areas: 74% of urban households have a telephone whereas only 11.5% of rural
households do. In Vietnam, home phone and default are not only linked due to the ability to
contact the borrower but also due to the fact that home phone and income are correlated. While
only 16% of households in the lowest 20% of the income distribution have a telephone, 74% of
households in the highest 20% of the income distribution have a telephone (GSO, 2004). In our
486 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

sample the percentage of borrowers with a home phone lies with 70% above the population
average.
As the first two variables that measure the borrower's banking relationship, loan purpose
describes how the funds are used and collateral type measures what type of collateral supports the
loan. Collateral type is incorporated in by Schreiner (2004) who classifies guarantees as none,
personal, multiple, or other guarantee and Viganó (1993) who includes dummies for pledges,
sales or personal guarantees. Our categorization is different due to specific features of the
Vietnamese retail banking market. Next to non-collateralized loans, we distinguish between real
estate assets, mobile assets and fixed assets. Given the state of Vietnam's legal system, banks do
not accept personal guarantees as these are difficult to enforce. In our sample there is clear relation
between loan type and collateral type, partly due to the overlapping variable definitions.
Generally speaking, requiring collateral is a signal of risk. Most defaults occur in the
collateralized or business loan class. For business loans, 97.52% are collateralized with real
estate, 1.4% with fixed assets such as machines, 0.49% with mobile assets and 0.06% with other
collateral. Only 0.53% of business loans are not collateralized. At the other end of the distribution,
credit cards do not require collateral and default rates are low. This can be due to the fact that a
credit card is still very new product, which is offered only to those borrowers who have a very
close and good relationship with the bank. The exemptions to the rule are housing loans. Though
75% are collateralized, less than 1% is in default. This might indicate borrower's risk aversion
regarding the loss of his house and is also consistent with Vietnamese culture in which people
consider their houses as a very important feature of their life. Finally, there is an overlap between
the collateral type and residential status as only borrowers who live in their own home can use this
as collateral. Note however that many borrowers, who own more than one property, use a property
that they do not live in as collateral for a (business) loan. This could again be a strong signal for
the cultural importance of owning your home. As a related proxy, collateral value reflects the
value of collateral in millions of VND. The higher its value, the higher the incentive for the
borrower, who does not want to loose her collateral, to repay.
Loan duration measures the maturity of a loan in months. Usually, this is a feature of the loan
and as such the outcome of the negotiation between borrower and bank and is thus excluded
from CSMs. However, the Vietnamese situation is different. What we measure here is the loan
duration as proposed by the borrower and not as negotiated between the bank and borrower. Thus,
this variable reflects the borrower's intention, risk aversion, or self-assessment of repayment
ability.
Time with bank measures the length of the banking relationship in years. In the context of
relationship lending, it can be assumed that the longer a customer stays with a bank, the more
the bank knows about him and the lower the default risk becomes. Whereas this is confirmed
for industrialized countries (Crook et al., 1992), it does not quite hold in Vietnam. The highest
default rate can be found for those borrowers who have been with bank for 13 to 36 years
whereas borrowers who have been with bank both shorter and longer are less likely to default.
This could reflect the fact that the Vietnamese banking market is being reformed and that credit
officers still have room for preferential credit allocation. As reform progresses and CSMs are
put in place, the effect of this variable is expected to change and should thus be updated
regularly.
Number of loans counts the number of loans a customer has received during her whole
relationship with the bank. Many borrowers have not only a sequence of historical loans but also
have more than one loan at a time. As a defaulted borrower has difficulties in receiving a new
loan, this proxy can be informative about default risk. As expected, default is least frequent for
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 487

Table 5
The estimated credit scoring model
Included variables Estimated coefficient Standard error Significance level
Time with bank − 1.774 0.121 0.0%
Gender − 1.557 0.222 1.0%
Number of loans − 0.938 0.051 1.4%
Loan duration − 0.845 0.080 3.7%
Savings account − 0.750 0.104 3.1%
Region − 0.652 0.030 13.6%
Residential status − 0.551 0.278 44.6%
Current account − 0.492 0.208 10.4%
Collateral value − 0.402 0.096 9.8%
Number of dependants − 0.356 0.096 9.9%
Time at present address − 0.285 0.054 2.5%
Marital status − 0.233 0.101 68.1%
Collateral type − 0.190 0.057 53.0%
Home phone − 0.181 0.047 3.4%
Education − 0.156 0.067 60.3%
Loan purpose − 0.125 0.054 3.3%
Constant − 3.176 0.058 4.6%

repeat borrowers: Whereas 7.94% of first-time borrowers default, only 0.15% of borrowers
with three or more loans do so. This variable thus clearly reflects how important the bank–
borrower relationship, i.e. a good credit record, is.
Current account is a binary variable that indicates whether a borrower holds a current account.
In Vietnam, the concept of a personal bank account is still unfamiliar to most of the population. In
2006, only about 6% of the population has a bank account. With 35%, this percentage is however
higher for the urban middle class and has been increasing substantially from only 12% in 2001.14
Having a bank account is mostly an indicator of modern lifestyle among young people and this
variable can thus be indicative of education or financial wealth (see also Crook et al., 1992). In our
sample, 34% of borrowers have a current account at the time of the loan request and these have
indeed a lower default risk. The high growth in current accounts might substantially change the
relationship between default and current account ownership in the future. Consequently, this
variable needs to be updated regularly if included in a CSM.15
Table 5 shows the estimated CSM and reveals that among our 16 variables time with bank is
the most important predictor, followed by gender, number of loans, and loan duration. Com-
pared to Crook et al.'s (1992) CSM for the UK, six out of their 12 predictors are incorporated in
our CSM: time with bank (ranked number 1st here versus 3rd by Crook et al.), savings account
(5th vs. 9th), region (6th vs. 1st), residential status (7th vs. 6th), current account (8th vs. 4th),
and home phone (14th vs. 7th). Compared to Crook's longer list of 24 most typical variables
there is an overlap with nine of our variables. As such, our model reflects – to some extent – the

14
Figures are obtained from two surveys conducted in 2006: (1) A survey conducted by Visa International as reported in
the Thanh Nien News on January 8, 2007. (2) TNS's VietCycle survey as reported in the Vietnam Investment Review No
795 on January 8, 2007.
15
We also consider income, occupation, employer type, time with employer, age, and the ownership of a mobile phone.
As these variables are not part of our final CSM, they are not motivated here in detail. Their categories and default
frequencies are, however, listed in Panel A of Table 5 for illustrative purposes.
488 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

Table 6
Predicted versus observed probabilities of default in the hold-out sample
Panel A: Default probabilities by loan group
Group Number of loans Probability of default (in %)
Predicted
Observed Minimum Average Maximum
1 5091 0.01 0.00 0.01 0.02
2 4956 0.08 0.03 0.06 0.12
3 5007 0.27 0.13 0.24 0.41
4 4929 1.11 0.42 0.77 1.79
5 5060 49.31 1.80 15.41 96.81
total 25,043 3.19 0.00 3.33 96.81

Panel B: Predictive accuracy with a cut-off of 0.50


Observation Prediction
Non-default Default PCC
Non-default 24,136 109 99.55% = PCCgood
Default 397 401 50.25% = PCCbad
97.98% = PCCtotal
SENS 98.38%
SPEC 78.63%

Panel C: Predictive accuracy with an optimal cut-off of 0.1995


Observation Prediction
Non-default Default PCC
Non-default 23,698 547 97.74% = PCCgood
Default 199 599 75.06% = PCCbad
97.02% = PCCtotal
SENS 99.17%
SPEC 52.27%

Panel D: A credit scoring model with two cut-offs of 0.01 and 0.21
Observation Prediction
Non-default Marginal Default
Non-default 19,023 4,681 541
Default 40 155 603
Bg/B 5.01%
Bb/B 75.56%
See Table 2 for definitions of the accuracy measures.

international norm of credit scoring. The inclusion of variables such as collateral type, loan
duration, or gender is however unique to developing countries. Particularly, gender and loan
duration are very effective predictors ranking 2nd and 4th in the CSM, respectively. First
regarding gender, recall the above discussion whether gender is truly indicative of default or
simply reflects underlying risks. For Vietnam, we have to conclude that gender helps in the
proper assessment of credit risk even when other risk factors such as loan purpose or collateral
value are taken into account. As women are coded with a higher value than men in our CSM,
the negative coefficient of gender implies that Vietnamese women default less frequently than
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 489

men. Furthermore, we argued above that gender is indicative of income, a fact which can
explain the absence of more traditional income proxies (income, occupation, employer type or
type with employer) from our model. Second, loan duration, as a measure of the borrower's
intention, risk aversion, or self-assessment of repayment ability, is unique to the Vietnamese
situation. It indicates that Vietnamese banks cannot only rely on their own assessment of the
borrower but also on the borrower herself, who appears to adequately and honestly state her
own loan capacity. In this context, time with bank and number of loans as 1st and 3rd most
important predictors reflect the borrower's relationship with the bank and thus the role that such
relationships play in Vietnam's loan market. To this extent, our CSM is consistent with the less
formalized credit assessment approach of another Vietnamese bank presented in Table 3, which
also relies heavily on relationship proxies, i.e. in its second round of credit assessment.
Regarding the accuracy of our CSM, its explanatory power increases from step to step leading
to a final adjusted R-square of 0.578. However, to properly assess the accuracy of the model, it
should be applied out-of-sample.

3.3. Model calibration and out-of-sample testing

When applied out-of-sample, we find as expected a significant difference in the predicted


PD between observed defaulted and non-defaulted borrowers. The average (median) of the
predicted PD is 1.73% (0.2%) for non-defaulted loans compared to 49.05% (51.66%) for defaulted
loans. However, the ranges of predicted probabilities are rather large and overlap for the two loan
groups. For defaulted loans PD ranges from 0.01% to 96.81% compared to 0.00% to 73.54% for
non-defaulted loans. A closer look at Panel A of Table 6 reveals the observed versus predicted PD
for different sub-samples which are generated by sorting all loans of similar predicted PD into five
groups of approximately equal size. By looking at the average estimated PD of 3.33% compared to
the observed PD of 3.19% we can see that our model slightly overestimates default. There is,
however, a pattern: The PD of the low-risk groups 1 to 3 are underestimated whereas the maximum
estimated probability is far beyond the observed PD. The average predicted PDs indicate that
default risk is rising steadily but only jumps above the 1% level for group 5. A more detailed
accuracy analysis is thus warranted using the PCC, SENS, and SPEC indicators.
Panel B of Table 6 reports these accuracy measures for an ad-hoc cut-off value of 50%.
Regarding PCC, the overall accuracy is surprisingly high with almost 99%. Looking at defaulted
versus non-defaulted loans separately reveals that this high accuracy is mainly driven by the non-
defaulted loans as PCCgood (99.55%) is much higher than PCCbad (50.25%). For industrialized
countries, Desai et al. (1996) report accuracy values of up to 86.90% for PCCtotal and 41.38% for
PCCbad. For Bolivia, Schreiner's (2004) different calibrations are able to achieve a maximum
accuracy for PCCtotal of 91%, PCCgood of 99%, or PCCbad of 71%. Overall, our results are in line
with these findings. The sensitivity measure implies a PD for the bank's loan portfolio of (100% —
SENS) = 1.62% because all applicants with a predicted PD of less than 50% will be accepted and
98.38% of these are expected to repay the loan. Our SENS accuracy is slightly higher than that
reported in other studies. For example, Baesens et al.'s (2003) survey reports sensitivity measures
of up to 96.36%. Finally, SPEC indicates that among the rejected loans (with a predicted
PD N 50%), 78.63% would have actually defaulted whereas 21.37% would have been repaid. Note
that SPEC is an abstract measure since in practice a bank cannot check whether its rejected
loan applicants would have defaulted or not. The 21.37% incorrectly rejected loans should be
interpreted as opportunity cost for the bank. In contrast, the 1.62% incorrectly accepted loans
create actual cost for the bank.
490 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

The CSM's calibration, i.e. the determination of the optimal cut-off, depends on the preferences
of the bank. Our bank – though it does not have a CSM in place but relies on subjective credit
assessment – has set a target for non-performing loans. Assuming that this target is 0.83%, we have
to calibrate the CSM to a sensitivity of 99.17% (SENS = 100% – 0.83% = 99.17%). At the first
sight, one might think that in order to achieve this target, bank should only accept customers who
have predicted PD of less than 0.83%. Such a cut-off would however lead to a SENS of 99.33% and
thus an expected PD of 0.77%. Instead, a cut-off of 19.95% as shown in Panel C of Table 6
achieves the desired result. Comparing the accuracy of this cut-off with that of the ad-hoc cut-off of
50% reveals the following dynamics: By sacrificing a small amount of good loans (PCCgood drops
from 99.55% to 97.74%), the number of incorrectly accepted bad loans increases substantially
(PCCbad increases from 50.25% to 75.06%). In terms of SENS and SPEC, the improvement by
0.79% in sensitivity is achieved at a cost of a 26.36% reduction in specificity.

4. Credit scoring and bank strategy

In the previous section, the cut-off value of 19.95% is chosen for the bank to achieve its
assumed default target of 0.83%. Is this optimal if the bank follows other strategies such as
minimizing the time spent on credit assessment or maximizing return? In order to answer this
question, we look at alternative ways to calibrate the CSM.

4.1. Profitability

A CSM's benefit can lie in the effect that the new credit assessment policy has on the bank's
profitability. Schreiner (2003) suggests measuring the change in profits in terms of opportunity
cost. In particular he focuses on the ratio of good loans lost (Gb) per bad loans avoided (Bb), which
is to some extent related to SPEC. Panel A of Fig. 1 shows how this ratio changes for different cut-
off values and Panel B calculates the effect on profit as

Dprofit ¼ cost per bad loan ⁎ Bb  benef it per good loan ⁎ Gb ð5Þ

For the cut-off of 19.95% identified as optimal in the previous section, the tradeoff is foregoing
0.91 good loans for each bad loan that is avoided. In terms of profit changes, we assume different
relationships between the cost per bad loan and benefit per good loan. Data for developing countries
is not available and we therefore take the evidence from the credit card market in industrialized
countries as a benchmark. Here, practitioners argue that it takes at least 10 good loans to cover the
cost of one bad loan (Schreiner, 2003). Under this assumption, a cut-off of 19.95% leads to an
increase in profits, which would indicate that our bank's target of 0.83% non-performing loans is
well set. Overall, however, the change in profits is relatively stable for a cut-offs of 7% to 20%
before falling more substantially. A further look at Panel B of Fig. 1 reveals that as the cost-
benefit ratio becomes smaller, so does the effect on profits. With a cost-benefit ratio of 10-to-5,
an increase in profits can only be obtained with cut-off value of 7% or higher.

4.2. Risk-based pricing

Economic theory suggests that in a competitive market, prices will equal marginal cost. If the
market price is higher, high profit margins will induce competitors to enter the market and bid
down the price. This principle implies that each loan should be priced in accordance with the cost
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 491

Fig. 1. Credit scoring and bank profitability.

of funds, origination and servicing costs, and the implicit costs associated with potential default.
While the fixed costs do not generally vary within the same product category, expected default
cost can vary across borrowers and should be calculated as PD times loss given default (LGD).
The risk-based interest rate R could thus be derived in a simple model as

R ¼ cost of funds þ overhead cost þ expected default cost þ profit margin ð6Þ

As Vietnam moves towards a more and more competitive retail banking market, borrowers will
shop around and any bank offering too high interest rates for good borrowers will loose market share.
At the same time, banks offering too low interest rates for bad borrowers will experience eroding
profits. Risk-based pricing therefore seems to be necessary if a bank wants to survive in Vietnam's
emerging banking market. Currently, however, our bank charges the same annual interest rate for all
medium- and long-term loans. Based on our CSM, we can suggest a simple pricing strategy.
492 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

Table 7
Risk-based loan pricing
Borrower class
Low risk Moderate risk High risk All borrowers
Range of predicted PD 0.00%–0.04% 0.05%–0.30% 0.31%–19.95% 0.00%–19.95%
Number of loans 7424 7557 7842 22,823
Overhead cost 1.00% 1.50% 2.00% 1.50%
Cost of funds 9.00% 9.00% 9.00% 9.00%
Profit margin 2.00% 2.00% 2.00% 2.00%
Cost of default 0.10% 0.20% 2.00%
Loan interest rate 12.10% 12.70% 15.00% 13.30%
LD = average observed PD 0.13% 0.33% 1.98% 0.83%
LGD 100.00% 100.00% 100.00% 100.00%
Cost of default = LD ⁎ LGD 0.13% 0.33% 1.98% 0.83%
Loan interest rate 12.13% 12.83% 14.98% 13.33%
The abbreviations are probability of default (PD) and loss given default (LGD).

As Table 7 shows, we split the bank's borrowers into three risk classes containing about 7500
borrowers each. The predicted PD for the low risk class ranges from 0.00% to 0.04%, for the
moderate risk class from 0.05% to 0.30%, and for the high risk class from 0.31% to 19.95%. Given
the targeted maximum of 0.83% non-performing loans, any applicant with an estimated PD of
more than 19.95% will be rejected. Based on our discussions with Vietnamese bankers, a 2.00%
profit margin and an average 1.50% overhead cost appear to be reasonable and realistic as-
sumptions. However, customers with higher risk usually create higher costs because they take
more of the credit officer's time. To be consistent with customer-specific pricing, overhead cost of
2.00%, 1.50%, and 1.00% will be allocated to the high-, moderate-, and low-risk borrowers,
respectively. The cost of funds can be approximated by the bank's deposit interest rate. In Vietnam
in 2006, the annual deposit interest rate on local currency deposits is about 9%. Finally, the
expected cost of default PD⁎LGD has to be considered. Our CSM provides estimates for PD but it
is beyond the scope of this study to estimate LGD. LGD will be at least loan-type specific since
each type of loan is associated with a different type of collateral. Default cost can be quite
significant. As a benchmark note that in 1997 risk-premia for US consumer loans ranged from as
low as 0.10% to more than 2.00% for the highest risk borrowers (Sangha, 1998). In line with this
study, we assign 2.00% default cost to the high-risk borrowers, 0.20% to the moderate-risk and
0.10% to the low-risk borrowers. As Table 7 shows, the bank could charge 12.10%, 12.70%, and
15.00% to the different risk classes. If, on the other hand, the bank decides to charge the same price
to all borrowers, it could charge an average interest rate of approximately 13.30% to earn the same
profit margin. As the second set of calculations shows, these assumptions would be consistent with
a LGD of 100% and a PD based on the realized PD of the risk class. In practice, PD could be based
on the historic PD obtained from the loan portfolio which is used when estimating the CSM.16

4.3. Transactional versus relationship lending

Should banks rely exclusively on their CSM, i.e. transactional lending, or should there be room
for relationship lending? The benefits of credit scoring are clear. Mester (1997), for example,

16
For an in-depth analysis of the interaction between risk-based pricing, profit-maximizing cut-offs, and adverse
selection problems see Blöchlinger and Leippold (2006).
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 493

reports that credit scoring has substantially reduced the length of the loan approval process from
two weeks to 12.5 h for small-business loans in the US, or from nine to three days for consumer
loans in Canada. Once a CSM is in place, these benefits can be achieved at a cost of $1.50 to $10
per loan. In order to reduce the cost, time, and effort that credit officers spend on loan assessment or
to allocate officers' time more productively, banks might thus be inclined to implement a strict
credit policy based on their CSM alone. In industrialized countries, internet banks rely on
automated loan systems which include CSMs. Mester (1997) reports that Banc One exclusively
uses credit scores for loans up to $35,000 and approves a further 30% of all loans below $1 million
only based on scores. Furthermore, a regional bank in Pennsylvania assesses all small business
loans under $35,000 based on the CSM alone. The downside, however, can be substantial.
Schreiner (2003) recalls a Bolivian example of a finance company that relied exclusively on its
CSM and went bankrupt.
Credit scoring is based on quantifiable criteria and ignores qualitative risk. Its underlying
assumption is that qualitative risk is either irrelevant or not measurable. Is this assumption
reasonable? If qualitative risks matter but are not captured indirectly via the quantitative variables
in the model, the estimated PD will be too low.17 To the extent that credit officers can discover this
qualitative risk through their relationship with the borrower and to the extent that this risk is not
already reflected in variables measuring the bank–customer relationship such as ‘time with bank'
or ‘number of loans', a combination between transactional and relationship lending is valuable.
Credit scoring can thus either precede or follow traditional loan assessment. Schreiner (2003)
argues that banks should apply their CSM only to loans that are already conditionally approved by the
credit officer. In contrast, Mester (1997) reports that banks in the US have credit officers reviewing the
CSM's decision and that many banks focus on those loan applicants with scores close to the cut-off
point. For the First National Bank of Chicago this led to a substantial change in the credit decisions for
small-business loans. About 25% of the applications rejected by the CSM were later approved and
another 25% of the applications accepted by the CSM were later rejected by the credit officer. Based on
our hold-out sample, we can only illustrate this second approach.18 In particular, we will illustrate how
our CSM can be calibrated for two cut-offs C1and C2 with 0 bC1 bC2 b 1. Such a calibration allows
banks to define loans that need some further examinations instead of rejecting or accepting them
directly. All loan applicants with an estimated PD b C1 are accepted outright whereas all applicants with
PD NC2 are rejected outright. The remaining loans with (C1 b PD bC2) are classified as marginal loans
which are consequently reviewed by the credit officer. The bank's credit assessment can thus be a
combination of the transactional lending via CSM and relationship lending.
Panel D of Table 6 reports the results of such a CSM. C1 and C2 are determined according to a
somewhat ad hoc decision rule which can be interpreted in the context of PCC. This rule specifies
that (1) the proportion of incorrectly classified bad loans should be less than 5% of all bad loans,
i.e. Bg/B b 5%, and (2) that the proportion of correctly classified bad loans should be at least 75%
of all bad loans, i.e. Bb/B ≥ 75%. The emphasis on bad loans is justified by the relatively high
penalty costs associated with over looking a potentially bad loan. The trade off is the gain from
reducing the number of good loans subjected to evaluation. If evaluation cost were available, C1
and C2 can be determined so that total evaluation cost is minimized. For an even more advanced
approach, evaluation cost can be included in the model presented in Section 4.1. and C1 and C2
can be chosen based on a profit maximization rule. However, since these costs have not yet been

17
See Schreiner (2003), Box 4 for a more in-depth discussion.
18
Note however that Schreiner's (2003) approach is to some extent implicit in our sample as it is restricted to approved
loans and does not cover all loan applications.
494 T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495

precisely estimated by our bank, the above decision rule has been applied resulting in the cut-off
points C1 = 1% and C2 = 21%. The model predicts 19,063 good, 4836 marginal and 1141 bad
loans. Regarding the actual bad loans, 40 were classified as good, 155 as marginal and 603 as bad.
The implication for credit assessment is that the 40 loans will not be evaluated by the credit officer
but incorrectly accepted outright. The 155 loans will be examined together with the 4681 other
marginal loans. If the credit officer can correctly spot these 155 loans, the bank would reject 758
of the 798 actual bad loans. In contrast, the CSM with a single cut-off of 19.95% (50%) leads to a
rejection of only 599 (401) of the observed bad loans. Regarding the observed good loans, our
two cut-off points do not lead to any substantial change compared to the models with one cut-off
point. This is due to the criteria imposed on the cut-off points. Assuming again that the credit
officer can correctly spot the 4681 observed good loans among the marginal loans, the bank
would accept 23,704 of the 24,245 observed good loans. In comparison, the CSM with one cut-
off at 19.95% (50%) achieves a correct prediction of 23,698 (24,136) loans.

5. Conclusions

As banking markets in developing countries mature, banks not only benefit from retail credit
growth but also face increasing competition and regulatory attention to risk management. Can
credit scoring help in these circumstances? Based on our evidence from the Vietnamese retail
loan market, the answer is ‘yes’. First and foremost, a CSM can reduce loan default. Be replacing
its informal credit assessment method with a CSM, our bank can expect a decrease in its default
ratio from 3.3% to 2%. In addition, banks can use a CSM to implement risk-based pricing to
manage its loan portfolio composition. By setting a lower interest rate for less risky borrowers, a
bank will be more competitive in this loan market segment and be able to attract more low-risk
borrowers. We have also shown how banks can calibrate the CSM to manage profits. Yet, in the
case of our Vietnamese bank, these effects were small. Finally, a CSM reduces the time and thus
cost spent by the loan officer on loan assessment. In contrast to corporate or government loans,
retail loans are small and their credit assessment is thus more costly per $ or VND of loan
volume. A CSM can thus be calibrated with this consideration in mind. Overall, banks can
benefit from reduced loan default, increased competitiveness, and reduced cost of loan assess-
ment while optimally using its relationship lending tools, i.e. its loan officer's knowledge about
the borrower. In order to achieve these benefits, it is, however, important to align a CSM to the
local market as the predictive borrower-characteristics and their default probabilities are
country-specific.

References

Allen, L., DeLong, G., & Saunders, A. (2004). Issues in credit risk modeling of retail markets. Journal of Banking and
Finance, 28, 727−752.
Altman, E. I. (1968). Financial ratios, discriminant analysis, and the prediction of corporate bankruptcy. Journal of
Finance, 23, 589−609.
Altman, E. I., & Narayanan, P. (1997). An international survey of business failure classification models. Financial
Markets, Institutions and Instruments, 6, 1−57.
Altman, E. I., & Saunders, A. (1998). Credit risk measurement: Developments over the last 20 years. Journal of Banking
and Finance, 21, 1721−1742.
Baesens, B., Gestel, T. V., Viaene, S., Stepanova, M., Suykens, J., & Vanthienen, J. (2003). Benchmarking state-of-the-art
classification algorithms for credit scoring. Journal of the Operational Research Society, 54, 627−635.
Blöchlinger, M., & Leippold, M. (2006). Economic benefit of powerful credit scoring. Journal of Banking and Finance,
30, 851−873.
T.H.T. Dinh, S. Kleimeier / International Review of Financial Analysis 16 (2007) 471–495 495

Boyle, M., Crook, J. N., Hamilton, R., & Thomas, L. C. (1992). Method s for credit scoring applied to slow payers. In L. C.
Thomas, J. N. Crook, & D. B. Edelman (Eds.), Credit scoring and credit control (pp. 75−90). Oxford: Oxford
University Press.
Byström, H., Worasinchai, L., & Chongsithipol, S. (2005). Default risk, systematic risk and Thai firms before, during and
after the Asian crisis. Research in International Business and Finance, 19, 95−110.
Committee for Population, Family and Children, ORC Macro, 2003. Vietnam Demographic and Health Survey 2002.
Committee for Population, Family and Children, Hanoi, http://www.measuredhs.com/aboutsurveys/search/
Crook, J. N., Hamilton, R., & Thomas, L. C. (1992). A comparison of discriminations under alternative definitions of
credit default. In L. C. Thomas, J. N. Crook, & D. B. Edelman (Eds.), Credit scoring and credit control (pp. 217−245).
Oxford: Oxford University Press.
Desai, V. S., Crook, J. N., & Overstreet, G. A. (1996). A comparison of neutral networks and linear scoring models in the
credit union environment. European Journal of Operational Research, 95, 24−37.
Desai, V. S., Crook, J. N., & Overstreet, G. A. (1997). Credit scoring models in the credit union environment using neutral
networks and genetic algorithms. IMA Journal of Mathematics Applied in Business and Industry, 8, 323−346.
GSO. (1999). Population and Housing Census Vietnam 1999. General Statistics Office of Vietnam (GSO), Hanoi,
http://www.gso.gov.vn
GSO. (2002). Establishment census of Vietnam 2002. General Statistics Office of Vietnam (GSO), Hanoi, http://www.gso.gov.vn
GSO. (2004). Living standard survey 2004. General Statistics Office of Vietnam (GSO), Hanoi, http://www.gso.gov.vn
Henley, W. E. (1995). Statistical aspects of credit scoring. Open University, Dissertation.
Henley, W. E., & Hand, D. J. (1997). Statistical classification methods in consumer credit scoring: A review. Journal of the
Royal Statistical Society — Series A (Statistics in Society), 160, 523−541.
Laitinen, E. K. (1999). Predicting a corporate credit analyst's risk estimate by logistic and linear models. International
Review of Financial Analysis, 8, 97−121.
Loi, H. T. (2006). Credit risk and credit access in Asia: Trends and developments in insolvency system and risk
management: The experience of Vietnam. Source OECD Emerging Economies, 2006(8), 281−289.
Mester, L. (1997 September/October). What's the point of credit scoring? Federal Reserve Bank of Philadelphia Business
Review, 3−16.
Philosophov, L. V., & Philosophov, V. L. (2002). Corporate bankruptcy prognosis: An attempt at a combined prediction of
the bankruptcy event and time interval of its occurrence. International Review of Financial Analysis, 11, 375−406.
Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing classifiers. Proceedings
of the Fifteenth International Conference on Machine Learning, 445−553.
Sangha, B. S. (1998). A systematic approach for managing credit score overrides. In E. Mays (Ed.), Credit risk modeling
design and application (pp. 223−244). Chicago: Glenlake Publishing.
Schreiner, M., 2003. Scoring: The next breakthrough in microcredit? CGAP Occasional Paper No. 7, http://www.cgap.org
Schreiner, M. (2004). Scoring arrears at a microlender in Bolivia. Journal of Microfinance, 6, 65−88.
Srinivasan, V., & Kim, Y. H. (1987). The Bierman–Hausman credit granting model: A note. Management Science, 33,
1361−1362.
Thomas, L. C. (2000). A survey of credit and behavioral scoring: Forecasting financial risk of lending to consumers.
International Journal of Forecasting, 16, 149−172.
Viganó, L. (1993). A credit scoring model for development banks: An African case study. Savings and Development, 4,
441−482.
Yobas, M. B., Crook, J. N., & Ross, P. (2000). Credit scoring using neural and evolutionary techniques. IMA Journal of
Mathematics Applied in Business and Industry, 11, 111−125.