
A PROJECT
ON
APPLICATION OF LOGISTIC REGRESSION MODEL IN PREDICTING

BANK CREDIT RISK
(A Case Study of Fidelity Bank PLC)
BY
DAVID PETER ILESANMI
(129075018)
AN ASSIGNMENT SUBMITTED TO
DR M. ADAMU-IRIA
STATISTICAL PACKAGES
MAT 829
MAY 2013
TABLE OF CONTENTS
Abstract
CHAPTER ONE
1.0 INTRODUCTION
1.1 Overview and History of the Bank
1.2 Scope of the Study
1.3 Limitation of the Study
1.4 Objectives
1.5 Significance of the Study
1.6 Definition of Terms
CHAPTER TWO
LITERATURE REVIEW
2.0 Banks Failure
2.1 Predicting Default of Enterprises
2.2 Credit Risk
2.3 The Origins of Logistic Regression
2.4 The Use of Logistic Regression Over Linear Least Squares
2.5 The Minimum Variance Method
2.6 Binary and Multinomial Logistic Regression
CHAPTER THREE
3.0 Research Methodology
3.1 Method of Data Collection
3.2 Method of Data Presentation
3.3 Data Analysis Method
3.4 Model Estimation
3.4.1 Coding and Interpretation of the Coefficients of the Model
3.5 Choosing a Significant Model
3.6 Loss Function
3.7 ROC Curve
CHAPTER FOUR
DATA PRESENTATION AND ANALYSIS
4.1 Data Presentation
4.2 Data Analysis
4.3 Descriptive Analysis
4.4 Logistic Regression Analysis
4.5 The Odds Ratio
4.6 Model Assessment
4.7 Classification and Validation
4.8 ROC Curve
CHAPTER FIVE
5.0 SUMMARY, CONCLUSIONS AND RECOMMENDATION
5.1 Summary of Findings and Conclusion
5.2 Recommendation
REFERENCES
APPENDIX 1
ABSTRACT
It is common practice for people to live on loans in order to survive. Some pay back
on time, while others find it difficult to repay their loans, leaving the lender
to run after them. There is therefore a need to develop a model that will help identify
potential customers who are likely to default on their loans.
This research built a binary logistic regression model with the aid of the Statistical
Package for the Social Sciences (SPSS) to help lenders predict potential customers who
are likely to default. It examines a dichotomous dependent variable (default) using the
independent variables (Loans, Balance, Collateral, Interest, Number of Days, Gender and
Education Level), which are either continuous or categorical.
CHAPTER ONE
1.0 INTRODUCTION
It is common practice for people to live on loans in order to survive. To do
this, they obtain loans from relatives, banks, or thrift and credit
co-operatives. Businessmen and women obtain loans to finance or start their
businesses, while individuals obtain loans to start a business, meet their daily
needs, build houses, pay their children's school fees, and so on. It is not
surprising that some of them find it difficult to pay back their loans, leaving
lenders to run after them. For every loan obtained through a loan officer,
whether from a bank or from a thrift and credit co-operative, it is therefore
very important for the loan officer to examine the customer to know whether they
will pay back or not. There is a need, therefore, to develop a model that will
help identify potential customers who are likely to default on their loans,
separating customers who will not pay back (customers who will go bad) from
customers who will pay back (customers who will not go bad).
Loan defaulters have brought down some banks and thrift and credit
co-operatives. To avoid going bankrupt, these organizations always request
collateral from customers so as to keep the customers on their toes to pay back
the money. This collateral is one of the factors to take into consideration in
case there is a default. Others are the authorized limit (amount given),
interest, etc. In order to know which factors influence default, we make use of
a logistic regression model to predict the good or bad credit risk type. Since
the dependent variable is binary (categorical) in nature, there is a need to use
the binary logistic model, which is a general approach to the analysis of models
with binary data that is comparable with the use of linear models for normally
distributed responses. The binary logistic model is used for responses or
outcomes of this type, which are dichotomous, or binary, in nature.
1.1 Overview and History of the Bank
Fidelity Bank Plc began operations in 1988 as Fidelity Union Merchant Bank
Limited. By 1990, it had distinguished itself as the fastest growing merchant bank
in the country. To leverage the emerging opportunities in the commercial and
consumer end of financial services in Nigeria, it converted to commercial banking
in 1999, following the issuance of a commercial banking license by the Central
Bank of Nigeria, the national banking regulator, and changed its name to Fidelity
Bank Plc. It became a universal bank in February 2001, with a license to offer the
entire spectrum of commercial, consumer, corporate and investment banking
services. In 2011, the bank was ranked the 7th most capitalized bank in Nigeria,
the 25th most capitalized bank on the African continent and the 567th most
capitalized bank in the world.
The current enlarged Fidelity Bank is the result of the merger with the former
FSB International Bank Plc and Manny Bank Plc (under the Fidelity brand name)
in December 2005. Fidelity Bank is today ranked among the top 10 in the
Nigerian banking industry, with a presence in all 36 states as well as major cities
and commercial centres of Nigeria. Fidelity continues to rank among Nigeria's
most capitalized banks, with tier-one capital of nearly USD 1 billion.
1.2 Scope of the Study
This research is based on data collected from Fidelity Bank as a case study, and
is used to predict whether customers are likely to default on their loans.
1.3 Limitation of the Study
The major challenge of this research was my inability to get data from some banks.
They were reluctant to release those details to me because they believed that they
are meant to be kept confidential from third parties. Eventually I was able to get
the data from Fidelity Bank through persistent effort, even though it was never
easy.
1.4 Objectives:
1. To develop the best model for predicting a dichotomous response (dependent)
variable Y, coded 0 or 1, that indicates whether a customer is a bad or good
credit risk type.
2. To develop a model that will help prevent loan defaulting.
1.5 Significance of the Study

This study builds a binary logistic model that will help loan officers predict
whether customers are likely to default on the loans given to them, that is,
whether a customer is a good or bad credit risk type. On this basis, loan officers
can conveniently identify such customers and decline to give them loans, thereby
preventing loan defaults.
1.6 Definition of Terms
Authorized Limit (Loans): the amount granted to a customer.
Binary Logistic Regression: an analysis that examines the influence of various
factors on a dichotomous outcome by estimating the probability of the event's
occurrence. It is typically used when the dependent variable is dichotomous and
the independent variables are either continuous or categorical.
Credit Risk: the risk of loss of principal or loss of a financial reward stemming
from a borrower's failure to repay a loan or otherwise meet a contractual
obligation.
Default: the inability of a borrower to pay the interest or principal on a debt
when it is due.
Dependent Variable: the response, which shows two possible outcomes: whether the
risk being taken is good or bad. As the name implies, the dependent variable
depends on the independent variables.
Independent Variables: the various factors that influence the outcome.
Odds: the number of defaulting cases divided by the number of non-defaulting
cases.
Odds Ratio: the ratio of the odds for two different groups.
Outstanding Balance: the remaining amount owed by the customer.
Relative Risk (RR): the ratio of the risk of defaulting in one group to the risk
of defaulting in another group.
Risk or Probability: the number of cases in which there is default divided by the
total number of cases, that is, the risk of occurrence of default.
Security Type: the type of security used to secure the loan.
Security Value: the value of the security type, also known as collateral.
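The definitions of risk, odds, odds ratio and relative risk can be illustrated with a small worked example. The sketch below is in Python and uses invented counts for two hypothetical groups of loans (secured and unsecured); the group names and numbers are assumptions made purely for illustration and are not taken from the bank's data. Relative risk is taken in its usual sense, the ratio of the risk in one group to the risk in another.

```python
def risk(defaults, total):
    """Risk (probability): defaulting cases divided by all cases."""
    return defaults / total


def odds(defaults, total):
    """Odds: defaulting cases divided by non-defaulting cases."""
    return defaults / (total - defaults)


# Hypothetical counts, invented for illustration only (not the bank's data).
secured = {"defaults": 10, "total": 100}    # loans backed by collateral
unsecured = {"defaults": 30, "total": 100}  # loans without collateral

risk_secured = risk(**secured)
risk_unsecured = risk(**unsecured)
odds_secured = odds(**secured)
odds_unsecured = odds(**unsecured)

# Compare the unsecured group against the secured group.
relative_risk = risk_unsecured / risk_secured
odds_ratio = odds_unsecured / odds_secured

print(f"risk:  secured {risk_secured:.3f}, unsecured {risk_unsecured:.3f}")
print(f"odds:  secured {odds_secured:.3f}, unsecured {odds_unsecured:.3f}")
print(f"relative risk = {relative_risk:.3f}, odds ratio = {odds_ratio:.3f}")
```

The odds ratio and the relative risk differ here because default is not rare in the invented unsecured group; the two measures only approximate each other when the risk is small.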
CHAPTER TWO
LITERATURE REVIEW
2.0 Banks Failure

Failure rates are often difficult to track properly. However, in the past few
years, considerable research (e.g., Everett and Watson, 1996; and Headd, 2003)
has been conducted to determine the rates and causes of such failures. Two of
the principal reasons businesses suffer unexpected closures are insufficient
capitalization and lack of planning. This earlier research, however, analyzed
only financial ratios in order to explain firm default.
More recent literature (Peel et al., 1986; Grunet et al., 2004; Peel and Peel,
1989; Hill and Wilson, 2007; and Altman et al., 2010) concludes that financial
variables are not sufficient to predict SME default and that including
non-financial variables improves a model's predictive power. When analyzing
business failure, it is extremely important to distinguish between failure and
closure. Watson and Everett (1996) mention that closing firms could have been
financially successful but closed for other reasons: the sale of the firm or a
personal decision by the owner to accept employment with another firm, to retire,
or the like. To define failure they created five categories: ceasing to exist
(discontinuance for any reason); closing or a change in ownership; filing for
bankruptcy; closing to limit losses; and failing to reach financial goals. Brian
Headd (2003) finds that only one-third of new businesses (33%) closed under
circumstances that owners considered unsuccessful.
2.1 Predicting Default of Enterprises

According to the literature, various methods have been used to predict the default
of enterprises. Beaver (1967) originally proposed the use of univariate analysis.
Altman (1968), Altman et al. (1977), and Pompe and Bilderbe (2005) used
Multiple Discriminant Analysis (MDA). For many years thereafter, MDA was the
prevalent statistical technique applied in default prediction models. It was
used by many authors (Taffler and Tisshaw, 1977; Altman et al., 1977; Micha,
1984). Ohlson (1980) was the first to apply conditional logistic regression
(logit) to the study of default prediction. The research examining
bankruptcy (Ohlson, 1980; Aziz et al., 1988) favors the logit over MDA for both
theoretical and empirical reasons. The logit model requires less restrictive
statistical assumptions and offers better empirical discrimination (Zavgren,
1983).
Moreover, the estimated coefficients can be interpreted separately as the
importance or significance of each of the independent variables in the explanation
of the estimated probability of default (PD). Other researchers have also used the
logit model to examine defaulting firms (Keasey and Watson, 1987; Ooghe et al.,
1995; and Becchetti and Sierra, 2002).
2.2 Credit Risk

Credit risk is always a big issue in the investment world. Whether you are trying
to determine how much of a credit risk you are or how much of a risk someone or
something else is, you have to know what to look for. Credit is used to help
lenders determine whether someone is worth the risk when it comes to borrowing
money, and in the investing world it plays a big role in what you do or don't have
access to. Judging whether someone or something is a reasonable credit risk takes
time and depends on many factors, but it can be done. Things that affect the
credit risk of any person or company include factors like:
Credit history: If the person or company in question has a stable credit history,
they are going to be a better credit risk. Those who have a poor history are less
likely to get approved for the funding that they need, simply because they don't
show that they are worth it.
Credit rating: Your current credit rating directly impacts the level of risk that you
present. In the consumer world, a credit score of 720 or higher makes you an
excellent risk, while a score below 599 makes you a very bad credit risk.
Companies have a similar rating scale that helps determine how worthy of a risk
they will be.
Debt to income ratio: Whether you are a consumer or a business, having more going
out than coming in makes you a poorer credit risk, so you need a low debt to
income ratio to be a better one. That means keeping your debts as low as you can
and paying them down whenever they get too high.
Security or collateral: In some cases, you may want or need to provide some type
of asset to show that you are a worthwhile credit risk. That way, even if your
information doesn't prove you to be a good candidate, the creditor or investor
will know that they aren't losing anything by investing in you.
When you're trying to determine credit risk, these are the factors to consider. The
world of credit risk is similar for businesses and consumers alike, even
though the specific rules differ from one area to the next. Ultimately, as
long as you understand the basics, you'll be able to determine whether you,
or anyone or anything else, are worth the investment in the end.
Types of Credit Risk

Credit risk can be classified in the following way:
Credit default risk - The risk of loss arising from a debtor being unlikely to pay
its loan obligations in full, or from the debtor being more than 90 days past due
on any material credit obligation; default risk may impact all credit-sensitive
transactions, including loans, securities and derivatives.
Concentration risk - The risk associated with any single exposure or group of
exposures with the potential to produce losses large enough to threaten a bank's
core operations. It may arise in the form of single name concentration or industry
concentration.
Country risk - The risk of loss arising from a sovereign state freezing foreign
currency payment (transfer/conversion risk) or when it defaults on its obligations
(sovereign risk).
Assessing credit risk
Significant resources and sophisticated programs are used to analyze and manage
risk. Some companies run a credit risk department whose job is to assess the
financial health of their customers, and extend credit (or not) accordingly. They
may use in-house programs to advise on avoiding, reducing and transferring risk.
They also use intelligence provided by third parties.
Most lenders employ their own models (credit scorecards) to rank potential and
existing customers according to risk, and then apply appropriate strategies. With
products such as unsecured personal loans or mortgages, lenders charge a higher
price for higher risk customers and vice versa. With revolving products such as
credit cards and overdrafts, risk is controlled through the setting of credit limits.
Some products also require security, most commonly in the form of property.
Credit scoring models also form part of the framework used by banks or lending
institutions to grant credit to clients. For corporate and commercial borrowers, these
models generally have qualitative and quantitative sections outlining various
aspects of the risk including, but not limited to, operating experience,
management expertise, asset quality, and leverage and liquidity ratios,
respectively. Once this information has been fully reviewed by credit officers and
credit committees, the lender provides the funds subject to the terms and
conditions presented within the contract (as outlined above).
Sovereign risk
Sovereign risk is the risk of a government becoming unwilling or unable to meet
its loan obligations, or reneging on loans it guarantees. Many countries faced
sovereign risk during the late-2000s global recession. The existence of such risk
means that creditors should follow a two-stage decision process when deciding to
lend to a firm based in a foreign country: first consider the sovereign risk
quality of the country, and then consider the firm's credit quality.
Five macroeconomic variables that affect the probability of sovereign debt
rescheduling are:
Debt service ratio
Import ratio
Investment ratio
Variance of export revenue
Domestic money supply growth

Counterparty risk
Counterparty risk, also known as default risk, is the risk that a counterparty will
not pay what it is obligated to pay on a bond, credit derivative, trade credit
insurance or payment protection insurance contract, or other trade or transaction
when it is supposed to. Financial institutions may hedge or take out credit
insurance of some sort with a counterparty, which may itself be unable to
pay when required to do so, either due to temporary liquidity issues or longer-term
systemic reasons. Large insurers are counterparties to many transactions,
and thus this is the kind of risk that prompts financial regulators to act, e.g., the
bailout of the insurer AIG. On the methodological side, counterparty risk can be
affected by wrong-way risk, namely the risk that different risk factors are
correlated in the most harmful direction. Including the correlation between the
portfolio risk factors and counterparty default in the methodology is not
trivial.
Mitigating credit risk

Lenders mitigate credit risk using several methods:
Risk-based pricing: Lenders generally charge a higher interest rate to borrowers
who are more likely to default, a practice called risk-based pricing. Lenders
consider factors relating to the loan such as loan purpose, credit rating, and
loan-to-value ratio, and estimate the effect on yield (credit spread).
Covenants: Lenders may write stipulations on the borrower, called covenants,
into loan agreements, such as requirements to:
Periodically report its financial condition
Refrain from paying dividends, repurchasing shares, borrowing further, or
other specific, voluntary actions that negatively affect the company's
financial position
Repay the loan in full, at the lender's request, in certain events such as
changes in the borrower's debt-to-equity ratio or interest coverage ratio
Credit insurance and credit derivatives:
Lenders and bond holders may hedge their credit risk by purchasing credit
insurance or credit derivatives. These contracts transfer the risk from the lender to
the seller (insurer) in exchange for payment. The most common credit derivative
is the credit default swap.
Tightening: Lenders can reduce credit risk by reducing the amount of credit
extended, either in total or to certain borrowers. For example, a distributor selling
its products to a troubled retailer may attempt to lessen credit risk by reducing
payment terms from net 30 to net 15.
Diversification: Lenders to a small number of borrowers (or kinds of borrower)
face a high degree of unsystematic credit risk, called concentration risk. Lenders
reduce this risk by diversifying the borrower pool.
Deposit insurance: Many governments establish deposit insurance to guarantee
bank deposits of insolvent banks. Such protection discourages consumers from
withdrawing money when a bank is becoming insolvent, to avoid a bank run, and
encourages consumers to hold their savings in the banking system instead of in
cash.
2.3 The Origins of the Logistic Regression
The logistic regression model is one of the most frequently used statistical
models for a dichotomous outcome with several predictor variables, which may be
either numerical or categorical, a setting that cannot be analysed using ordinary
linear regression. There is no evidence that its utility will decline in the near
future, given the steady role regression analysis plays in research. In
statistics, logistic regression, sometimes called the logistic model or logit
model, is used to predict the probability of occurrence of an event by fitting
data to a logistic curve.
The logistic function was invented in the 19th century for the description
of the growth of populations and the course of autocatalytic chemical
reactions, or chain reactions. In either case we consider the time path of
a quantity $W(t)$ and its growth rate
$$\dot{W}(t) = \frac{dW(t)}{dt}. \qquad (1)$$
The simplest assumption is that $\dot{W}(t)$ is proportional to $W(t)$:
$$\dot{W}(t) = \beta W(t), \qquad \beta = \frac{\dot{W}(t)}{W(t)}, \qquad (2)$$
with $\beta$ the constant rate of growth. This leads of course to exponential growth
$W(t) = A \exp(\beta t)$, where $A$ is sometimes replaced by the initial value $W(0)$.
With $W(t)$ the human population of a country, this is a model of unopposed growth; as
Malthus (1798) put it, a human population, left to itself, will increase in
geometric progression. It is a reasonable model for a young and empty country
like the US in its early years. Like many others, Alphonse Quetelet, the Belgian
astronomer turned statistician, was well aware that the indiscriminate
extrapolation of exponential growth must lead to impossible values.
Like Quetelet, Verhulst approached the problem by adding an extra term to
equation (2) to represent the increasing resistance to further growth, as in
$$\dot{W}(t) = \beta W(t) - \phi(W(t)), \qquad (3)$$
and then experimenting with various forms of $\phi$. The logistic appears when this
is a simple quadratic, $\phi(W) = \beta W^2 / \Omega$, for in that case we may rewrite
equation (3) as
$$\dot{W}(t) = \frac{\beta}{\Omega}\, W(t)\,\bigl(\Omega - W(t)\bigr), \qquad (4)$$
where $\Omega$ denotes the upper limit or saturation level of $W$, its asymptote as
$t \to \infty$. Growth is now proportional both to the population already attained,
$W(t)$, and to the remaining room for further expansion, $\Omega - W(t)$. If we
express $W(t)$ as a proportion $P(t) = W(t)/\Omega$, this gives
$$\dot{P}(t) = \beta P(t)\,\bigl(1 - P(t)\bigr), \qquad (5)$$
and the solution of this differential equation is
$$P(t) = \frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}, \qquad (6)$$
which Verhulst named the logistic function. The population $W(t)$ then follows
$$W(t) = \Omega\, \frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}. \qquad (7)$$
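As a rough numerical check of the step from equation (5) to equations (6) and (7), the following Python sketch integrates the growth equation with simple forward Euler steps and compares the result with the closed-form logistic function. The names `alpha`, `beta` and `omega` stand for the location, growth-rate and saturation constants, and the values given to them are arbitrary assumptions chosen only for illustration; the Euler scheme is likewise just a convenient choice, not part of the original derivation.

```python
import math


def logistic_p(t, alpha, beta):
    """Closed-form logistic function, exp(a + b*t) / (1 + exp(a + b*t))."""
    z = alpha + beta * t
    return 1.0 / (1.0 + math.exp(-z))  # algebraically the same expression


def euler_growth(p0, beta, t_end, steps):
    """Integrate dP/dt = beta * P * (1 - P) from t = 0 using forward Euler."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += dt * beta * p * (1.0 - p)
    return p


# Illustrative parameter values (assumptions, not estimates from any data).
alpha, beta, omega = -3.0, 0.5, 1000.0

p0 = logistic_p(0.0, alpha, beta)  # starting proportion at t = 0
t_end = 12.0
p_exact = logistic_p(t_end, alpha, beta)
p_numeric = euler_growth(p0, beta, t_end, steps=100_000)

print(f"closed form P({t_end}) = {p_exact:.6f}")
print(f"Euler steps P({t_end}) = {p_numeric:.6f}")
print(f"population  W({t_end}) = {omega * p_exact:.1f} (saturation level {omega})")
```

The two values of P agree closely, and the population W stays below the saturation level, as the text describes.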
Verhulst published his suggestions between 1838 and 1847. He explains that
he did his research a couple of years before, that he did not have the time for an
update, and that he published the note only at the insistence of Quetelet; it was
he who named the curve the logistic. Verhulst also determines the three parameters
$\Omega$, $\alpha$ and $\beta$ of equation (7) by making the logistic curve pass
through three observed points. His discovery of the logistic curve was not taken
up with much enthusiasm by Quetelet; as Vanpaemel (1987) has shown, the two men did
not see eye to eye on the question of population growth. As a model of population
growth the logistic function was discovered anew in 1920 by Pearl and Reed. They
were unaware of Verhulst's work (though not of the curves for autocatalytic
reactions discussed presently), and they arrived independently at the logistic
curve of equation (7).
Verhulst's work was rediscovered soon after Pearl and Reed's first paper
of 1920. Yule (1925) treated Verhulst much more handsomely than Pearl and Reed
did, devoting an appendix to his work. Yule is also the first author to revive the
name logistic, which is not used by Liagre or Du Pasquier (a mathematician who
later followed courses in social sciences) nor by Pearl and Reed in their earlier
references. By 1924, however, logistic is used as a commonplace term in the
correspondence between Pearl and Yule, who were lifelong friends. As we have
already hinted, there is another early root of the logistic function in chemistry,
where it was employed (again with some variations) to describe the course of
autocatalytic or chain reactions, where the product itself acts as a catalyst for
the process while the supply of raw materials is fixed. This leads naturally to a
differential equation like (5) and hence to the logistic function for the time
path of the amount of the reaction product. The review of the application of
logistic curves to a number of such processes by Reed and Berkson (1929) quotes
work of 1883 by the German professor of chemistry Wilhelm Ostwald. Authors like
Yule (1925) and Wilson (1925) were well aware of this strand of the literature.
The basic idea of logistic growth is simple and effective, and it is used to this
day to model population growth and the market penetration of new products and
technologies. The introduction of mobile telephones is an autocatalytic process,
and so is the spread of many new products and techniques in industry.
The close resemblance of the logistic to the normal distribution function must
have been common knowledge among those who were familiar with the logistic;
it had been demonstrated by Wilson (1925) and written up by Winsor (1932)
(another collaborator of Pearl). Wilson was probably the first to publish an
application of the logistic curve in bio-assay, in Wilson and Worcester (1943), just
before Berkson (1944). But it was Berkson who persisted and fought a long and
spirited campaign which lasted for several decades.
An accurate history of the adoption and further development of the logit
would require an intimate knowledge of several quite distinct disciplines, for
many new generalizations were introduced independently and in almost complete
isolation in completely unrelated applied work.
In statistics, the analytical advantages of the logit transformation as a means of
dealing with discrete binary outcomes were soon recognized. Cox was among the
first to explore (and exploit) these possibilities; he wrote a series of papers
around 1960, and followed these up with an influential textbook in 1969.
The logit model of bio-assay is easily generalized to logistic regression, where
binary outcomes are related to a number of determinants without a specific
theoretical background, and this statistical model proved as fertile as linear
regression in an earlier era. Later, the link of the logistic model with
discriminant analysis was recognized, as was its ready association with log-linear
models in general. On the specific issue of estimation in logit and probit
(probability unit) analyses, maximum likelihood estimation became the norm when
routines for this method, applicable to individual data, were included in
commercial statistical program packages. This facility was probably first offered
by the BMDP (biomedical data processing) program of 1977. By the time the first
comprehensive textbook with medical applications, Hosmer and Lemeshow (1989), was
published, the use of such routines was taken for granted. Of the two causes
Berkson advocated, minimum chi-squared estimation was effectively overtaken by the
computer revolution, while the logit transformation,
$\operatorname{logit}(p) = \log\frac{p}{1-p}$, was triumphant.
The theoretical justification of bio-assay in terms of determinate stimulus and
random thresholds was first jettisoned in the change to logistic regression, and
then retrieved in the form of the latent regression equation model that is still dear
to the behavioural sciences.
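The logit transformation mentioned above, logit(p) = log(p / (1 - p)), and its inverse can be illustrated with a few lines of Python. This is only a minimal sketch; the function names `logit` and `inverse_logit` are chosen here for illustration and do not come from any particular statistical package.

```python
import math


def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a log-odds value z back to a probability (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p = {p:.1f}  logit = {z:+.4f}  back-transformed = {inverse_logit(z):.4f}")
```

The transformation stretches probabilities in (0, 1) onto the whole real line, which is what allows a linear combination of predictors to be used for a binary outcome.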
An example of simultaneous independent discoveries is the generalization of
logistic regression to the multinomial or polychotomous case. This was first set
out, at some length, by the biometric statistician Mantel (1966). Some years
later it was rediscovered independently by the econometrician Theil
(1969), who arrived at it from the general perspective of modelling shares.
For a long time, logistic regression, whether in the binary or the multinomial
context, was principally used as a technique, a simple tool without a specific
underlying process and therefore without a characteristic interpretation. But in
1973 McFadden, working as a consultant for a California public transportation
project, linked the multinomial logit to the theory of discrete choice from
mathematical psychology. This provided a theoretical foundation for the logit
model that is much more profound than any theory put forward for the use of the
probit in bio-assay.
It is this same logistic regression that we use to identify the customers who
will default on loans given to them, taking into account some
factors that influence whether a loan turns out good or bad.
2.4 The Use of Logistic Regression over Linear Least Squares
Logistic regression differs from linear least squares regression in
that its equations are solved iteratively, whereas least squares can be
solved explicitly with a formula. A trial equation is fitted and tweaked over and
over in order to improve the fit. Iterations stop when the improvement from one
step to the next is suitably small. From a practical point of view, however, the
two are almost identical because both predict a response; the difference is that
the response variable of logistic regression is an indicator of some
characteristic, that is, a 0/1 variable. It is used to
determine whether
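The iterative fitting described in this section can be sketched in Python as follows. The sketch uses the Newton-Raphson method for a model with an intercept and a single predictor, starting from a trial equation with both coefficients at zero and stopping when the log-likelihood improves by less than a small tolerance. Newton-Raphson is one common choice of iteration and is an assumption here, as are the function name `fit_logistic`, the variable names and the tiny data set, which is invented for illustration and is not the bank's data.

```python
import math


def fit_logistic(x, y, tol=1e-8, max_iter=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Returns (b0, b1, iterations). Iteration stops when the log-likelihood
    improves by less than `tol` from one step to the next.
    """
    b0, b1 = 0.0, 0.0  # trial equation: start with both coefficients at zero
    previous_ll = -math.inf
    for iteration in range(1, max_iter + 1):
        # Fitted probabilities and log-likelihood under the current coefficients.
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                 for yi, pi in zip(y, p))
        if ll - previous_ll < tol:
            return b0, b1, iteration
        previous_ll = ll
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, max_iter


# Hypothetical data: a predictor and a 0/1 default indicator (illustration only).
days_overdue = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
defaulted = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, n_iter = fit_logistic(days_overdue, defaulted)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, iterations = {n_iter}")
```

By contrast, a least squares line through the same points could be obtained in a single step from the usual closed-form formulas, with no iteration required.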
2.5 The Minimum Variance Method

In this method, the line of best fit is determined by the estimates of the
intercept and slope, $\hat{\beta}_0$ and $\hat{\beta}_1$ respectively. These have
minimum variance amongst all unbiased estimators of $\beta_0$ and $\beta_1$.
2.6 Binary and multinomial Logistics Regression
Binary logistic regression is the type of logistic regression model used for a
categorical dependent variable with two outcomes, and multinomial logistic
regression is used when the dependent variable has more than two outcomes.
Choosing a Procedure for Binary Logistic Regression Models
Binary logistic regression models can be fitted using either the Logistic
Regression procedure or the Multinomial Logistic Regression procedure.
Each procedure has options not available in the other. An important theoretical
distinction is that the Logistic Regression procedure produces all predictions,
residuals, influence statistics, and goodness-of-fit tests using data at the
individual case level, regardless of how the data are entered and whether or not
the number of covariate patterns is smaller than the total number of cases, while
the Multinomial Logistic Regression procedure internally aggregates cases to form
subpopulations with identical covariate patterns for the predictors, producing
predictions, residuals, and goodness-of-fit tests based on these subpopulations. If
all predictors are categorical, or any continuous predictors take only a limited
number of values, so that there are several cases at each distinct covariate
pattern, the subpopulation approach can produce valid goodness-of-fit tests and
informative residuals, while the individual case-level approach cannot.
Binary Logistic Regression provides the following unique features:
Hosmer-Lemeshow test of goodness of fit for the model
Stepwise analyses
Contrasts to define model parameterization
Alternative cut points for classification
Model fitted on one set of cases applied to a held-out set of cases
Saves predictions, residuals, and influence statistics
Multinomial Logistic Regression provides the following unique features:
Pearson and deviance chi-square tests for goodness of fit of the model
Specification of subpopulations for grouping of data for goodness-of-fit
tests
Listing of counts, predicted counts, and residuals by subpopulations
Correction of variance for over-dispersion
Covariance matrix of the parameter estimates
Tests of linear combinations of parameters
Explicit specification of nested models
Fit 1-1 matched conditional logistic regression models using differenced
variables.
CHAPTER THREE
3.0 Research Methodology
The Binary Logistic Regression model is the statistical tool used in this
research to study the tendency of customers to default on the loans given to
them. Records of 300 customers were used in the process.
3.1 Method of Data Collection
The data collected for this research are the records of customers who collected
loans from Fidelity Bank Nigeria PLC. Their balances as at 5th of May, 2013, are
given in Appendix I. This is secondary data, because it would be almost
impossible to collect these data directly from the individuals.
3.2 Method of Data Presentation
The data collected are shown in tabular form, with each variable forming a
column and each customer making up a row. The results of the analysis are shown
through tables and plots in Chapter Four.
3.3 Data Analysis Method
The Statistical Package for the Social Sciences (SPSS) is employed to analyse
the data. The Binary Logistic Regression model is used to estimate the
parameters involved because the dependent variable has two outcomes.
3.4 Model Estimation
A Linear Logistic Model (LLM) assumes that for each possible set of values for
the independent (X) variables, there is a probability p that an event (success)
occurs. The model is then that Y is a linear combination of the values of the x
vector:
$$Y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + e, \quad i = 1, 2, 3, \dots, n$$
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \dots + \beta_n x_n + e$$
If $x_1, x_2, \dots, x_n$ are a collection of independent variables and Y is a
response variable with probability of success p, then
$$E(Y \mid X) = \beta_0 + \beta_1 E(x_1) + \beta_2 E(x_2) + \beta_3 E(x_3) + \dots + \beta_n E(x_n)$$
If Y = 1 with probability p and
Y = 0 with probability (1 - p), then
$$\text{Odds} = \frac{p}{1 - p}$$
The logistic regression model is given as
$$\text{Logit}(y) = \ln\left(\frac{p}{1 - p}\right)$$
It starts by considering the existence of an unobserved continuous variable, Y,
which can be thought of as the customer's propensity to default on a loan, with
larger values of Y corresponding to greater probabilities of defaulting.
$$Y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + e, \quad i = 1, 2, 3, \dots, n$$
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \dots + \beta_n x_n + e$$
where $\beta_0$ is the constant of the equation, $\beta_i$ are the coefficients
of the predictor variables, and e are the residuals, i.e. the variability not
explained by the model.
The model assumes that Y is linearly related to the predictors.
In the logistic regression model, the relationship between Y and the probability
of the event of interest is described by this link function:
$$Y = \ln\left(\frac{p}{1 - p}\right)$$
where
p is the probability that each case experiences the event of interest
Y is the value of the unobserved continuous variable for each case.
Since Y is unobserved, we relate the predictors to the probability of interest by
substituting
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \dots + \beta_n x_n + e$$
into the model, and the regression coefficients are estimated through an
iterative maximum likelihood method.
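The link function above can be illustrated with a minimal Python sketch. The function names and the probability 0.8 are illustrative assumptions, not part of the study or of SPSS.

```python
import math

def odds(p):
    """Odds of an event that has probability p."""
    return p / (1.0 - p)

def logit(p):
    """Link function: the log-odds ln(p / (1 - p))."""
    return math.log(odds(p))

def inverse_logit(y):
    """Map a value of the linear predictor Y back to a probability."""
    return 1.0 / (1.0 + math.exp(-y))

p = 0.8                          # illustrative probability, not from the data
print(odds(p))                   # about 4, i.e. odds of 4 to 1
print(logit(p))                  # about 1.386
print(inverse_logit(logit(p)))   # recovers 0.8 up to rounding
```

Because the inverse of the link always returns a value between 0 and 1, any value of the linear combination of predictors corresponds to a valid probability.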
3.4.1 Coding and Interpretation of the Coefficients of the Model
Dependent variable: we code the occurrence of default as 0 and the absence of
default as 1.
Independent variables: these are of different types:
Numerical variable: in order to introduce the variable into the model, it must
satisfy the linearity hypothesis, i.e., for each unit increase in the numerical
variable, the odds ratio (OR) changes by a constant multiplicative factor,
$e^{\beta_i}$.
Dichotomous variable: gender, with Male coded as 1 and Female coded as 2.
Categorical variable: education level has the following codes
o Primary =1
o Secondary=2
o OND=3
o HND/BSc=4
o Other professional qualifications=5
When the coefficient $\beta_i$ of a variable is positive, we obtain OR > 1, and
the variable therefore corresponds to a risk factor. If the value of $\beta_i$
is negative, OR will be < 1, and the variable therefore corresponds to a
protective factor.
3.5 Choosing a Significant Model

An estimating algorithm is used to find the coefficients (the $\beta_i$'s) that
best satisfy the relationship expressed in the regression equation for the
estimation data sample. The technique used to find those coefficients for
logistic regression is maximum likelihood estimation. Basically, the method tries
coefficients until it finds the set that maximizes the value of a mathematical
function that gives the joint probability of observing the given data.
That function, L, the likelihood function, forms the basis of a statistical test
of how well the model fits the observed data:
$$L^2 = 2(\log L_T - \log L_B)$$
where $L_T$ is the likelihood of the tested model and $L_B$ is the likelihood of
a baseline model containing a smaller subset of the variables.
$L^2$ is a statistic that is compared with the chi-square table, with degrees of
freedom equal to the number of added variables, to determine whether the tested
model fits significantly better than the baseline model. This procedure
establishes whether the added variables significantly improve the fit to the
data, or whether conversely the smaller subset is equally sufficient. We use
this procedure to omit from the model variables that do not significantly
improve our ability to predict insolvency.
Classification plots are used to display the results of the analysis
graphically. The Hosmer-Lemeshow goodness-of-fit statistic indicates a poor fit
if the significance value is less than 0.05.
Pseudo R-squared statistics are used since the R-squared statistic, which
measures the variability in the dependent variable explained by a linear
regression model, cannot be computed for logistic regression models. They are
designed to have properties similar to the true R-squared statistic.
Also, the square of the ratio of a coefficient to its standard error equals the
Wald statistic.
Wald Test
The Wald test is used to test the statistical significance of each coefficient
(b) in the model. A Wald test calculates a Z statistic, which is:
$$Z = \frac{b}{SE(b)}$$
This value is squared, giving
$$W = \left(\frac{b}{SE(b)}\right)^2$$
which follows a chi-square distribution and is used as the Wald test statistic.
(Alternatively, the Z value can be compared directly to a normal distribution.)
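As a worked illustration of the Wald statistic, the sketch below uses the rounded B and S.E. of the constant reported in SPSS OUTPUT 8 (1.492 and .213). The function name is an assumption made for illustration, and the p-value uses the standard identity for a chi-square variable with one degree of freedom. The result is close to, but not exactly, the 49.042 reported by SPSS, because the inputs here are rounded.

```python
import math

def wald_test(b, se):
    """Return (z, wald, p_value) for a coefficient b with standard error se."""
    z = b / se
    wald = z ** 2
    # For 1 degree of freedom, P(chi-square > wald) = erfc(sqrt(wald / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return z, wald, p_value

# Rounded values of the constant from SPSS OUTPUT 8.
z, wald, p_value = wald_test(1.492, 0.213)
print(round(z, 3), round(wald, 3), p_value)
```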
The goodness of fit of a statistical model describes how well it fits a set of
observations. Measures of goodness of fit typically summarize the discrepancy
between observed values and the values expected under the model in question.
Such measures can be used in statistical hypothesis testing, e.g. to test for
normality of residuals, to test whether two samples are drawn from identical
distributions, or whether outcome frequencies follow a specified distribution.
One way in which a goodness-of-fit statistic can be constructed, in the case
where the variance of the measurement error is known, is to construct a
weighted sum of squared errors:
$$\chi^2 = \sum \frac{(O - E)^2}{\sigma^2}$$
where O is an observed value, E is the value expected under the model, and
$\sigma^2$ is the known variance of the observation. This definition is only
useful when one has estimates for the error on the measurements, but it leads to
a situation where a chi-square distribution can be used to test goodness of fit,
provided that the errors can be assumed to have a normal distribution.
The reduced chi-squared statistic is simply the chi-squared divided by the
number of degrees of freedom:
$$\chi^2_{red} = \frac{\chi^2}{\nu} = \frac{1}{\nu} \sum \frac{(O - E)^2}{\sigma^2}$$
where $\nu$ is the number of degrees of freedom, usually given by N - n - 1,
where N is the number of observations and n is the number of fitted parameters,
assuming that the mean value is an additional fitted parameter. The advantage of
the reduced chi-squared is that it already normalizes for the number of data
points and model complexity.
As a rule of thumb, a large $\chi^2_{red}$ indicates a poor model fit.
3.6 LOSS FUNCTION
A loss function is a measure of fit between a mathematical model of data and
the actual data. We choose the parameters of our model to minimize the
badness-of-fit or to maximize the goodness-of-fit of the model to the data. With
least squares (the only loss function we have used thus far), we minimize SSres,
the sum of squared residuals. This also happens to maximize SSreg, the sum of
squares due to regression. With linear or curvilinear models, there is a
mathematical solution to the problem that will minimize the sum of squares,
that is,
$$b = (X'X)^{-1} X'y$$
or, in standardized form,
$$\beta = R^{-1} r$$
where R is the correlation matrix of the predictors and r is the vector of
correlations between the predictors and y.
With some models, like the logistic curve, there is no closed-form mathematical
solution that will produce least squares estimates of the parameters. For these
models, the loss function chosen is based on maximum likelihood. Likelihood is a
conditional probability (e.g., P(Y|X), the probability of Y given X). We can pick
the parameters of the model (a and b of the logistic curve) at random or by
trial-and-error and then compute the likelihood of the data given those
parameters. We choose as our parameters those that result in the greatest
likelihood computed. The estimates are called maximum likelihood estimates
because the parameters are chosen to maximize the likelihood (the conditional
probability of the data given the parameter estimates) of the sample data. The
techniques used fall under the general label of numerical analysis. There are
several methods of numerical analysis, but they all follow a similar series of
steps. First, the computer picks some initial estimates of the parameters. Then
it computes the likelihood of the data given these parameter estimates. Then it
improves the parameter estimates slightly and recalculates the likelihood of the
data. It repeats this until it is told to stop, which we usually do when the
parameter estimates no longer change much. Sometimes we tell the computer to
stop after a certain number of tries or iterations, e.g., 20 or 250. Reaching
such a limit without convergence usually indicates a problem in estimation.
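The iterative steps described above can be sketched for a logistic curve with one predictor. This is a minimal illustration using the Newton-Raphson method, which is one of several possible numerical methods and is not necessarily the algorithm SPSS uses internally. The function name, the toy data, and the tolerance are all assumptions made for the example.

```python
import math

def fit_logistic(x, y, tol=1e-8, max_iter=50):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson.

    Returns (a, b, iterations_used, converged).
    """
    a, b = 0.0, 0.0  # initial estimates of the parameters
    for iteration in range(1, max_iter + 1):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to a and b.
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the information matrix (negative Hessian).
        w = [pi * (1.0 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 linear system explicitly.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        # Stop when the parameter estimates no longer change much.
        if max(abs(step_a), abs(step_b)) < tol:
            return a, b, iteration, True
    return a, b, max_iter, False

# Toy data (hypothetical, not from the appendix).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 0, 1]
a, b, iterations, converged = fit_logistic(x, y)
print(a, b, iterations, converged)
```

The stopping rule mirrors the SPSS footnotes in Chapter Four, where estimation terminates once the parameter estimates change by less than .001; a stricter tolerance is used here only to make the sketch easy to check.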
3.7 Receiver Operating Characteristic (ROC) Curve
A measure of goodness-of-fit often used to evaluate the fit of a logistic
regression model is based on the simultaneous measurement of sensitivity (true
positive rate) and specificity (true negative rate) for all possible cut-off
points. First, we calculate sensitivity and specificity pairs for each possible
cut-off point and plot sensitivity on the y axis against (1 - specificity) on
the x axis. This curve is called the receiver operating characteristic (ROC)
curve. For a model that performs at least as well as chance, the area under the
ROC curve ranges from 0.5 to 1.0, with larger values indicative of better fit.
Test variables are often composed of probabilities from logistic regression. The
state variable can be the true category to which a subject belongs. The value of
the state variable indicates which category should be considered positive.
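The construction of the ROC curve described above can be sketched as follows. The labels and scores are a small hypothetical example, not the study data, and the area is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per cut-off point."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        # A case is classified positive when its score reaches the cut-off.
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= cut)
        fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= cut)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical state variable (1 = positive) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(area_under_curve(roc_points(labels, scores)))  # 8/9, about 0.889
```

In this example eight of the nine positive-negative pairs are ranked correctly, which is why the area equals 8/9.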
CHAPTER FOUR
4.0 DATA PRESENTATION AND ANALYSIS
4.1 Data Presentation
The data used for this analysis comprise the records of 300 customers of
Fidelity Bank Nigeria PLC, dated 5th of May, 2013. The data are shown in
Appendix I.
4.2 Data Analysis

The analysis was carried out on SPSS using Binary Logistic Regression.
4.3 Descriptive Analysis
SPSS OUTPUT 1
previously defaulted * validate Crosstabulation
                                                      validate
                                                      0        1        Total
previously   Yes    Count                             18       27       45
defaulted           % within previously defaulted     40.0%    60.0%    100.0%
                    % within validate                 13.8%    18.4%    16.2%
             No     Count                             112      120      232
                    % within previously defaulted                       100.0%
                    % within validate                                   83.8%
Total               Count                             130      147      277
                    % within previously defaulted                       100.0%
                    % within validate                                   100.0%
The cross tabulation shows that the modeling sample (validate = 1) contains 120
customers who did not default on a previous loan and 27 who did default. The
validation or holdout sample (validate = 0) contains 112 customers who did not
default and 18 who did.
SPSS OUTPUT 2
Descriptive Statistics
                     N     Minimum   Maximum      Mean          Std. Deviation
Loan                 300   46000     75000000     1845374.49    4640069.418
Balance              300   .00       4899992.00   512685.6708   1168880.57401
Collateral           300   46000     100000000    2979944.81    6900862.169
Interest             300   0         234          17.82         15.735
Days                 300   1         3376         478.39        386.167
Gender               300   1         2            1.46          .499
Edlev                300   1         5            3.02          1.379
Default              277   0         1            .84           .370
Valid N (listwise)   277
Previously Defaulted
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Yes       45          15.0      16.2            16.2
          No        232         77.3      83.8            100.0
          Total     277         92.3      100.0
Missing   System    23          7.7
Total               300         100.0
Gender
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Male      163         54.3      54.3            54.3
          Female    137         45.7      45.7            100.0
          Total     300         100.0     100.0
EDUCATION LEVEL
                                              Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Primary                               54          18.0      18.0            18.0
        Secondary                             61          20.3      20.3            38.3
        OND                                   67          22.3      22.3            60.7
        HND/Bsc                               60          20.0      20.0            80.7
        Other Postgraduate qualifications     58          19.3      19.3            100.0
        Total                                 300         100.0     100.0
4.4 Logistic Regression Analysis
Block 0: Beginning Block
SPSS OUTPUT 3
Case Processing Summary
Unweighted Cases (a)                        N     Percent
Selected Cases     Included in Analysis     147   49.0
                   Missing Cases            0     .0
                   Total                    147   49.0
Unselected Cases                            153   51.0
Total                                       300   100.0
a. If weight is in effect, see classification table for the total number of cases.
We use a random sample of 147 of the 277 customers with known outcomes to create
a risk model. We set aside the remaining 130 customers as a holdout or validation
sample on which to test the credit-risk model, and then use the model to classify
the 23 prospective customers as good or bad credit risks.
SPSS OUTPUT 4
Dependent Variable Encoding
Original Value Internal Value
Yes 0
No 1
1 represents not defaulting (non-defaulter) while 0 represents defaulting
(defaulter).
SPSS OUTPUT 5
Classification Table (a)
                                      Predicted
                                      Selected Cases (b)           Unselected Cases (c,d)
                                      previously     Percentage    previously     Percentage
                                      defaulted      Correct       defaulted      Correct
Observed                              Yes    No                    Yes    No
Step 1   previously defaulted   Yes   0      27      0             0      12      0
                                No    0      120     100.0         0      118     100
         Overall Percentage                          81.6                         86.2
a. The cut value is .500
b. Selected cases validate EQ 1
c. Unselected cases validate NE 1
d. Some of the unselected cases are not classified due to either missing values in the independent variables or
categorical variables with values out of the range of the selected cases.
The classification table shows that the model makes a correct prediction 81.6% of the time
overall for the selected cases and 86.2% of the time overall for the unselected cases.
SPSS OUTPUT 6
Categorical Variables Codings
                                              Frequency   Parameter coding
                                                          (1)     (2)     (3)     (4)
Edlev   Primary                               26          1.000   .000    .000    .000
        Secondary                             34          .000    1.000   .000    .000
        OND                                   27          .000    .000    1.000   .000
        HND/Bsc                               36          .000    .000    .000    1.000
        Other Postgraduate qualifications     24          .000    .000    .000    .000
The table above shows that there are Primary (26), Secondary (34), OND (27), HND/BSC
(36) and Other Postgraduate qualifications (24) customers who obtained loans.
SPSS OUTPUT 7
Variables not in the Equation (a)
                                 Score   Df   Sig.
Step 0   Variables   Edlev       1.730   1    .188
                     Loan        .253    1    .615
                     Balance     3.874   1    .049
                     Collateral  .438    1    .508
                     Days        5.338   1    .021
                     Gender      1.269   1    .260
a. Residual Chi-Squares are not computed because of redundancies.
SPSS OUTPUT 7, labeled Variables not in the Equation, lists each of the predictors in
turn and shows whether adding it would improve the model. At the 0.05 level, Balance
(Sig. = .049) and Days (Sig. = .021) are significant while the others are not.
SPSS OUTPUT 8
Variables in the Equation
B S.E. Wald Df Sig. Exp(B)
Step 0 Constant 1.492 .213 49.042 1 .000 4.444
Output 8 summarises the model (variables in the equation). At this
stage the value of the constant $\beta_0$ is 1.492, and
$$\text{Wald} = \left(\frac{B}{S.E.}\right)^2 = 49.042$$
$$\ln(\text{ODDS}) = \ln\left(\frac{P}{1 - P}\right) = 1.492$$
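The constant of this intercept-only model can be checked by hand from the counts of the modeling sample (120 non-defaulters and 27 defaulters among the selected cases). The sketch below is a plain arithmetic check under that assumption; the variable names are illustrative.

```python
import math

non_defaulters = 120  # selected cases observed as "No"
defaulters = 27       # selected cases observed as "Yes"

odds_not_default = non_defaulters / defaulters
constant = math.log(odds_not_default)
p_not_default = non_defaulters / (non_defaulters + defaulters)

print(round(constant, 3))          # 1.492, the B of the constant
print(round(odds_not_default, 3))  # 4.444, the Exp(B) of the constant
print(round(p_not_default, 3))     # 0.816, i.e. the 81.6% overall percentage
```

With no predictors in the model, every case is assigned the same probability of not defaulting, which is why the Block 0 classification table predicts "No" for all customers.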
Block 1: Method = Enter

SPSS OUTPUT 9
                                      Predicted
                                      Selected Cases (b)           Unselected Cases (c,d)
                                      previously     Percentage    previously     Percentage
                                      defaulted      Correct       defaulted      Correct
Observed                              Yes    No                    Yes    No
Step 1   previously defaulted   Yes   5      22      18.5          6      12      33.3
                                No    1      119     99.2          5      107     95.5
b. Selected cases Validate EQ 1
c. Unselected cases Validate NE 1
d. Some of the unselected cases are not classified due to either missing values in the independent variables or
categorical variables with values out of the range of the selected cases.
The table above shows that:
1. For the selected cases, 84.4% were accurately predicted: 5 out of 27 customers who had
previously defaulted were correctly predicted, and 119 out of 120 non-defaulters were
correctly predicted.
2. For the unselected cases, 86.9% were accurately predicted: 6 out of 18 customers who had
previously defaulted were correctly predicted, and 107 out of 112 non-defaulters were
correctly predicted.
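The percentages quoted above follow directly from the counts in the classification table. A minimal sketch of the arithmetic, with an illustrative function name and argument order that are not part of SPSS:

```python
def classification_rates(yes_yes, yes_no, no_yes, no_no):
    """Percent correct for observed Yes, observed No, and overall.

    Arguments are counts named observed_predicted, e.g. yes_no is the
    number of observed "Yes" cases that were predicted as "No".
    """
    pct_yes = 100.0 * yes_yes / (yes_yes + yes_no)
    pct_no = 100.0 * no_no / (no_yes + no_no)
    total = yes_yes + yes_no + no_yes + no_no
    overall = 100.0 * (yes_yes + no_no) / total
    return pct_yes, pct_no, overall

selected = classification_rates(5, 22, 1, 119)
unselected = classification_rates(6, 12, 5, 107)
print([round(v, 1) for v in selected])    # [18.5, 99.2, 84.4]
print([round(v, 1) for v in unselected])  # [33.3, 95.5, 86.9]
```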
SPSS OUTPUT 10
Hosmer and Lemeshow Test
Step Chi-square Df Sig.
1 9.059 8 .337
Decision: The lack of significance of the Chi-Squared test indicates that the
model is a good fit since 0.337 is greater than 0.05(the level of significance).
SPSS OUTPUT 11
Variables in the Equation
                          B       S.E.    Wald    df   Sig.   Exp(B)
Step 1 (a)   Edlev        -.189   .187    1.017   1    .313   .828
             Loan         .000    .000    .051    1    .821   1.000
             Balance      .000    .000    6.861   1    .009   1.000
             Collateral   .000    .000    2.430   1    .119   1.000
             Interest     -.107   .042    6.625   1    .010   .899
             Days         .002    .001    4.468   1    .035   1.002
             Gender       -.554   .475    1.364   1    .243   .574
             Constant     4.179   1.343   9.688   1    .002   65.328
a. Variable(s) entered on step 1: Edlev, Loan, Balance, Collateral, Interest, Days, Gender.
At the 0.05 level of significance, Balance, Interest and Days are significant,
but Education Level, Loan, Collateral and Gender are not significant by the Wald
statistic. The test of the intercept (i.e. the constant) merely suggests whether
an intercept should be included in the model. For the present data set, the test
result (p < 0.05) suggests that a model with an intercept should be applied to
the data.
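The coefficient estimates are used to estimate the probability of not defaulting
as follows:
$$P(Y = 1 \mid X) = \frac{e^{\text{Logit}(Y)}}{1 + e^{\text{Logit}(Y)}}$$
$$\text{Logit}(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \dots + \beta_n x_n$$
Hence:
$$\ln\left(\frac{p}{1 - p}\right) = \text{Logit}(Y) = 4.179 - 0.189 x_1 + 0.000 x_2 + 0.000 x_3 + 0.000 x_4 - 0.107 x_5 + 0.002 x_6 - 0.554 x_7$$
Logit(Y) = 4.179 - 0.189 EDUCATIONLEVEL + 0.000 LOAN + 0.000 BALANCE
+ 0.000 COLLATERAL - 0.107 INTEREST + 0.002 DAYS - 0.554 GENDER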
As with any regression, positive coefficients indicate a positive relationship
with the dependent variable, which here is coded 1 for not defaulting.
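To show how the fitted equation turns a customer record into a probability, the sketch below uses the rounded coefficients from SPSS OUTPUT 11. It is only an approximation: Loan, Balance and Collateral are reported as .000 after rounding, so they are left out here, and the dictionary keys and function name are illustrative assumptions. The example customer uses the Interest, Days, gender and Ed Lev values of the first record in the appendix.

```python
import math

# Rounded coefficients from SPSS OUTPUT 11. Loan, Balance and Collateral are
# reported as .000, so they contribute nothing in this rounded sketch.
COEFFICIENTS = {"edlev": -0.189, "interest": -0.107, "days": 0.002, "gender": -0.554}
CONSTANT = 4.179

def probability_not_default(customer):
    """Probability of not defaulting under the rounded fitted equation."""
    logit = CONSTANT + sum(COEFFICIENTS[k] * customer[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-logit))

# First appendix record: Interest 28, Days 808, gender 1, Ed Lev 1.
customer = {"edlev": 1, "interest": 28, "days": 808, "gender": 1}
print(round(probability_not_default(customer), 3))
```

The result is roughly 0.89, in the same region as the predicted probability stored for that record in the appendix (PRE_1 = 0.87567); an exact match should not be expected from rounded coefficients.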
4.5 The odds Ratio Results
SPSS OUTPUT 12
The following odds ratios were calculated for every covariate used in the study,
using the formula
$$\text{Odds Ratio} = \text{Exp}(B) = e^{B}$$
                          Odds Ratio   95% C.I. for EXP(B)
                                       Lower    Upper
Step 1 (a)   Edlev        .828         .573     1.195
             Loan         1.000        1.000    1.000
             Balance      1.000        1.000    1.000
             Collateral   1.000        1.000    1.000
             Interest     .899         .828     .975
             Days         1.002        1.000    1.003
             Gender       .574         .227     1.456
             Constant     65.328
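The odds ratio for Gender is .574: moving from male (coded 1) to female (coded 2)
multiplies the odds of not defaulting by .574. However, the 95% confidence
interval (.227, 1.456) contains 1, so this difference between men and women is
not statistically significant.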
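The odds ratios and their confidence limits can be reproduced approximately from the B and S.E. columns of SPSS OUTPUT 11. The sketch below assumes the usual Wald interval exp(B ± 1.96 × S.E.); the function name is illustrative, and small differences from the SPSS figures are expected because the inputs are rounded.

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Odds ratio exp(b) with an approximate 95% Wald confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# B and S.E. as reported (rounded) in SPSS OUTPUT 11.
for name, b, se in [("Interest", -0.107, 0.042), ("Gender", -0.554, 0.475)]:
    ratio, lower, upper = odds_ratio_with_ci(b, se)
    print(name, round(ratio, 3), round(lower, 3), round(upper, 3))
```

An interval that lies entirely below 1, as for Interest, indicates a significant negative association with not defaulting, while an interval that contains 1, as for Gender, does not.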
4.6 Model Assessment

SPSS OUTPUT 13
Block 1: Method = Forward Stepwise (Likelihood Ratio)
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      134.478 (a)         .038                   .062
2      128.330 (a)         .078                   .126
3      123.998 (b)         .104                   .170
a. Estimation terminated at iteration number 5 because
parameter estimates changed by less than .001.
b. Estimation terminated at iteration number 6 because
parameter estimates changed by less than .001.
The Nagelkerke R Square statistic in the far right-hand column is a pseudo R-squared that
approximates the R-squared of linear regression and has a maximum possible value of 1.00. It
shows that approximately 17% of the variation in the dependent variable is explained by the
three predictors in our final model.
SPSS OUTPUT 15
                        B       S.E.   Wald     df   Sig.   Exp(B)   95% C.I. for EXP(B)
                                                                     Lower    Upper
Step 1 (a)   Days       .002    .001   5.118    1    .024   1.002    1.000    1.003
             Constant   .822    .337   5.947    1    .015   2.274
Step 2 (b)   Interest   -.068   .029   5.346    1    .021   .934     .882     .990
             Days       .002    .001   7.869    1    .005   1.002    1.001    1.004
             Constant   1.815   .579   9.833    1    .002   6.143
Step 3 (c)   Balance    .000    .000   4.216    1    .040   1.000    1.000    1.000
             Interest   -.106   .040   7.147    1    .008   .900     .832     .972
             Days       .002    .001   4.888    1    .027   1.002    1.000    1.003
             Constant   2.966   .901   10.827   1    .001   19.415
a. Variable(s) entered on step 1: Days.
b. Variable(s) entered on step 2: Interest.
c. Variable(s) entered on step 3: Balance.
Output 15 shows that our stepwise model-building process included three steps.
In the first step, a constant as well as the Days predictor variable are entered
into the model. At the second step Interest is added to the model, and the final
step adds Balance.
This confirms the three predictor variables that were previously identified as
significant.
The B column shows the coefficients (called Beta coefficients, abbreviated B)
associated with each predictor. Because the dependent variable is coded 1 for
not defaulting, a negative coefficient lowers the odds of not defaulting.
Interest has a negative coefficient, indicating that customers charged a higher
interest rate are somewhat more likely to default on a loan. Days has a positive
coefficient, indicating that customers with more days are somewhat less likely
to default. The coefficient of Balance is reported as .000 because of the scale
of the variable, so its direction cannot be read from the rounded value.
SPSS OUTPUT 15
Omnibus Tests of Model Coefficients
                  Chi-square   Df   Sig.
Step 1   Step     5.736        1    .017
         Block    5.736        1    .017
         Model    5.736        1    .017
Step 2   Step     6.148        1    .013
         Block    11.884       2    .003
         Model    11.884       2    .003
Step 3   Step     4.332        1    .037
         Block    16.216       3    .001
         Model    16.216       3    .001
Overall Chi-square test
$H_0$: $\beta_i = 0$ for all i
$H_1$: $\beta_i \neq 0$ for at least one coefficient
$H_0$ is rejected since the p-value is less than 0.05 in all three steps.
Hence the model is significant.
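The Step chi-squares in the omnibus table are likelihood ratio statistics: each equals the drop in -2 Log likelihood between consecutive steps of the Model Summary. The sketch below reproduces steps 2 and 3 from the reported values; the helper name is illustrative, and the p-value uses the standard identity for a chi-square variable with one degree of freedom.

```python
import math

def chi_square_p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# -2 Log likelihood after each stepwise step, from the Model Summary.
minus_two_ll = [134.478, 128.330, 123.998]

for step in range(1, len(minus_two_ll)):
    improvement = minus_two_ll[step - 1] - minus_two_ll[step]
    p_value = chi_square_p_value_df1(improvement)
    print(step + 1, round(improvement, 3), round(p_value, 3))
```

The improvements of 6.148 and 4.332 agree with the Step rows for steps 2 and 3, and the p-values agree with the reported significance levels of about .013 and .037.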
4.7 Classification and Validation
SPSS OUTPUT 16
Classification Table (a)
                  Predicted
                  Selected Cases (b)           Unselected Cases (c,d)
                  previously     Percentage    previously     Percentage
                  defaulted      Correct       defaulted      Correct
Observed          Yes    No                    Yes    No
Step 1   Yes      0      27      .0            0      18      .0
         No       0      120     100.0         0      112     100.0
Step 2   Yes      0      27      .0            0      18      .0
         No       1      119     99.2          3      109     97.3
Step 3   Yes      1      26      3.7           1      17      5.6
         No       1      119     99.2          3      109     97.3
d. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical
variables with values out of the range of the selected cases.
Output 16 shows that the final model correctly classified about 99.2% of the modeling
sample's non-defaulters and about 4% of the modeling sample's defaulters, for an overall
correct classification percentage of about 82%. Similarly, when applied to the holdout or
validation sample, the model correctly identified about 97% of the non-defaulters and about
6% of the defaulters, for an overall correct classification percentage of about 85%.
SPSS OUTPUT 17
SPSS OUTPUT 17 is our modeling graph. On the right-hand side, the modeling process assigned
the bulk of the actual non-defaulters very low probabilities of defaulting, and the left-hand
graph shows that the model assigned the bulk of the defaulters very high probabilities of
defaulting. This adds further confirmation that we have a good model.
Since we have a valid predictive model, we can use it to score a prospect file. The graph
below shows the result after we have scored our 23 prospects.
It shows that none of the prospects would be expected to default on a loan.
4.8 Receiver Operating Characteristic (ROC) Curve
Case Processing Summary
previously defaulted   Valid N (listwise)
Positive (a)           232
Negative               45
Missing                23
Larger values of the test result variable(s) indicate stronger evidence for a positive
actual state.
a. The positive actual state is No.
The further the curve lies above the reference line, the more accurate the test.
Here, the curve lies well above the reference line.
Area Under the Curve
Test Result Variable(s): Predicted probability
Area   Std. Error (a)   Asymptotic Sig. (b)   Asymptotic 95% Confidence Interval
                                              Lower Bound   Upper Bound
.765   .040             .000                  .687          .844
The test result variable(s): Predicted probability has at least one tie between
the positive actual state group and the negative actual state group. Statistics
may be biased.
a. Under the nonparametric assumption
b. Null hypothesis: true area = 0.5
The area under the curve is .765 with 95% confidence interval (.687, .844). Also,
the area under the curve is significantly different from 0.5, since the p-value
is .000, meaning that the logistic regression classifies the groups significantly
better than chance.
CHAPTER FIVE
5.0 Summary of Findings, Conclusion and Recommendation
5.1 Summary of Findings and Conclusion
In this study, the accounts of some customers of Fidelity Bank PLC as at 5th of
May, 2013 were examined using Binary Logistic Regression, and a model was built
which lenders at the bank can use to predict the probability that a potential
borrower will default. The model examines a dichotomous dependent variable
(default) using the independent variables (Loan, Balance, Collateral, Interest,
Number of Days, Gender and Education Level), which are either continuous or
categorical. We have demonstrated the use of risk modeling with logistic
regression analysis to identify demographic and behavioral characteristics
associated with the likelihood of defaulting on a bank loan. Significance testing
using the Wald test and the likelihood ratio showed that, at the 5% level of
significance, Balance, Interest and Days are significant, but Education Level,
Loan, Collateral and Gender are not significant by the Wald statistic.
Also, the area under the ROC curve is significantly different from 0.5, since
the p-value is .000, meaning that the logistic regression classifies the groups
significantly better than chance.
Thus, this model can be used to predict the probability that a given customer
who obtains a loan will default.
In conclusion, the model has shown that lenders should always take Balance, Days
and Interest into consideration before giving out loans.
5.2 Recommendations
The researcher recommends the following:
1. The model developed in this research is recommended for use by the bank.
2. Large loans should not be granted to customers for only a few days, because
such customers might find it difficult to pay back before the deadline.
3. Customers who obtain loans should be reminded of their due date whenever
the deadline is near, in order to prompt payment.
4. The character of a customer should be carefully considered, particularly if
he/she has a history of defaulting on loans.
5. Lenders should take the Interest rate, Balance, and Days into consideration
before giving out loans.
REFERENCES
1. Ainsworth, Logistic Regression
2. Altman, E.I.; Haldeman, R.; Narayanan, P. A New Model to
Identify Bankruptcy Risk of Corporations. Journal of Banking and Finance,
1977, 1, 29-54.
3. Amr I. Abdelrahman, Applying Logistic Regression Model to The Second
Primary Cancer Data;Department of Statistics, Mathematics, and
Insurance. Faculty of Commerce, Ain Shams University, Egypt.
4. Aziz, A.; Emanuel, D.; Lawson, G. Bankruptcy Prediction: An Investigation
of Cash Flow Based Models. Journal of Management Studies, 1998, 25,
419-437.
5. Bogess, W.B., 1967. Screen-test your Credit Risk. Journal of Harvard
Business Review. Volume 45, pp 21-113.
6. Cramer J.S. (2003): The Origin and development of logit Model, Cambridge
University Press: Cambridge.
7. Hand, D.J (2010): Modeling Consumer Risk, IMA Journal of Management
Mathematics, 12,137-255
8. Karl L. Wuensch, Dept of Psychology, East Carolina University, Binary
Logistic Regression with SPSS/PASW
9. Menard, S. (1995). Applied Logistic Regression Analysis, Sage Publications,
Newbury Park: California
10. Mogboyin, O., T.O. Asaolu and O.T. Ajilore, 2012. Bank Consolidation
Program and Lending Performance in Nigerian Banking System: An
Empirical Analysis with Panel Data. The International Journal of Applied
Economics and Finance, 6: 100-108.
11. Pompe, P.P.M.; Bilderbe, J. The Prediction of Bankruptcy of Small- and
Medium-sized Industrial Firms. Journal of Business Venturing, 2005, 20,
847-868.
12. www.smalldrill.com/logistic-regression.html
13. www.wikipeadia.com
APPENDIX 1
Loans Balance Collateral Interest Days gender Ed Lev Default Validate PRE_1 PGR_1 COO_1
192000 0 320000 28 808 1 1 1 1 0.87567 1 0.00583
384000 0 64000 23 682 2 5 1 0 0.70441 1 0.02797
5000000 4625600 15000000 6 39 1 3 0 1 0.81977 1 0.91138
350000 0 350000 28 1625 2 4 1 1 0.90688 1 0.01069
2250000 2063929 6600000 6 77 1 2 1 1 0.9074 1 0.00392
384000 2000.15 640000 28 694 2 3 0 0 0.70889 1 0.11539
1000000 0 2100000 27 938 1 1 1 0 0.93834 1 0.00189
5000000 3751320 13600000 6 77 1 5 1 1 0.84765 1 0.02331
160000 0 320000 28 871 2 3 1 0 0.75648 1 0.01825
272000 0 320000 23 983 1 4 1 1 0.90235 1 0.00355
380000 0 380000 28 1225 2 4 0 0 0.82823 1 0.43932
5039650 0 5056723 6 435 1 4 1 0 0.98268 1 0.00045
787500 734915.2 15000000 6 10 2 1 1 1 0.99622 1 0.0001
405000 3198582 6000000 6 133 1 1 1 1 0.79734 1 0.04591
272000 0 320000 23 730 2 3 1 0 0.80374 1 0.0075
4952187 0 4952187 6 532 2 5 1 1 0.96906 1 0.00142
4768904 0 4768904 6 10 1 5 1 0 0.95415 1 0.00295
787500 190671.9 15000000 6 10 1 3 1 0 0.99818 1 0.00002
678888 740847.4 12000000 6 10 1 2 1 1 0.99403 1 0.00015
4050000 3654383 10000000 6 9 2 4 0 1 0.60024 1 0.22482
3000000 2763470 4500000 6 10 1 2 1 1 0.6934 1 0.05286
5039650 18091.7 5039660 6 8 2 1 1 1 0.96342 1 0.0026
192000 0 320000 23 983 1 3 1 0 0.91821 1 0.00216
192000 0 320000 23 703 2 4 1 0 0.76467 1 0.01261
787500 746667.4 1500000 6 213 1 3 0 0 0.91574 1 0.35998
300000 0 600000 17 897 1 5 1 1 0.93073 1 0.00253
4500000 0 5500000 25 930 2 2 1 1 0.94632 1 0.0039
200000 0 600000 36 149 2 4 1 0 0.24676 0 0.39715
1750000 127059 5000000 25 72 1 5 1 1 0.77888 1 0.03178
2560000 0 3000000 28 118 1 2 1 0 0.75243 1 0.03596
3000000 0 15000000 30 633 1 3 0.99254 1
192000 0 500000 0 1105 2 3 1 0 0.98991 1 0.00015
192000 0 320000 23 875 1 1 1 1 0.93118 1 0.00178
100000 0 100000 0 534 2 2 1 0 0.97491 1 0.00067
50000 0 50000 0 546 1 5 1 1 0.97486 1 0.0008
125000 0 125000 0 722 2 3 1 0 0.97831 1 0.00053
1440000 1554769 1440000 6 343 1 1 1 1 0.89122 1 0.00685
5000000 935319.3 10000000 6 133 1 5 1 0 0.97624 1 0.00074
700000 0 700000 4 507 1 3 1 1 0.97523 1 0.00045
5000000 4555817 10000000 6 337 1 4 0 0 0.63438 1 0.4135
300000 0 600000 17 897 2 2 0 1 0.93156 1 0.34938
5000000 4630478 10000000 6 10 1 1 1 0 0.61348 1 0.18744
5000000 4897826 10000000 6 99 2 3 1 1 0.3573 0 0.53502
384000 0 384000 23 771 1 4 1 1 0.86501 1 0.00437
300000 5923.08 300000 23 686 2 3 1 0 0.7889 1 0.00789
800000 0 800000 21 701 1 2 0 1 0.91764 1 0.18564
5290221 0 5500000 21 765 1 1 1 0 0.9757 1 0.00107
300000 0 300000 23 633 2 4 1 1 0.73911 1 0.01345
600000 0 300000 23 393 2 5 1 0 0.6001 1 0.04144
46000 0 46000 23 1065 1 5 1 0 0.89286 1 0.00674
500000 0 500000 23 350 1 2 1 1 0.81953 1 0.00582
180000 0 180000 23 864 2 1 0.88011 1
500000 0 500000 21 832 1 1 1 1 0.94105 1 0.0013
2520000 2358182 2520000 6 9 1 3 1 1 0.62953 1 0.09056
1440000 1372656 1440000 6 15 1 4 1 0 0.75794 1 0.02856
1050000 801584.6 1050000 6 34 2 2 1 1 0.81831 1 0.01554
2880000 1968066 2880000 6 15 1 5 1 0 0.65406 1 0.08158
1440000 1207576 1440000 6 34 2 3 0 1 0.72696 1 0.20884
1440000 1307209 1440000 6 15 1 2 1 0 0.83016 1 0.01357
1440000 1318354 1440000 6 15 2 5 1 1 0.61152 1 0.08602
980000 707375.7 980000 6 34 2 3 0.80189 1
1440000 1099316 1440000 6 34 2 1 1 1 0.81283 1 0.02011
1440000 1099316 1440000 6 34 1 1 0 0 0.88317 1 0.45319
1440000 1272962 1440000 6 15 1 5 1 0 0.74166 1 0.04063
1440000 1347237 1440000 6 15 1 4 1 1 0.76269 1 0.02777
1440000 1470634 1440000 6 337 1 3 1 0 0.85824 1 0.00879
1440000 1339071 1440000 6 15 2 4 0.65057 1
1440000 1307209 1440000 6 15 2 5 0.61424 1
2880000 2621569 2880000 6 160 1 2 0.68796 1
2880000 2504823 2880000 6 164 2 4 0.49625 0
2880000 1521562 2880000 6 652 1 3 0.93103 1
2880000 2678142 2880000 6 15 2 2 0.4803 0
2880000 2860578 2880000 6 71 1 2 0.59569 1
787500 746667.4 1500000 6 213 2 4 1 1 0.83786 1 0.0119
300000 0 600000 17 897 1 1 1 0 0.96625 1 0.00055
4500000 0 5500000 25 930 2 1 1 0 0.95515 1 0.00316
200000 0 600000 36 149 1 1 1 1 0.50141 1 0.21966
1750000 127059 5000000 25 72 2 4 0 1 0.70969 1 0.32739
2560000 0 3000000 28 118 1 3 1 1 0.71556 1 0.03978
3000000 0 15000000 30 633 2 4 1 0 0.98443 1 0.0012
192000 0 320000 23 875 2 2 1 0 0.86547 1 0.00555
192000 0 320000 23 615 1 3 1 1 0.85394 1 0.00328
192000 0 320000 23 688 1 5 1 0 0.82011 1 0.01103
192000 0 192000 23 771 1 1 1 1 0.91571 1 0.00232
1533333 0 1000000 0 617 2 1 1 0 0.98438 1 0.00031
400000 0 400000 0 562 2 2 1 1 0.97749 1 0.00054
125000 0 125000 0 722 1 2 1 0 0.98957 1 0.00013
100000 0 100000 0 533 1 2 1 1 0.9854 1 0.00023
100000 0 100000 0 533 1 3 1 0 0.98242 1 0.00033
50000 0 50000 0 546 2 2 1 1 0.97518 1 0.00066
50000 0 50000 0 534 1 4 1 0 0.97866 1 0.00051
50000 0 50000 0 534 2 4 1 1 0.96343 1 0.00146
100000 0 100000 0 534 1 1 1 1 0.98791 1 0.00017
192000 0 500000 0 1105 2 2 1 1 0.99164 1 0.00011
5329969 0 6000000 6 343 1 4 1 1 0.98393 1 0.00041
5000000 4723548 10000000 6 142 2 4 0.37259 0
5000000 4676762 10000000 6 1 1 2 0.55217 1
5000000 4899992 10000000 6 34 1 1 0.55674 1
5000000 4318165 10000000 6 161 2 5 0.43524 0
5000000 4348778 10000000 6 8 1 3 0.5913 1
700000 0 700000 4 507 2 5 1 1 0.93938 1 0.00314
1440000 1554769 1440000 6 343 1 5 0.79362 1
2295000 0 2295000 6 547 2 3 0 1 0.96455 1 0.6775
5000000 4555817 10000000 6 337 1 2 0.71691 1
5000000 935319.3 10000000 6 133 1 1 1 0 0.98871 1 0.00022
5000000 4451327 10000000 5 41 1 4 1 1 0.5598 1 0.16949
350000 0 350000 28 1624 2 5 1 0 0.88946 1 0.01681
2500000 0 2555555 30 1226 1 4 1 1 0.9142 1 0.00619
4483999 0 8483999 30 724 1 3 1 1 0.95918 1 0.00231
4875000 0 6500000 30 701 2 1 1 0 0.9143 1 0.01356
2388661 0 5000000 30 1570 2 2 1 0 0.9701 1 0.00137
160000 0 320000 28 1381 2 3 1 1 0.88469 1 0.01055
192000 0 320000 28 871 1 4 1 1 0.81705 1 0.01093
384000 0 384000 28 119 2 2 1 0 0.49785 0 0.09249
4133285 0 2388661 21 197 1 5 1 1 0.76035 1 0.06145
192000 0 320000 28 666 2 2 1 0 0.7225 1 0.02047
192000 0 320000 28 989 1 3 1 1 0.8693 1 0.00592
192000 0 320000 28 806 2 4 1 0 0.69571 1 0.02903
262515 0 640000 28 938 2 5 1 1 0.7221 1 0.04006
192000 0 320000 28 722 1 2 1 0 0.83348 1 0.00679
192000 0 320000 28 808 1 1 1 1 0.87567 1 0.00583
384000 1938.19 640000 28 694 2 5 0 0 0.62525 1 0.14573
192000 0 320000 28 1014 1 2 1 1 0.89361 1 0.00439
4050000 3861301 8500000 6 3376 2 1 0.9982 1
5000000 3751320 13600000 6 8 1 3 0.87784 1
2250000 2063929 6600000 6 77 2 4 0.79409 1
3600000 0 9200000 6 1408 2 5 1 0 0.99809 1 0.00001
5000000 4625600 15000000 6 39 1 5 1 1 0.75706 1 0.08535
384000 0 640000 23 682 1 1 1 1 0.91193 1 0.00234
384000 0 640000 23 682 2 4 1 1 0.77133 1 0.01109
5000000 3955074 17000000 6 72 1 5 1 1 0.91931 1 0.01389
4000000 3844555 9600000 67 314 2 1 1 0 0.00495 0 7.26961
1500000 0 2700000 28 1353 1 3 1 0 0.95703 1 0.00141
1500000 0 2700000 30 633 2 4 1 1 0.70471 1 0.02986
1000000 0 2100000 27 938 1 5 1 0 0.87719 1 0.00676
192000 0 320000 23 967 2 1 1 1 0.90147 1 0.00463
192000 0 320000 23 825 1 4 1 1 0.87534 1 0.00428
2520000 0 2520000 6 961 1 2 1 1 0.99205 1 0.00007
2520000 0 2520000 6 377 1 1 1 0 0.98167 1 0.00033
300000 0 300000 23 388 2 4 1 0 0.64725 1 0.02172
200000 0 200000 18 861 2 5 1 0 0.85446 1 0.01014
200000 0 200000 21 919 1 4 1 1 0.90853 1 0.00291
1000000 0 1000000 23 681 1 3 0.88221 1
945857 1116760 823000 23 314 2 1 0 1 0.49896 0 0.11353
900523 804888.1 922000 23 314 1 4 0 1 0.58257 1 0.08205
850000 25167.92 850000 23 69 2 5 0 0 0.48487 0 0.08066
450000 0 450000 23 540 2 3 1 0 0.74957 1 0.00872
382000 0 3820000 27 938 1 5 1 1 0.92287 1 0.00451
368600 237983.6 350000 23 38 2 3 0 0 0.48509 0 0.06221
282235 0 445000 23 178 1 5 1 0 0.65499 1 0.04302
248500 0 248500 23 279 2 3 1 1 0.64388 1 0.02117
240000 0 240000 21 559 1 1 1 0 0.90318 1 0.00261
800000 0 800000 23 176 2 2 1 1 0.67084 1 0.02651
463500 0 463500 23 315 1 5 1 1 0.70607 1 0.026
2553610 0 2553610 6 364 1 4 1 0 0.96762 1 0.00078
2722619 0 2722619 6 532 2 3 1 0 0.96658 1 0.0009
600000 0 600000 23 287 1 2 1 1 0.80565 1 0.00733
1000000 0 1000000 24 906 2 4 1 0 0.82667 1 0.00954
1268000 1314449 1127725 23 389 1 1 0 0 0.63218 1 0.19487
763430 922383.4 702475 23 682 2 2 1 1 0.65434 1 0.04021
350000 0 350000 18 938 1 3 0 1 0.94633 1 0.32025
362140 263843.6 350000 23 246 2 4 0 1 0.52346 1 0.05726
191170 167574.7 195000 23 296 2 5 1 1 0.51538 1 0.07663
840000 0 840000 23 223 1 1 1 0 0.82447 1 0.01057
441000 0 441000 18 902 2 2 1 0 0.92125 1 0.00232
500000 0 500000 21 744 1 3 1 1 0.90345 1 0.00182
240000 0 240000 21 212 1 3 1 0 0.77551 1 0.0094
240000 0 240000 21 212 2 2 1 1 0.70567 1 0.01943
240000 0 240000 21 510 1 3 1 0 0.85421 1 0.00305
4700000 0 4700000 21 633 2 4 1 1 0.89644 1 0.01122
2400000 2938.74 2400000 23 273 1 1 0.87575 1
300000 0 300000 19 746 2 5 1 1 0.81455 1 0.01325
445000 0 445000 23 531 1 3 1 0 0.83667 1 0.00355
310000 0 310000 21 531 2 4 1 0 0.74577 1 0.01148
500000 0 500000 21 996 1 5 1 0 0.90927 1 0.00422
217200 0 382000 234 206 2 2 1 0 0 0 78.40688
500000 0 500000 0 701 1 1 1 1 0.99169 1 0.00009
660000 0 660000 21 834 2 4 1 0 0.84359 1 0.00681
13500000 0 13500000 19 967 1 3 1 1 0.99599 1 0.00015
382000 0 282000 23 162 2 5 1 0 0.50165 1 0.08282
400000 0 400000 19 765 1 3 1 0 0.92176 1 0.00142
254000 377459.3 254000 23 891 1 2 0 1 0.88436 1 0.24587
346940 0 454000 23 100 2 4 1 1 0.53382 1 0.05824
368500 0 368500 23 6 1 2 1 0 0.70606 1 0.02929
599850 0 599850 30 633 1 1 1 1 0.81408 1 0.01361
1000000 51512.48 1000000 23 101 2 5 0 0 0.49999 0 0.07952
467360 564662 450000 23 891 1 3 0 1 0.84452 1 0.22968
5000000 3436988 5000000 6 69 2 2 1 0 0.41906 0 0.58271
1080000 305370.9 1080000 23 9 1 4 0 1 0.58326 1 0.10135
392000 0 392000 27 688 2 3 1 0 0.71494 1 0.01682
413000 0 413000 23 526 1 4 1 1 0.80677 1 0.00647
310000 0 310000 18 935 2 3 1 0 0.90904 1 0.00302
279930 319602.3 310000 23 519 1 4 0 1 0.74454 1 0.11165
354570 354900.3 321000 23 322 2 3 0 0 0.57855 1 0.054
300000 0 300000 18 765 1 2 1 1 0.93946 1 0.00102
236000 0 236000 23 449 2 1 1 1 0.78061 1 0.01442
300000 0 300000 19 800 1 3 1 0 0.92471 1 0.00143
996900 1100884 823000 23 195 1 4 0 0 0.44638 0 0.07141
2510096 0 1032000 23 287 2 5 1 1 0.57107 1 0.08519
300000 292.55 300000 21 393 2 3 0 1 0.73464 1 0.07573
1153298 789.49 1200000 21 150 1 2 0 1 0.8203 1 0.17397
1030000 0 1030000 18 760 2 2 1 1 0.91122 1 0.00245
289940 36459.25 500000 21 223 1 5 0 1 0.71333 1 0.17441
480000 0 480000 21 519 2 1 1 1 0.83984 1 0.00773
240000 0 240000 21 212 1 2 1 1 0.80671 1 0.00823
240000 0 240000 21 212 1 5 1 0 0.70298 1 0.03074
240000 0 240000 21 490 1 3 1 0 0.84974 1 0.00322
465000 0 465000 23 56 2 4 1 0 0.51307 1 0.06568
148870 0 450000 23 434 2 3 1 0 0.71695 1 0.01254
4000000 0 4000000 19 877 1 2 1 0 0.97323 1 0.00076
812000 51070.1 812000 23 55 2 3 0 0 0.56425 1 0.07641
463500 0 463500 23 197 1 5 1 0 0.66087 1 0.03849
940905 0 940905 23 213 2 2 1 0 0.69136 1 0.02211
240000 0 240000 21 212 1 2 1 1 0.80671 1 0.00823
480000 0 480000 21 212 2 1 1 0 0.75265 1 0.02091
240000 0 240000 21 510 2 4 1 1 0.73587 1 0.01264
240000 0 240000 21 492 2 5 1 0 0.69074 1 0.02775
72000 0 72000 4 441 1 3 1 1 0.96854 1 0.00074
2934000 0 2934000 6 532 2 2 1 0 0.97333 1 0.00067
480000 0 480000 21 212 1 5 1 0 0.71317 1 0.02661
5000000 0 5000000 21 883 1 4 1 1 0.96148 1 0.00195
400000 0 400000 21 175 2 2 1 1 0.69883 1 0.02121
286000 0 286000 18 864 2 1 1 1 0.92753 1 0.0024
2750000 0 2750000 6 427 1 5 1 1 0.96644 1 0.00096
400000 0 400000 21 393 2 3 1 1 0.73868 1 0.00929
1500000 0 1500000 4 526 1 2 1 1 0.98304 1 0.00024
650000 0 650000 23 567 1 2 1 1 0.8731 1 0.00281
4725000 0 4725000 6 756 2 4 1 0 0.98172 1 0.00051
4725000 0 4725000 6 756 1 2 1 0 0.99273 1 0.00009
4725000 10011.88 4725000 6 756 2 2 0.98728 1
4725000 0 4725000 6 756 1 2 1 0 0.99273 1 0.00009
4725000 0 4725000 6 756 2 5 1 1 0.978 1 0.00075
2520000 0 2520000 6 961 2 1 1 0 0.98859 1 0.00017
2520000 0 2520000 6 377 1 4 1 1 0.96812 1 0.00075
300000 0 300000 23 388 1 5 1 0 0.72556 1 0.02257
200000 0 200000 18 861 1 4 1 1 0.92508 1 0.00202
200000 0 200000 21 919 2 2 1 0 0.8928 1 0.00394
1000000 0 1000000 23 681 1 1 1 0 0.91619 1 0.00236
945850 1116760 823000 23 314 2 1 0 1 0.49896 0 0.11353
900520 804888.1 922000 23 314 2 1 0 0 0.58571 1 0.12401
850000 25167.92 850000 23 69 1 3 0 1 0.70515 1 0.12134
450000 0 450000 23 540 2 4 1 0 0.71244 1 0.01396
382000 0 382000 27 938 1 2 1 1 0.8913 1 0.00388
368610 237983.6 350000 23 38 2 3 0 0 0.48509 0 0.06221
382230 0 445000 23 178 1 4 1 1 0.69491 1 0.02206
248500 0 248500 23 279 2 5 1 0 0.55332 1 0.06149
240000 0 240000 21 559 1 2 1 0 0.88534 1 0.00233
800000 0 800000 23 176 2 3 1 1 0.62782 1 0.02569
463500 0 463500 23 315 1 1 1 1 0.83654 1 0.008
2553610 0 2553610 6 364 1 5 1 0 0.96114 1 0.00122
2722600 0 2722619 6 532 2 2 1 1 0.97218 1 0.00069
600000 0 600000 23 287 1 3 1 0 0.77433 1 0.00773
1000000 0 1000000 24 906 1 4 1 1 0.8925 1 0.00357
1268000 1314449 1127725 23 389 2 5 0 1 0.31668 0 0.05142
763440 922383.6 702475 23 682 1 5 0 0 0.65141 1 0.20369
350000 0 350000 18 938 2 2 1 1 0.92446 1 0.00229
362149 263843.6 350000 23 246 1 4 0 1 0.65661 1 0.09467
191100 167574.7 195000 23 296 2 3 0 1 0.60819 1 0.06204
840000 0 840000 23 223 1 3 1 0 0.76292 1 0.00959
441000 0 441000 18 902 2 1 1 1 0.93392 1 0.00212
500000 0 500000 21 744 1 1 1 0 0.93178 1 0.00155
240000 0 240000 21 212 1 4 1 1 0.74089 1 0.01545
240000 0 240000 21 212 2 5 1 0 0.57621 1 0.05803
240000 0 240000 21 510 1 2 1 1 0.87622 1 0.00265
4700000 0 4700000 21 633 1 3 1 0 0.94793 1 0.00294
5529600 242895 5529603 23 160 2 5 0 1 0.69805 1 0.75065
125700 0 193203 18 430 2 3 1 0 0.80018 1 0.00699
480000 0 480000 21 212 1 2 1 1 0.81428 1 0.00741
300000 0 300000 19 332 2 4 1 0 0.71809 1 0.01503
4200000 0 11200000 4 758 1 1 1 1 0.99921 1 0
1800000 0 3000000 23 266 2 3 1 1 0.77164 1 0.01396
560000 0 560000 4 55 1 1 1 1 0.96162 1 0.00132
5084000 0 7000000 16 51 1 5 1 1 0.93285 1 0.00559
6355000 0 7000000 16 427 2 3 1 0 0.95406 1 0.00417
4500000 0 4500000 22 160 1 2 1 0 0.89132 1 0.01355
5000000 0 7000000 16 713 2 1 1 0 0.98224 1 0.00056
5080000 3279922 5080000 16 44 1 5 0 0 0.21849 0 0.08556
436000 0 436000 23 162 1 3 1 1 0.72664 1 0.01522
350000 249266.7 350000 23 100 2 4 0 1 0.46279 0 0.05757
760000 0 760000 4 91 2 2 1 1 0.92974 1 0.0037
250000 0 250000 23 491 1 1 1 0 0.86999 1 0.00458
4600000 0 10000000 23 175 1 1 1 0 0.9763 1 0.00119
5000000 361289.7 5000000 23 273 2 4 0 0 0.73051 1 0.57461
2450000 0 3500000 23 17 1 3 1 0 0.80576 1 0.01663
1000000 0 10800000 23 440 2 1 1 1 0.98377 1 0.00098
500000 0 500000 23 521 2 5 1 1 0.66702 1 0.0302
800000 0 800000 23 293 1 1 1 0 0.84061 1 0.00805
450000 563219.8 450000 23 273 2 3 0 1 0.51125 1 0.04975
8000000 0 22000000 23 246 1 1 1 0 0.999 1 0.00001
4500000 0 22000000 23 62 2 4 1 1 0.99667 1 0.00011
4500000 0 18000000 23 356 1 5 1 0 0.99589 1 0.0001
75000000 0 1E+08 21 365 2 3 1 1 1 1 0
150000 0 150000 4 55 1 4 1 1 0.92889 1 0.00425
187300 0 200000 4 562 2 5 1 0 0.93914 1 0.00342
400000 0 400000 0 562 1 2 1 1 0.98694 1 0.00018
50000 0 50000 0 542 1 3 1 0 0.98252 1 0.00033
1000000 0 2000000 23 540 1 4 1 1 0.86406 1 0.00332
5080000 0 5080000 16 371 2 5 1 0 0.89252 1 0.01352
4132335 3987054 4132335 16 14 1 5 0 1 0.09553 0 0.01992
660000 0 660000 21 834 1 3 1 1 0.91898 1 0.00157
DATASET ACTIVATE DataSet2.
GET
FILE='C:\Users\Dr. Faith Adebisi\Documents\PETER SPSS.sav'.
DATASET NAME DataSet5 WINDOW=FRONT.
GET
FILE='C:\Users\Dr. Faith Adebisi\Documents\Music\Desktop\New folder (3)\DR ADAMU SPSS. 23.sav'.
DATASET NAME DataSet6 WINDOW=FRONT.
GRAPH
/HISTOGRAM=PRE_1
/PANEL COLVAR=Default COLOP=CROSS.
CROSSTABS
/TABLES=Default BY validate
/FORMAT=AVALUE TABLES
/CELLS=COUNT ROW COLUMN
/COUNT ROUND CELL.
Crosstabs

Case Processing Summary
                                                     Cases
                                    Valid            Missing          Total
                                    N     Percent    N     Percent    N     Percent
previously defaulted * validate     277   92.3%      23    7.7%       300   100.0%

previously defaulted * validate Crosstabulation
                                                          validate
                                                          0         1         Total
previously   Yes     Count                                18        27        45
defaulted            % within previously defaulted        40.0%     60.0%     100.0%
                     % within validate                    13.8%     18.4%     16.2%
             No      Count                                112       120       232
                     % within previously defaulted        48.3%     51.7%     100.0%
                     % within validate                    86.2%     81.6%     83.8%
Total                Count                                130       147       277
                     % within previously defaulted        46.9%     53.1%     100.0%
                     % within validate                    100.0%    100.0%    100.0%
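The row and column percentages in the crosstabulation follow directly from the four cell counts. The short Python sketch below recomputes them as a check; the variable names are illustrative and are not part of the SPSS output.

```python
# Cell counts copied from the crosstabulation
# (rows: previously defaulted, columns: validate = 0, validate = 1).
counts = {"Yes": [18, 27], "No": [112, 120]}


def pct(part, whole):
    """Percentage rounded to one decimal place, as SPSS prints it."""
    return round(100.0 * part / whole, 1)


# Column totals (validate = 0 and validate = 1).
col_totals = [sum(row[j] for row in counts.values()) for j in range(2)]

# "% within previously defaulted": each cell divided by its row total.
row_pct = {k: [pct(v, sum(row)) for v in row] for k, row in counts.items()}

# "% within validate": each cell divided by its column total.
col_pct = {
    k: [pct(v, col_totals[j]) for j, v in enumerate(row)]
    for k, row in counts.items()
}

print(col_totals)  # column totals
print(row_pct)     # row percentages
print(col_pct)     # column percentages
```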
ROC Curve

Case Processing Summary
previously defaulted     Valid N (listwise)
Positive(a)              232
Negative                 45
Missing                  23
Larger values of the test result variable(s) indicate stronger evidence for a positive actual state.
a. The positive actual state is No.

Area Under the Curve
Test Result Variable(s): Predicted probability
                                                 Asymptotic 95% Confidence Interval
Area     Std. Error(a)     Asymptotic Sig.(b)    Lower Bound     Upper Bound
.765     .040              .000                  .687            .844
The test result variable(s): Predicted probability has at least one tie between the positive actual state group and the negative actual state group. Statistics may be biased.
a. Under the nonparametric assumption
b. Null hypothesis: true area = 0.5
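The asymptotic confidence interval for the area under the curve can be approximated as the area plus or minus 1.96 standard errors. The sketch below, which assumes this normal approximation, gives values close to the reported bounds; small differences arise because the printed standard error is rounded to three decimals.

```python
# Area and standard error as printed in the "Area Under the Curve" table.
area = 0.765
std_error = 0.040

# 1.96 is the standard normal quantile for a two-sided 95% interval.
z = 1.96
lower = area - z * std_error
upper = area + z * std_error

print(round(lower, 3), round(upper, 3))
```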
LOGISTIC REGRESSION VARIABLES Default
  /SELECT=validate EQ 1
  /METHOD=FSTEP(LR) Edlev Loan Balance Collateral Interest Days Gender
  /SAVE=PRED PGROUP COOK
  /CLASSPLOT
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).
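Logistic Regression
[DataSet1] C:\Users\Dr. Faith Adebisi\Documents\Music\Desktop\New folder (3)\DR ADAMU SPSS. 2.sav

Case Processing Summary
Unweighted Cases(a)              N       Percent
Selected Cases     Total         147     49.0
Total                            300     100.0
a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding
Original Value     Internal Value
Yes                0
No                 1

Block 0: Beginning Block

Classification Table(a,b)
                                                         Predicted
                                   Selected Cases(c)                 Unselected Cases(d,e)
                                   previously defaulted  Percentage  previously defaulted  Percentage
Observed                           Yes      No           Correct     Yes      No           Correct
Step 0  previously defaulted  Yes  0        27           .0          0        18           .0
                              No   0        120          100.0       0        112          100.0
a. Constant is included in the model.
b. The cut value is .500
c. Selected cases validate EQ 1
d. Unselected cases validate NE 1
e. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical variables with values out of the range of the selected cases.

Variables in the Equation
                     B        S.E.     Wald      df    Sig.    Exp(B)
Step 0   Constant    1.492    .213     49.042    1     .000    4.444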
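In the constant-only model, Exp(B) is simply the observed odds of "No" against "Yes" among the selected cases, and B is its natural logarithm. The sketch below recomputes both from the Step 0 classification counts (120 No, 27 Yes); the variable names are illustrative.

```python
import math

# Observed counts among the selected cases (validate EQ 1).
n_no = 120   # coded 1: did not default
n_yes = 27   # coded 0: defaulted

odds = n_no / n_yes            # Exp(B) of the constant-only model
b0 = math.log(odds)            # the Step 0 constant B
p_no = odds / (1.0 + odds)     # implied baseline probability of "No"

print(round(odds, 3), round(b0, 3), round(p_no, 3))
```

Because the baseline probability of "No" is above the .500 cut value, the Step 0 model classifies every case as "No", which is why the "Yes" row shows .0 percent correct.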
Variables not in the Equation
                                  Score     df    Sig.
Step 0   Variables   Edlev        1.730     1     .188
                     Loan         .253      1     .615
                     Balance      3.874     1     .049
                     Collateral   .438      1     .508
                     Interest     2.470     1     .116
                     Days         5.338     1     .021
                     Gender       1.269     1     .260
Block 1: Method = Forward Stepwise (Likelihood Ratio)

Omnibus Tests of Model Coefficients
                   Chi-square    df    Sig.
Step 1   Step      5.736         1     .017
         Block     5.736         1     .017
         Model     5.736         1     .017
Step 2   Step      6.148         1     .013
         Block     11.884        2     .003
         Model     11.884        2     .003
Step 3   Step      4.332         1     .037
         Block     16.216        3     .001
         Model     16.216        3     .001
Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       134.478(a)           .038                    .062
2       128.330(a)           .078                    .126
3       123.998(b)           .104                    .170
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
b. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
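The Model Summary and the Omnibus Tests are linked: the model chi-square is the drop in -2 log likelihood from the constant-only model, and both pseudo R-square values can be derived from that chi-square and the number of selected cases (147). The sketch below uses the standard Cox and Snell and Nagelkerke formulas; the function names are illustrative, and the null -2 log likelihood is inferred from the Step 1 figures rather than printed in the output.

```python
import math

n = 147  # selected cases used for estimation

neg2ll_model = {1: 134.478, 2: 128.330, 3: 123.998}   # Model Summary
model_chisq = {1: 5.736, 2: 11.884, 3: 16.216}        # Omnibus Tests, "Model" rows

# -2 log likelihood of the constant-only model (inferred, not printed).
neg2ll_null = neg2ll_model[1] + model_chisq[1]


def cox_snell(chisq, n_cases):
    """Cox & Snell R-square: 1 - exp(-chi-square / n)."""
    return 1.0 - math.exp(-chisq / n_cases)


def nagelkerke(chisq, n_cases, null_neg2ll):
    """Nagelkerke R-square: Cox & Snell divided by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(-null_neg2ll / n_cases)
    return cox_snell(chisq, n_cases) / max_r2


for step in (1, 2, 3):
    print(
        step,
        round(cox_snell(model_chisq[step], n), 3),
        round(nagelkerke(model_chisq[step], n, neg2ll_null), 3),
    )
```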
Hosmer and Lemeshow Test
Step    Chi-square    df    Sig.
1       14.629        7     .041
2       8.280         8     .407
3       16.697        8     .033

Contingency Table for Hosmer and Lemeshow Test
                 previously defaulted = Yes    previously defaulted = No
                 Observed     Expected         Observed     Expected        Total
Step 1    1      5            4.483            10           10.517          15
          2      3            4.217            12           10.783          15
          3      2            3.976            14           12.024          16
          4      8            3.500            8            12.500          16
          5      2            2.923            14           13.077          16
          6      2            2.500            14           13.500          16
          7      1            1.938            14           13.062          15
          8      0            1.578            15           13.422          15
          9      4            1.885            19           21.115          23
Step 2    1      6            5.978            9            9.022           15
          2      5            4.557            10           10.443          15
          3      5            4.024            10           10.976          15
          4      3            3.208            13           12.792          16
          5      2            2.670            13           12.330          15
          6      1            2.341            15           13.659          16
          7      2            1.621            13           13.379          15
          8      0            1.291            15           13.709          15
          9      3            .934             12           14.066          15
          10     0            .375             10           9.625           10
Step 3    1      8            6.521            7            8.479           15
          2      7            5.065            9            10.935          16
          3      2            3.649            12           10.351          14
          4      3            3.375            12           11.625          15
          5      0            2.620            15           12.380          15
          6      3            2.096            12           12.904          15
          7      1            1.651            14           13.349          15
          8      0            1.142            15           13.858          15
          9      3            .664             12           14.336          15
          10     0            .216             12           11.784          12
Classification Table(a)
                                                         Predicted
                                   Selected Cases(b)                 Unselected Cases(c,d)
                                   previously defaulted  Percentage  previously defaulted  Percentage
Observed                           Yes      No           Correct     Yes      No           Correct
Step 1  previously defaulted  Yes  0        27           .0          0        18           .0
                              No   0        120          100.0       0        112          100.0
Step 2  previously defaulted  Yes  0        27           .0          0        18           .0
                              No   1        119          99.2        3        109          97.3
Step 3  previously defaulted  Yes  1        26           3.7         1        17           5.6
                              No   1        119          99.2        3        109          97.3
a. The cut value is .500
b. Selected cases validate EQ 1
c. Unselected cases validate NE 1
d. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical variables with values out of the range of the selected cases.
Variables in the Equation
                                                                          95% C.I. for EXP(B)
                        B        S.E.     Wald      df    Sig.    Exp(B)  Lower     Upper
Step 1(a)   Days        .002     .001     5.118     1     .024    1.002   1.000     1.003
            Constant    .822     .337     5.947     1     .015    2.274
Step 2(b)   Interest    -.068    .029     5.346     1     .021    .934    .882      .990
            Days        .002     .001     7.869     1     .005    1.002   1.001     1.004
            Constant    1.815    .579     9.833     1     .002    6.143
Step 3(c)   Balance     .000     .000     4.216     1     .040    1.000   1.000     1.000
            Interest    -.106    .040     7.147     1     .008    .900    .832      .972
            Days        .002     .001     4.888     1     .027    1.002   1.000     1.003
            Constant    2.966    .901     10.827    1     .001    19.415
a. Variable(s) entered on step 1: Days.
b. Variable(s) entered on step 2: Interest.
c. Variable(s) entered on step 3: Balance.
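To illustrate how the fitted coefficients turn into a predicted probability, the sketch below plugs the Step 2 estimates (Constant 1.815, Interest -.068, Days .002) into the logistic function. It is only an approximation: the printed coefficients are rounded to three decimals, so the results will not reproduce the saved predicted probabilities exactly, and the function name and the example inputs are hypothetical.

```python
import math


def prob_no_default(interest, days, b0=1.815, b_interest=-0.068, b_days=0.002):
    """Approximate P(previously defaulted = No) from the rounded Step 2 coefficients."""
    logit = b0 + b_interest * interest + b_days * days
    return 1.0 / (1.0 + math.exp(-logit))


def classify(prob, cut=0.5):
    """Apply the cut value used in the output (.500): 1 = No, 0 = Yes."""
    return 1 if prob >= cut else 0


# Hypothetical inputs chosen only to show the direction of each effect.
low_rate = prob_no_default(interest=6, days=500)
high_rate = prob_no_default(interest=30, days=500)
print(round(low_rate, 3), round(high_rate, 3), classify(high_rate))
```

A higher Interest value lowers the predicted probability of "No" (negative coefficient), while a higher Days value raises it (positive coefficient).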
Model if Term Removed
                        Model Log      Change in -2 Log            Sig. of the
Variable                Likelihood     Likelihood            df    Change
Step 1   Days           -70.107        5.736                 1     .017
Step 2   Interest       -67.239        6.148                 1     .013
         Days           -68.803        9.276                 1     .002
Step 3   Balance        -64.165        4.332                 1     .037
         Interest       -66.917        9.836                 1     .002
         Days           -64.737        5.475                 1     .019
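Each "Change in -2 Log Likelihood" is the -2 log likelihood of the model without the term minus the -2 log likelihood of the full model at that step (from the Model Summary). The sketch below recomputes the changes from the printed values; agreement is to within rounding.

```python
# -2 log likelihood of the fitted model at each step (Model Summary).
neg2ll = {1: 134.478, 2: 128.330, 3: 123.998}

# Log likelihood of the model with the named term removed.
ll_removed = {
    (1, "Days"): -70.107,
    (2, "Interest"): -67.239,
    (2, "Days"): -68.803,
    (3, "Balance"): -64.165,
    (3, "Interest"): -66.917,
    (3, "Days"): -64.737,
}

# Likelihood-ratio statistic for dropping each term.
change = {
    key: -2.0 * ll - neg2ll[key[0]]
    for key, ll in ll_removed.items()
}

for key, value in change.items():
    print(key, round(value, 3))
```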
Variables not in the Equation
                                  Score     df    Sig.
Step 1   Variables   Edlev        1.132     1     .287
                     Loan         .447      1     .504
                     Balance      .668      1     .414
                     Collateral   1.068     1     .301
                     Interest     5.649     1     .017
                     Gender       1.052     1     .305
Step 2   Variables   Edlev        .877      1     .349
                     Loan         .332      1     .565
                     Balance      4.697     1     .030
                     Collateral   .681      1     .409
                     Gender       .956      1     .328
Step 3   Variables   Edlev        .434      1     .510
                     Loan         .524      1     .469
                     Collateral   1.399     1     .237
                     Gender       1.656     1     .198

Step number: 1
Observed Groups and Predicted Probabilities
[Classification plot: frequency of cases (vertical axis) against predicted probability from 0 to 1 (horizontal axis)]
Predicted Probability is of Membership for No
The Cut Value is .50
Symbols: Y - Yes
         N - No
Each Symbol Represents 1 Case.

Step number: 2
[Classification plot: frequency of cases (vertical axis) against predicted probability from 0 to 1 (horizontal axis)]
Symbols: Y - Yes
         N - No

Step number: 3
[Classification plot: frequency of cases (vertical axis) against predicted probability from 0 to 1 (horizontal axis)]
Symbols: Y - Yes
         N - No
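Logistic Regression
[DataSet6] C:\Users\Dr. Faith Adebisi\Documents\Music\Desktop\New folder (3)\DR ADAMU SPSS. 23.sav

Case Processing Summary
Unweighted Cases(a)              N       Percent
Selected Cases     Total         147     49.0
Total                            300     100.0
a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding
Original Value     Internal Value
Yes                0
No                 1

Block 0: Beginning Block

Classification Table(a,b)
                                                         Predicted
                                   Selected Cases(c)                 Unselected Cases(d,e)
                                   previously defaulted  Percentage  previously defaulted  Percentage
Observed                           Yes      No           Correct     Yes      No           Correct
Step 0  previously defaulted  Yes  0        27           .0          0        18           .0
                              No   0        120          100.0       0        112          100.0
a. Constant is included in the model.
b. The cut value is .500
c. Selected cases validate EQ 1
d. Unselected cases validate NE 1
e. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical variables with values out of the range of the selected cases.

Variables in the Equation
                     B        S.E.     Wald      df    Sig.    Exp(B)
Step 0   Constant    1.492    .213     49.042    1     .000    4.444

Variables not in the Equation
                                  Score     df    Sig.
Step 0   Variables   Edlev        1.730     1     .188
                     Loan         .253      1     .615
                     Balance      3.874     1     .049
                     Collateral   .438      1     .508
                     Interest     2.470     1     .116
                     Days         5.338     1     .021
                     Gender       1.269     1     .260
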
Block 1: Method = Enter

Omnibus Tests of Model Coefficients
                   Chi-square    df    Sig.
Step 1   Step      24.869        7     .001
         Block     24.869        7     .001
         Model     24.869        7     .001

Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       115.345(a)           .156                    .253
a. Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test
Step    Chi-square    df    Sig.
1       9.059         8     .337

Contingency Table for Hosmer and Lemeshow Test
                 previously defaulted = Yes    previously defaulted = No
                 Observed     Expected         Observed     Expected        Total
Step 1    1      9            7.888            6            7.112           15
          2      4            5.140            11           9.860           15
          3      6            4.135            9            10.865          15
          4      0            3.153            15           11.847          15
          5      3            2.432            12           12.568          15
          6      1            1.683            14           13.317          15
          7      1            1.239            14           13.761          15
          8      2            .852             13           14.148          15
          9      1            .396             14           14.604          15
          10     0            .082             12           11.918          12
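The Hosmer and Lemeshow chi-square is the sum, over all groups and both outcomes, of (observed - expected)^2 / expected, with degrees of freedom equal to the number of groups minus 2. The sketch below recomputes the statistic for the Enter model from the contingency table; it matches the reported 9.059 to within the rounding of the expected counts.

```python
# Each tuple: (observed Yes, expected Yes, observed No, expected No),
# copied from the contingency table for the Enter model.
groups = [
    (9, 7.888, 6, 7.112),
    (4, 5.140, 11, 9.860),
    (6, 4.135, 9, 10.865),
    (0, 3.153, 15, 11.847),
    (3, 2.432, 12, 12.568),
    (1, 1.683, 14, 13.317),
    (1, 1.239, 14, 13.761),
    (2, 0.852, 13, 14.148),
    (1, 0.396, 14, 14.604),
    (0, 0.082, 12, 11.918),
]

# Pearson-type sum over both outcome columns of every group.
hl_chisq = sum(
    (obs_yes - exp_yes) ** 2 / exp_yes + (obs_no - exp_no) ** 2 / exp_no
    for obs_yes, exp_yes, obs_no, exp_no in groups
)

# Degrees of freedom: number of groups minus 2.
hl_df = len(groups) - 2

print(round(hl_chisq, 3), hl_df)
```

A significance value above .05 (here .337) means the observed and expected counts do not differ significantly, so there is no evidence of poor fit for the Enter model.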
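Classification Table(a)
                                                         Predicted
                                   Selected Cases(b)                 Unselected Cases(c,d)
                                   previously defaulted  Percentage  previously defaulted  Percentage
Observed                           Yes      No           Correct     Yes      No           Correct
Step 1  previously defaulted  Yes  5        22           18.5        6        12           33.3
                              No   1        119          99.2        5        107          95.5
a. The cut value is .500
b. Selected cases validate EQ 1
c. Unselected cases validate NE 1
d. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical variables with values out of the range of the selected cases.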
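The percentages in the classification table are row-wise hit rates: correctly classified cases divided by the observed cases in that row. The sketch below recomputes them for the Enter model and also derives an overall percentage correct, which is not shown in the table above and is included only as a derived figure.

```python
# Rows: observed Yes, observed No. Columns: predicted Yes, predicted No.
selected = [[5, 22], [1, 119]]     # validate EQ 1 (estimation cases)
unselected = [[6, 12], [5, 107]]   # validate NE 1 (hold-out cases)


def percent_correct(table):
    """Return (% Yes correct, % No correct, overall % correct), one decimal each."""
    (yes_yes, yes_no), (no_yes, no_no) = table
    total = yes_yes + yes_no + no_yes + no_no
    return (
        round(100.0 * yes_yes / (yes_yes + yes_no), 1),
        round(100.0 * no_no / (no_yes + no_no), 1),
        round(100.0 * (yes_yes + no_no) / total, 1),
    )


sel_pct = percent_correct(selected)
unsel_pct = percent_correct(unselected)
print(sel_pct, unsel_pct)
```

The model identifies non-defaulters far more reliably than defaulters in both samples, which reflects the imbalance between the two groups.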
Variables in the Equation
                                                                            95% C.I. for EXP(B)
                          B        S.E.     Wald      df    Sig.    Exp(B)  Lower     Upper
Step 1(a)   Edlev         -.189    .187     1.017     1     .313    .828    .573      1.195
            Loan          .000     .000     .051      1     .821    1.000   1.000     1.000
            Balance       .000     .000     6.861     1     .009    1.000   1.000     1.000
            Collateral    .000     .000     2.430     1     .119    1.000   1.000     1.000
            Interest      -.107    .042     6.625     1     .010    .899    .828      .975
            Days          .002     .001     4.468     1     .035    1.002   1.000     1.003
            Gender        -.554    .475     1.364     1     .243    .574    .227      1.456
            Constant      4.179    1.343    9.688     1     .002    65.328
a. Variable(s) entered on step 1: Edlev, Loan, Balance, Collateral, Interest, Days, Gender.
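The Exp(B), confidence interval and Wald columns are all functions of B and S.E.: Exp(B) = exp(B), the 95% limits are exp(B ± 1.96 × S.E.), and Wald = (B / S.E.)^2. The sketch below recomputes them for Gender and Interest; because the printed B and S.E. are rounded to three decimals, the results agree with the table only approximately. The function name is illustrative.

```python
import math


def odds_ratio_summary(b, se, z=1.96):
    """Odds ratio, its 95% confidence limits, and the Wald statistic from B and S.E."""
    return {
        "exp_b": math.exp(b),
        "lower": math.exp(b - z * se),
        "upper": math.exp(b + z * se),
        "wald": (b / se) ** 2,
    }


gender = odds_ratio_summary(-0.554, 0.475)
interest = odds_ratio_summary(-0.107, 0.042)

for name, result in (("Gender", gender), ("Interest", interest)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

An interval that includes 1 (as for Gender) corresponds to a non-significant coefficient, while an interval entirely below 1 (as for Interest) indicates that higher values lower the odds of "No".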
Step number: 1
[Classification plot: frequency of cases (vertical axis) against predicted probability from 0 to 1 (horizontal axis)]
Symbols: Y - Yes
         N - No
