Group - 01
2018022 Kanishk Naik
2018039 Rohan S. Naik
2018040 Rohan Thakur
2018173 Renita Coutinho
2018083 Hari Sankar
2018222 Nirmalkumar Rathi
Table of Contents
1. Introduction
2. Literature Review
3. Identification & Removal of Outliers
4. Identification & Handling of Missing Values
5. Populating Correlation Matrix
6. Reviewing Predictor Variables
7. Communicating Categorical Variables to Azure
8. Developing Classification Models
9. Filter Based Feature Selection
10. Business Interpretation
11. References
1. Introduction
The main aim of this report is to find out which variables affect whether a customer's credit
rating is good or bad. To analyse the credit rating we have used the Logistic Regression, Boosted
Decision Tree and K-means Clustering algorithms.
Credit scoring is a tool used for forecasting financial risk in consumer lending. It is a
statistical analysis performed by lenders and financial institutions to assess a person's
creditworthiness. Lenders use credit scoring to help decide whether to extend or deny credit.
A person's credit score is a number between 300 and 850, 850 being the highest credit rating possible.
A credit score can impact many financial transactions including mortgages, auto loans, credit cards,
and private loans.
A credit score is generally influenced by five categories: payment history, types of credit, new credit,
current debt and length of credit. A person needs to pay special attention to current debt and payment
history. Although credit scoring ranks a borrower's credit riskiness, it does not provide an estimate of
a borrower's default probability. As an ordinal ranking, it only assesses a borrower's riskiness from
highest to lowest. As such, credit scoring suffers from its inability to determine whether Borrower A is
twice as risky as Borrower B. This is where analytics helps.
Here we develop a predictive model to analyse the credit score and identify in advance who is a
good or a bad credit risk, so that the concerned authorities can avoid financial crunch
situations in the future.
2. Perform literature survey and identify the variables from the research papers.
Compare these variables with the variables given in the data set and see what the
difference is.
Measuring Credit Risk of Bank Customers Using Artificial Neural Network by Mohsen Nazari
Department of Business Management, Faculty of Management, University of Tehran &
Mojtaba Alidadi (Corresponding author) Department of Business Management, Faculty of
Management, University of Tehran, January 2012
This paper discusses the classification criteria for identifying good and bad customers in
Iranian banks. An artificial neural network technique is used for credit risk measurement, and
the calculations were done using SPSS and MATLAB software. The factors affecting credit risk
are divided into two groups, those within the organization and those outside it, and studied
accordingly.
Factors Affecting Credit Risk: An Empirical Study of the Jordanian Commercial Banks by Dr.
Abedalfattah Zuhair Al-abedallat Faculty of Business and Finance, The World Islamic
Sciences & Education University, Amman, Jordan, December 2016
This paper discusses the various factors that affect credit risk. It also highlights the
statistical impact of those factors on credit risk and provides recommendations on the
same. It uses linear regression to test the impact at a significance level of 0.05.
The 5 Biggest Factors That Affect Your Credit by Amy Fontinelle, June 25, 2019
This article specifies and explains the five biggest factors that affect a credit score, how
they can affect credit risk, and what this means when we apply for a loan. It assigns each
factor a percentage weight to show which factor is the most important and which is the least.
It also tells us what is and what is not counted in the credit score calculation.
Key Factors Influencing Credit Risk of Islamic Bank: A Malaysian Case by Nor Hayati Ahmad
And Shahrul Nizam Ahmad Faculty Of Banking And Finance University Utara Malaysia,
January 2004
This paper examines the factors affecting credit risk, the main risk being faced by banking
institutions and systematically identifies the key factors influencing credit risk formation in
Islamic banking operations in Malaysia. This paper also identifies whether there exists any
difference between credit risk determinants of Islamic banking and conventional banks in
Malaysia.
Determinants of Bank Credit Risk: Empirical Evidence from Jordanian Commercial Banks by
Buthiena Kharabsheh, Yarmouk University, 2013
This paper investigated the credit risk determinants in the Jordanian banking sector. Both
bank-specific variables and macro-economic variables were included in the analysis, using a
balanced panel dataset of all Jordanian commercial banks over the period 2000-2017. The
findings revealed that credit risk increased as the bank capital ratio, operating inefficiency
and the growth rate in credit increased, whereas larger and more profitable banks faced lower
credit risk. However, no effect was found for bank liquidity. Further, the macroeconomic
variables indicated that as the unemployment rate increased, credit risk significantly
increased, and a similar positive effect was documented for the crisis effect.
Factors Affecting the Bank Credit: An Empirical Study on the Jordanian Commercial Banks
by Mwafag Rabab, April 25, 2015
This paper examined the determinants of commercial banks' lending in Jordan. The study
sample consisted of ten Jordanian commercial banks during the period 2005-2013. The study
used the ratio of credit facilities to total assets as a dependent variable, and eleven
independent variables including the ratio of deposits, ratio of non-performing loans, capital
ratio, liquidity ratio, asset size, lending rate, deposits rate, window rate, legal reserve ratio,
inflation and economic growth rate. The results showed that the ratio of non-performing
loans, liquidity ratio and window rate have a negative and significant impact on the ratio of
credit facilities, while bank size and economic growth have a positive and significant
impact on the ratio of credit facilities granted by commercial banks in Jordan.
Credit Risk Management: Development over the last 20 years – Edward I Altman, Anthony
Saunders, 1997
About 20 years ago, most financial institutions depended on subjective analysis, or so-called
banker expert systems, to evaluate the credit risk on loans. Bankers used information on
various borrower characteristics such as character, capital, capacity and collateral,
the 4 Cs of credit, to reach a judgement on whether or not to grant credit. Over the past 20
years, however, financial institutions have increasingly moved away from subjective/expert
systems towards systems that are more objectively based.
Accounting based credit-scoring systems: In accounting based credit-scoring systems, the
financial institution compares various key accounting ratios of potential borrowers with
industry or group norms. The key accounting variables are combined and weighted to produce
either a credit risk score or a probability of default measure. If the credit risk score, or
probability, attains a value above a critical benchmark, a loan applicant is either rejected or
subjected to increased scrutiny.
There are four methodological approaches to developing credit-scoring systems: 1. the linear
probability model, 2. the logit model, 3. the probit model and 4. the discriminant analysis model.
Newer models of credit risk measurement: A newer approach is the application of neural
network analysis to the credit risk classification problem. Essentially, neural network analysis
is similar to non-linear discriminant analysis, in that it drops the assumption that variables
entering into the bankruptcy prediction function are linearly and independently related.
Specifically, neural network models of credit risk explore potentially hidden correlations
among the predictive variables, which are then entered as additional explanatory variables in
the non-linear bankruptcy prediction function.
Analysis of Credit Risk Measurement Models in the Evaluation of Credit Demands - Mehmet
Ali Canbolat, 2015
This paper tells us that banks and advisers began to develop credit risk models in the
second half of the 1990s, with the target of measuring the potential loss according to
identified levels of privacy. With an approach based on risk measurement and management,
credit demands are evaluated optimally using data from effective sources, pricing is kept
affordable, and the bank's capital structure is preserved. The credit risk model is the
key factor in the preliminary determination of the probability of default and the combined or
quantitative score of the debtor. The Analytic Hierarchy Process (AHP) evaluates the financial
analysis methods existing in the literature as methods of evaluation for loan demands, together
with some qualitative and quantitative factors of a company such as “subjective credit
worthiness, status of the sector operating and loan guarantees”, and expresses the result as
an overall credit score.
However, other research states that using financial statement analysis techniques
together gives more effective results. Sevim & Canbolat have developed a model named
"Scoring Model", which requires analysis techniques (comparative financial statements
analysis, vertical analysis, ratio analysis and cash flow statements) to be used together. In
Scoring Model, in addition to ratio analysis and cash flow statements techniques, comparative
financial statements analysis and vertical analysis techniques, which were never used in
software models before, were also used. As a result of these techniques applied, financial
statements of the firm requesting loan are interpreted by the software and credit score of the
company is created based on these interpretations.
Based on the evaluation of the variables found in the literature review and in the given
dataset, it is evident that factors such as demographics, credit history, source of current
income, nature of job, value of collateral, the applicant's family status (single/married) and
reason for credit carry a high weight in predicting the overall credit rating. The literature
review has highlighted two main factors in common for determining credit risk: Duration and
Amount of Loan. Customers must work on these factors to ensure that they are granted a loan
when they need it the most.
Also, the importance of these factors changes with the geographical domain. For example,
factors with high significance for Malaysian banks may not be highly influential for
Indonesian or European banks. Thus we cannot neglect even one factor while building a
predictive model, as doing so may cause a vast difference in the outcome and hinder model
accuracy.
3. Identify the outliers in the dataset using SPSS and remove the outliers from the
dataset using Azure. Describe the workflow needed to identify and remove the
outliers
In SPSS, the continuous variables are identified and passed to Analyze -> Descriptive Statistics ->
Explore, which produces the following results. The results suggest the upper and lower limits for
identifying the outliers.
DURATION Stem-and-Leaf Plot (stem width: 10; each leaf: 2 case(s))
Stem-and-Leaf Plot (stem width: 0; each leaf: 7 case(s))
AGE Stem-and-Leaf Plot (stem width: 10; each leaf: 1 case(s))
To remove outliers:
Use the Clip Values operation with the upper and lower limits computed from the 1st and
3rd quartiles. Duration has q1 = 12 and q3 = 24. Values outside the range between the lower
and upper limits are clipped and replaced with the threshold values. The variable Amount has
been clipped similarly.
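The clipping step can be sketched in a few lines of Python. This is only an illustration: it assumes the limits follow the common 1.5 × IQR rule (the report does not state the multiplier), and the sample durations are hypothetical values, not rows from the dataset.

```python
def iqr_limits(q1, q3, k=1.5):
    """Return (lower, upper) fences from the quartiles.

    k = 1.5 is the conventional multiplier and is an assumption here.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest threshold."""
    return [min(max(v, lower), upper) for v in values]


# Quartiles for Duration as quoted in the report.
lower, upper = iqr_limits(12, 24)

# Hypothetical durations in months, for illustration only.
durations = [6, 12, 18, 24, 48, 60]
clipped = clip(durations, lower, upper)

print(lower, upper)
print(clipped)
```

Under these assumptions, any duration above the upper fence is pulled back to the fence, which corresponds to substituting the threshold value as described above.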
4. Identify the missing values using Microsoft Azure and describe the work flow needed
to identify them and handle them. Remove the outliers from the dataset and perform
rest of the operations.
Using the Clip Values module, we first flagged the outliers, which were marked as missing values
in the indicator column. Then, using Clean Missing Data -> Remove entire row, we removed the
outlier/missing entries from the data and used the result for the subsequent work.
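The same two-step workflow (mark out-of-range values as missing, then delete every row that has a missing value) can be sketched as follows. The rows, column names and limits are hypothetical stand-ins chosen for illustration; this is not the Azure module itself.

```python
def mark_outliers_missing(rows, column, lower, upper):
    """Return copies of the rows with out-of-range values in `column` set to None."""
    marked = []
    for row in rows:
        new_row = dict(row)
        value = new_row[column]
        if value is not None and not (lower <= value <= upper):
            new_row[column] = None
        marked.append(new_row)
    return marked


def drop_rows_with_missing(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical records; the second row is an outlier in DURATION,
# the third already has a missing AMOUNT.
rows = [
    {"DURATION": 12, "AMOUNT": 1500},
    {"DURATION": 60, "AMOUNT": 2500},
    {"DURATION": 24, "AMOUNT": None},
    {"DURATION": 36, "AMOUNT": 4000},
]

marked = mark_outliers_missing(rows, "DURATION", lower=-6, upper=42)
cleaned = drop_rows_with_missing(marked)
print(cleaned)
```

Treating outliers and genuinely missing entries the same way keeps the cleaning to a single row-deletion step, at the cost of discarding the other fields of those rows.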
5. Populate the correlation matrix for the “numerical variables” and draw inferences.
Using the cleaned data, the 6 numerical variables are considered to identify the correlation among
them.
1. Duration
2. Amount
3. No. of credits
4. Age
5. Number of dependents
6. Instalment rate
The below screenshot indicates the procedure to be followed on Azure.
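For readers without access to the workflow, the computation behind a correlation matrix can be sketched as below. It assumes the Pearson (linear) correlation coefficient is the measure intended, and it uses a few hypothetical values for three of the six variables rather than the cleaned dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical sample values, for illustration only.
data = {
    "DURATION": [6, 12, 24, 36, 48],
    "AMOUNT": [1000, 2000, 4000, 6000, 8000],
    "AGE": [50, 23, 41, 30, 35],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Values near +1 or -1 indicate a strong linear relationship between two variables, and values near 0 indicate little linear relationship.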
6. Review the predictor variables and guess from their definition as to what their role
might be in a credit decision.
After performing the initial analysis using Filter Based Feature Selection, we found the 9
variables below to act as predictor variables in the credit decision.
Predictor Variables        Role in Credit Decision
CATEGORICAL VARIABLES:
CHK_ACCT
HISTORY
NEW_CAR
USED_CAR
FURNITURE
RADIO/TV
EDUCATION
RETRAINING
SAV_ACCT
EMPLOYMENT
MALE_DIV
MALE_SINGLE
MALE_MAR_WID
CO-APPLICANT
GUARANTOR
PRESENT_RESIDENT
REAL_ESTATE
PROP_UNKN_NONE
OTHER_INSTALL
RENT
OWN_RES
JOB
TELEPHONE
FOREIGN
RESPONSE/TARGET
The categorical variables in the German credit data set include nominal variables with two
categories (binary) as well as nominal variables with more than two categories.
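One common way to make a nominal variable with more than two categories usable by a model is to expand it into indicator (0/1) columns. The sketch below shows that idea only; it is not the mechanism used in Azure for this report, and the CHK_ACCT codes shown are hypothetical sample values.

```python
def one_hot(values, prefix):
    """Expand a list of category codes into one 0/1 indicator dict per row."""
    levels = sorted(set(values))
    return [
        {f"{prefix}_{level}": int(value == level) for level in levels}
        for value in values
    ]


# Hypothetical category codes for a multi-level nominal variable.
chk_acct = [0, 3, 1, 0, 2]
encoded = one_hot(chk_acct, "CHK_ACCT")

for row in encoded:
    print(row)
```

A binary variable already has this form, which is why only the multi-category variables need the expansion.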
8. Divide the data randomly into training (60%) and testing (40%) partitions, and develop
classification models using the following data mining techniques: See which of the
models are predicting the outcome better and report. Provide the work flow.
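The random 60%/40% partition named in this task can be sketched as follows. The fixed seed and the use of integers as stand-in records are assumptions made so that the example is reproducible; the report does not say which seed, if any, was used.

```python
import random


def split_rows(rows, train_fraction=0.6, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical choice
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in records: 1000 row identifiers instead of real applicants.
records = list(range(1000))
train_rows, test_rows = split_rows(records)

print(len(train_rows), len(test_rows))
```

Each model is then fitted on the training part only and scored on the testing part, so that the reported accuracy reflects records the model has not seen.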
                    Predicted
Actual        Negative   Positive
Negative         147         85
Positive         307        333
Accuracy: 55%
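The reported accuracy can be checked directly from the four counts. The sketch below assumes that the rows of the table are the actual classes and the columns the predicted classes; precision and recall are added for context and are not figures quoted in the report.

```python
# Counts from the confusion matrix, assuming rows = actual, columns = predicted.
tn, fp = 147, 85    # actual negative: predicted negative, predicted positive
fn, tp = 307, 333   # actual positive: predicted negative, predicted positive

total = tn + fp + fn + tp
accuracy = (tp + tn) / total   # share of all cases classified correctly
precision = tp / (tp + fp)     # share of predicted positives that are correct
recall = tp / (tp + fn)        # share of actual positives that were found

print(f"total={total}")
print(f"accuracy={accuracy:.2%}")
print(f"precision={precision:.2%}")
print(f"recall={recall:.2%}")
```

With these counts, 480 of the 872 cases are classified correctly, which matches the 55% accuracy stated above.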
The predictive modelling, which allows us to determine whether a credit rating is good or bad,
has direct business implications.
For banks and non-banking financial institutions lending money, a credit rating helps
distinguish good borrowers from bad borrowers based on the above-mentioned 30
variables. This reduces the rate of default, lowers the institutions' risk
and improves their overall performance.
From the applicant's perspective, understanding the factors that contribute to the overall
credit rating allows one to work on certain aspects, improve the rating and so enhance the
chance of getting a loan in a time of need.
When a bank receives a loan application, based on applicant’s profile the bank has to make a decision
whether to accept it or reject it. The risks associated with the decision are as follows:
1. If the applicant is likely to repay the loan, then not approving the loan results in a loss of
business for the bank (False Positive, taking "bad credit risk" as the positive class).
2. If the applicant is not likely to repay the loan, then approving the loan results in a
financial loss for the bank (False Negative).
It may be concluded that the second risk is greater than the first, as lending money to a
party that does not repay has a larger effect than not giving the credit. This model would
greatly help to evaluate or verify the company's credit decisions.
According to this model, 270 loan applicants (both accepted and rejected) have been correctly
identified as eligible or not. 52 loan applicants whose loans were approved are at risk of not
repaying, and the bank should recheck their applications. Meanwhile, 27 loan applicants who were
rejected by the bank have the potential to pay back the loan, so the bank loses out on some
amount of profit. Hence, the bank might be at risk of financial loss in the future.
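The figures in this paragraph can be turned into an accuracy and a rough cost comparison of the two kinds of error. The cost weights below are hypothetical and chosen only to reflect the view that approving a risky loan is the more expensive mistake; they are not taken from the report.

```python
correct = 270             # applicants classified correctly
approved_but_risky = 52   # loans approved although repayment is at risk
rejected_but_good = 27    # applicants rejected although able to repay

total = correct + approved_but_risky + rejected_but_good
accuracy = correct / total

# Hypothetical relative costs per case (assumption for illustration).
COST_APPROVED_BUT_RISKY = 5
COST_REJECTED_BUT_GOOD = 1

total_cost = (approved_but_risky * COST_APPROVED_BUT_RISKY
              + rejected_but_good * COST_REJECTED_BUT_GOOD)

print(f"total={total}, accuracy={accuracy:.1%}, weighted cost={total_cost}")
```

Weighting the errors this way makes it possible to compare models on expected business cost rather than on accuracy alone.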
Also, a clustering exercise was done to identify similar observations, that is, to group the data
into clusters with similar characteristics. The current model has divided the data into two
distinct and exhaustive clusters. By studying the characteristics of each cluster, the company can
categorize its prospective clients or loan applicants and determine in advance whether they are
likely to have a good score or a bad score.
11. References
1. Measuring Credit Risk of Bank Customers Using Artificial Neural Network by Mohsen Nazari
Department of Business Management, Faculty of Management, University of Tehran &
Mojtaba Alidadi (Corresponding author) Department of Business Management, Faculty of
Management, University of Tehran, January 2012
2. Factors Affecting Credit Risk: An Empirical Study of the Jordanian Commercial Banks by Dr.
Abedalfattah Zuhair Al-abedallat Faculty of Business and Finance, The World Islamic Sciences
& Education University, Amman, Jordan, December 2016
3. The 5 Biggest Factors That Affect Your Credit by Amy Fontinelle, June 25, 2019
4. Key Factors Influencing Credit Risk of Islamic Bank: A Malaysian Case by Nor Hayati Ahmad
and Shahrul Nizam Ahmad, Faculty of Banking and Finance, University Utara Malaysia, January
2004
5. Determinants of Bank Credit Risk: Empirical Evidence from Jordanian Commercial Banks by
Buthiena Kharabsheh, Yarmouk University, 2013
6. Factors Affecting the Bank Credit: An Empirical Study on the Jordanian Commercial Banks by
Mwafag Rabab, April 25, 2015
7. Credit Risk Management: Development over the last 20 years – Edward I Altman, Anthony
Saunders, 1997
8. Analysis of Credit Risk Measurement Models in the Evaluation of Credit Demands - Mehmet
Ali Canbolat, 2015
9. Credit Risk Management – Ken Brown, Peter Moles, January 2016
10. Seven Things that Impact your Credit Score By Preeti Motiani, Dec 16, 2017