
Project for Analytics

Group - 01
2018022 Kanishk Naik
2018039 Rohan S. Naik
2018040 Rohan Thakur
2018173 Renita Coutinho
2018083 Hari Sankar
2018222 Nirmalkumar Rathi
Table of Contents

1. Introduction
2. Literature Review
3. Identification & Removal of Outliers
4. Identification & Handling of Missing Values
5. Populating Correlation Matrix
6. Reviewing Predictor Variables
7. Communicating Categorical Variables to Azure
8. Developing Classification Models
9. Filter Based Feature Selection
10. Business Interpretation
11. References
1. Introduction
The main aim of this report is to identify which variables affect whether a customer's credit
rating is good or not. To analyse the credit rating we used logistic regression, a boosted
decision tree and the K-means clustering algorithm.
Credit scoring is a tool used for forecasting financial risk in consumer lending. It is a
statistical analysis performed by lenders and financial institutions to assess a person's
creditworthiness, and lenders use it to help decide whether to extend or deny credit.
A person's credit score is a number between 300 and 850, with 850 being the highest credit rating
possible. A credit score can affect many financial transactions, including mortgages, auto loans,
credit cards and private loans.

A credit score is generally influenced by five categories: payment history, types of credit, new credit,
current debt and length of credit. A person needs to pay special attention to current debt and payment
history. Although credit scoring ranks a borrower's credit riskiness, it does not provide an estimate of
a borrower's default probability. As an ordinal ranking, it only assesses a borrower's riskiness from
highest to lowest. As such, credit scoring suffers from its inability to determine whether Borrower A is
twice as risky as Borrower B. This is where analytics helps.

Here we develop a predictive model to analyse the credit score and learn in advance who is a
good credit risk and who is a bad credit risk, so that the concerned authorities can avoid
financial crunch situations in the future.

2. Perform literature survey and identify the variables from the research papers.
Compare these variables with the variables given in the data set and see what the
difference is.

Measuring Credit Risk of Bank Customers Using Artificial Neural Network by Mohsen Nazari
Department of Business Management, Faculty of Management, University of Tehran &
Mojtaba Alidadi (Corresponding author) Department of Business Management, Faculty of
Management, University of Tehran, January 2012
This paper discusses the classification criteria for identifying good customers and
bad customers in Iranian banks. An artificial neural network technique is used for credit risk
measurement, and the calculations were done using SPSS and MATLAB software. The factors
affecting credit risk are divided into two groups, those within the organization and those
outside the organization, and studied accordingly.

Factors Affecting Credit Risk: An Empirical Study of the Jordanian Commercial Banks by Dr.
Abedalfattah Zuhair Al-abedallat Faculty of Business and Finance, The World Islamic
Sciences & Education University, Amman, Jordan, December 2016
This paper discusses the various factors that affect credit risk. It also highlights the
statistical impact of those factors on credit risk and provides recommendations on the
same. It uses linear regression to highlight the impact at a significance level of 0.05.

The 5 Biggest Factors That Affect Your Credit by Amy Fontinelle, June 25, 2019
This article specifies and explains the five biggest things that affect a credit score, how they
can affect credit risk, and what they mean when we apply for a loan. It also assigns each factor
an importance as a percentage, to show which factor is the most important and which is the
least important. It also explains what is counted in the credit score calculation and
what is not.

Key Factors Influencing Credit Risk of Islamic Bank: A Malaysian Case by Nor Hayati Ahmad
And Shahrul Nizam Ahmad Faculty Of Banking And Finance University Utara Malaysia,
January 2004
This paper examines the factors affecting credit risk, the main risk being faced by banking
institutions and systematically identifies the key factors influencing credit risk formation in
Islamic banking operations in Malaysia. This paper also identifies whether there exists any
difference between credit risk determinants of Islamic banking and conventional banks in
Malaysia.

Determinants of Bank Credit Risk: Empirical Evidence from Jordanian Commercial Banks by
Buthiena Kharabsheh, Yarmouk University, 2013
This paper investigated the credit risk determinants in the Jordanian banking sector. Both
bank-specific variables and macro-economic variables were included in the analysis, using a
balanced panel dataset of all Jordanian commercial banks over the period 2000-2017. The
findings revealed that credit risk increased as the bank capital ratio, operating inefficiency
and the growth rate in credit increased, whereas larger and more profitable banks faced lower
credit risk. No effect was found for bank liquidity. Further, the macroeconomic
variables indicated that as the unemployment rate increased, credit risk increased
significantly, and a similar positive effect was documented for the crisis effect.

Factors Affecting the Bank Credit: An Empirical Study on the Jordanian Commercial Banks
by Mwafag Rabab, April 25, 2015
This paper examined the determinants of commercial banks' lending in Jordan. The study
sample consisted of ten Jordanian commercial banks during the period 2005-2013. The study
used the ratio of credit facilities to total assets as a dependent variable, and eleven
independent variables including the ratio of deposits, ratio of non-performing loans, capital
ratio, liquidity ratio, asset size, lending rate, deposits rate, window rate, legal reserve ratio,
inflation and economic growth rate. The results showed that the ratio of non-performing
loans, liquidity ratio and window rate have a negative and significant impact on the ratio of
credit facilities, while bank size and economic growth have a positive and
significant impact on the ratio of credit facilities granted by commercial banks in Jordan.

Credit Risk Management: Development over the last 20 years – Edward I Altman, Anthony
Saunders, 1997
About 20 years ago most financial institutions depended on subjective analysis or so-called
banker expert systems to evaluate the credit risk on loans. Bankers used information on
various borrower characteristics such as borrower character, capital, capacity and collateral,
the 4 Cs of credit, to reach judgement to whether or not to grant credit. But now financial
institutions have increasingly moved away from subjective/expert systems over the past 20
years towards systems that are more objectively based.
Accounting based credit-scoring systems: In accounting based credit-scoring systems, the
financial institution compares various key accounting ratios of potential borrowers with
industry or group norms. The key accounting variables are combined and weighted to produce
either a credit risk score or a probability of default measure. If the credit risk score, or
probability, attains a value above a critical benchmark, a loan applicant is either rejected or
subjected to increased scrutiny.
There are four methodological approaches to developing credit-scoring systems: (1) the linear
probability model, (2) the logit model, (3) the probit model and (4) the discriminant analysis
model.
Newer models of credit risk measurement: A newer approach is the application of neural
network analysis to the credit risk classification problem. Essentially, neural network analysis
is similar to non-linear discriminant analysis, in that it drops the assumption that variables
entering into the bankruptcy prediction function are linearly and independently related.
Specifically, neural network models of credit risk explore potentially hidden correlations
among the predictive variables, which are then entered as additional explanatory variables in
the non-linear bankruptcy prediction function.

Analysis of Credit Risk Measurement Models in the Evaluation of Credit Demands - Mehmet
Ali Canbolat, 2015
This paper tells us that Banks and advisers have begun to develop credit risk models in the
second half of the 1990’s and measurement of the potential loss according to identified levels
of privacy has been determined as the target. Approach based on risk measurement and
management, will be evaluated optimally by obtaining data from effective sources, affordable
pricing will be held, and the bank's capital structure will be preserved. Credit risk model is the
key factor in the preliminary determination of the probability of default and combined or
quantitative score of the debtor. Analytic Hierarchy Process (AHP) evaluates financial analysis
methods existing in the literature as methods of evaluation for loan demands and some
qualitative and quantitative factors of a company such as “subjective credit worthiness, status
of the sector operating and loan guarantees” together and expresses the result with an overall
credit score.
However, there is another research stating that using financial statement analysis techniques
together gives more effective results. Sevim & Canbolat have developed a model named
"Scoring Model", which requires analysis techniques (comparative financial statements
analysis, vertical analysis, ratio analysis and cash flow statements) to be used together. In
Scoring Model, in addition to ratio analysis and cash flow statements techniques, comparative
financial statements analysis and vertical analysis techniques, which were never used in
software models before, were also used. As a result of these techniques applied, financial
statements of the firm requesting loan are interpreted by the software and credit score of the
company is created based on these interpretations.

Credit Risk Management – Ken Brown, Peter Moles, January 2016


There are a number of approaches to the evaluation of credit. We can group them into
four categories, which are, to some extent, overlapping: (1) expert systems, (2) rating
systems, (3) credit scoring models, and (4) market-based models.
In practice, credit analysts use a combination of methods to evaluate firms and to predict
their future creditworthiness.
Seven Things that Impact your Credit Score By Preeti Motiani, Dec 16, 2017, 12.38 PM IST
In an article published by the Economic Times, CIBIL suggests that a credit score lies between
300 and 900 and that a score of around 750 makes it very easy to get loans approved. The first
major factor listed is paying previous EMIs on time. Besides loans and EMIs, only credit card
bills are considered while evaluating credit history; other household bills are not taken
into consideration. The second factor is maintaining a healthy credit utilisation ratio, which
is calculated in percentage terms. For instance, if your credit card limit is Rs 1 lakh and you
have utilised only Rs 40,000, then the credit utilisation ratio is 40%. The lower the credit
utilisation ratio, the higher your creditworthiness. The third factor is the EMI-to-income
ratio. It is calculated as your monthly loan and credit card repayments divided by your income.
The rule of thumb is that the maximum EMI-to-income ratio is 50%, as lenders assume that you
will need half your salary for living expenses.
Other factors are: do not increase your credit card limit frequently, and make sure all your
old loans are 'closed' and not 'settled'. Unlike 'settled', the 'closed' status of a loan
account indicates that the loan has been fully paid off by the borrower, which helps keep your
credit score healthy. Lastly, not having a credit history may have a negative impact on your
credit score.

Credit Risk factors identified through literature review:


• Gender
• Age
• Education
• Job
• Work experience
• Type of loan
• Amount of loan
• Individual loan frequency
• Individual account turnover average
• Time period of loan
• Type of collateral
• Interest rates
• Penalty rates
• History of customer relationship with the bank
• Received services
• Status of customer’s bank account
• Bank’s branch ranking
• Value of collateral
• Payment History
• Amounts Owed
• Length of Credit History
• New Credit
• Type of Credit in Use
• Worker efficiency in banking credits
• Central bank instructions
• Credit policy of the bank
• Management efficiency
• Leverage
• Risky sector loan exposure
• Regulatory capital
• Loan loss provision
• Funding cost
• Risk-weighted assets

Credit Risk factors from the data set provided:


• Checking account status
• Duration of credit in months
• Credit history
• Purpose of credit
• Credit amount
• Average balance in savings account
• Present employment since
• Instalment rate as % of disposable income
• Applicant is male and divorced
• Applicant is male and single
• Applicant is male and married or a widower
• Applicant has a co-applicant
• Applicant has a guarantor
• Present resident since - years
• Applicant owns real estate
• Applicant owns no property (or unknown)
• Age in years
• Applicant has other instalment plan credit
• Applicant rents
• Applicant owns residence
• Number of existing credits at this bank
• Nature of job
• Number of people for whom liable to provide maintenance
• Applicant has phone in his or her name
• Foreign worker
• Credit rating is good

Comparing the variables found in the literature review with those in the given dataset, it is
evident that factors such as demographics, credit history, source of current income, nature of
job, value of collateral, the applicant's family status (single/married) and purpose of credit
carry a high weight in predicting the overall credit rating. The literature review highlights
two main factors in common for determining credit risk: duration and amount of loan. Customers
must work on these factors to ensure that they are granted a loan when they need it most.

Also, the importance of these factors changes with the geographical domain. For example,
factors of high significance for Malaysian banks may not be highly influential for Indonesian
or European banks. Thus we cannot neglect even one factor when building a predictive model, as
doing so may cause a large difference in the outcome and reduce model accuracy.
3. Identify the outliers in the dataset using SPSS and remove the outliers from the
dataset using Azure. Describe the workflow needed to identify and remove the
outliers
In SPSS, the continuous variables were identified and passed to Analyze -> Descriptive
Statistics -> Explore, which produced the following results. The results suggest the upper and
lower limits for identifying the outliers.

SPSS stem-and-leaf summary for the continuous variables (plots not reproduced):

Variable       Extreme cases   Extreme threshold   Stem width   Cases per leaf
DURATION       70              >= 45               10           2
AMOUNT         72              >= 7966             1000         2
INSTALL_RATE   none listed     -                   1            5
NUM_CREDITS    6               >= 4.00             0            7
AGE            23              >= 65               10           1

To remove outliers:

Use the Clip Values operation with the upper and lower limits specified using the formula based
on the 1st and 3rd quartiles. Duration has q1 = 12 and q3 = 24. Values outside the range between
the lower and upper limits are clipped and replaced with the threshold values. The variable
Amount has been clipped similarly.
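
The report does not state the exact quartile formula used for the limits. A minimal sketch, assuming the common Tukey rule (lower = q1 - 1.5 x IQR, upper = q3 + 1.5 x IQR), is given below. The function names and the sample values are hypothetical; only q1 = 12 and q3 = 24 come from the text.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) limits as q1 - k*IQR and q3 + k*IQR.

    The multiplier k = 1.5 is an assumption; the report does not state it.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_values(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest limit."""
    return [min(max(v, lower), upper) for v in values]


# Quartiles for DURATION as reported in the text.
lower, upper = tukey_fences(12, 24)

# Hypothetical durations in months, for illustration only.
sample = [6, 12, 24, 36, 48, 60]
clipped = clip_values(sample, lower, upper)
print(lower, upper, clipped)
```

Under this assumption the upper limit for Duration is 42 months, which is consistent with SPSS flagging durations of 45 months and above as extremes.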
4. Identify the missing values using Microsoft Azure and describe the work flow needed
to identify them and handle them. Remove the outliers from the dataset and perform
rest of the operations.
Using the Clip Values function, we first flagged the outliers, which were marked as missing
values in the indicator column. Then, using Clean Missing Data -> Remove entire row, we
removed the outlier/missing entries from the data and used the result for subsequent work.
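
The two-step workflow (flag out-of-range values as missing, then delete any row with a missing value) can be sketched in plain Python as follows. This is an illustration rather than the Azure implementation; the function names, the records and the limits passed in are hypothetical.

```python
def mark_outliers_missing(rows, column, lower, upper):
    """Return copies of rows with out-of-range values in `column` set to None."""
    marked = []
    for row in rows:
        new_row = dict(row)
        value = new_row[column]
        if value is not None and not (lower <= value <= upper):
            new_row[column] = None
        marked.append(new_row)
    return marked


def drop_rows_with_missing(rows):
    """Keep only rows in which no field is None."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical records, for illustration only.
records = [
    {"DURATION": 12, "AMOUNT": 1500},
    {"DURATION": 60, "AMOUNT": 2000},  # DURATION above the assumed upper limit
    {"DURATION": 24, "AMOUNT": None},  # value already missing
]

# The limits -6 and 42 are illustrative assumptions, not values from the report.
cleaned = drop_rows_with_missing(
    mark_outliers_missing(records, "DURATION", lower=-6, upper=42)
)
print(cleaned)
```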
5. Populate the correlation matrix for the “numerical variables” and draw inferences.

Using the cleaned data, the 6 numerical variables are considered to identify the correlation among
them.

1. Duration
2. Amount
3. No. of credits
4. Age
5. Number of dependents
6. Instalment rate
The below screenshot indicates the procedure to be followed on Azure.
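
For readers without access to Azure, the same kind of matrix can be sketched in plain Python using the Pearson correlation coefficient. This is a minimal sketch, assuming Pearson correlation is the measure wanted; the column values are hypothetical and only three of the six variables are shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values, for illustration only.
data = {
    "DURATION": [6, 12, 24, 36, 42],
    "AMOUNT": [1000, 1500, 3000, 5000, 7000],
    "AGE": [45, 23, 35, 52, 29],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Values close to +1 or -1 indicate a strong linear relationship between two variables, while values near 0 indicate little linear relationship.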
6. Review the predictor variables and guess from their definition as to what their role
might be in a credit decision.
After performing the initial analysis using filter-based feature selection, we found the 9
variables below to have a role as predictor variables in the credit decision.

Predictor variable: Role in credit decision

CHK_ACCT: The checking account status denotes the ability of the debtor to pay off his/her credit.
DURATION: Time in which the debtor pays off his/her credit. It is directly related to the credit risk.
HISTORY: History of previous payments made. A more positive credit payment history indicates that the debtor is at lower risk of defaulting.
NEW_CAR: It represents purchasing power and is directly related to the purchaser's financial health.
USED_CAR: It is an indicator of the purchaser's financial standing, as it requires less initial investment and only the maintenance charges are higher.
FURNITURE: Furniture is generally bought when setting up a new space, for example a home or office. Such a purchase can indicate less risky credit.
RADIO/TV: Buying a radio or TV is a significant purchase but is generally less risky, as the amount can be recovered from resale.
EDUCATION: The more educated the debtor, the higher the chances of he/she getting employment and the lower the risk of defaulting.
RETRAINING: If the purpose of credit is retraining, the return on investment as salary from employment can be high. The credit risk can be assessed accordingly.
AMOUNT: If the credit amount is high, the risk associated with default also increases.
SAV_ACCT: It is an indicator of the debtor's ability to pay back. Thus it can be used to determine credit risk.
EMPLOYMENT: If the debtor is employed, there is less risk of defaulting on the payment.
INSTALL_RATE: If the instalment rate is high, the payback becomes more difficult.
MALE_DIV: If the applicant is divorced, the financial condition may be insecure.
MALE_SINGLE: If the applicant is a single male, there are no dependents or obligations.
MALE_MAR_WID: If the applicant is married or a widower, there might be relative financial constraints.
CO-APPLICANT: If there is a co-applicant present, payback risk and obligation are shared.
GUARANTOR: A guarantor helps to minimise the risk of defaulting if there is a contingency.
PRESENT_RESIDENT: If the debtor is a resident, the longer they have been staying here, the less likely they are to default.
REAL_ESTATE: If the debtor owns real estate, the credit is relatively risk free, as the property owned can be used to repay the debt in case of default.
PROP_UNKN_NONE: If there is no known owned property, the credit becomes higher risk.
AGE: Age may be directly related to financial stability.
OTHER_INSTALL: If the debtor is on any other instalment plan, it can be used to gauge payment ability.
RENT: If the applicant rents, the risk is slightly higher than for an applicant who owns a residence, as there is no property to mortgage.
OWN_RES: This factor indicates the repayment power if the debtor has his/her own residence.
NUM_CREDITS: If the debtor already has credits at the bank, then the likelihood of defaulting on all or some of them is higher.
JOB: The type of job determines the type of salary and the power to repay.
NUM_DEPENDENTS: The higher the number of dependents, the greater the difficulty in payback.
TELEPHONE: If the debtor owns a phone, then they can be contacted for inquiry, and this helps in debt recollection.
FOREIGN: If the debtor is a foreign worker, the risk of defaulting is lower due to the high salary received.
7. Identify the categorical variables and communicate to Azure in all the cases
Following are the list of continuous and categorical variables. The right side of the window indicates
the list of categorical variables to be passed to azure.

CATEGORICAL VARIABLES:
CHK_ACCT
HISTORY
NEW_CAR
USED_CAR
FURNITURE
RADIO/TV
EDUCATION
RETRAINING
SAV_ACCT
EMPLOYMENT
MALE_DIV
MALE_SINGLE
MALE_MAR_WID
CO-APPLICANT
GUARANTOR
PRESENT_RESIDENT
REAL_ESTATE
PROP_UNKN_NONE
OTHER_INSTALL
RENT
OWN_RES
JOB
TELEPHONE
FOREIGN
RESPONSE/TARGET

The categorical variables in the German credit data set include nominal variables with two
categories (binary) and nominal variables with more than two categories.
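
To illustrate what treating a column as nominal means, the sketch below expands a multi-category variable into 0/1 indicator columns (one-hot encoding). This is one common way a learner can consume a nominal column and is an assumption for illustration, not a description of what Azure does internally; the function name and the coded values are hypothetical.

```python
def one_hot(values):
    """Map a list of category labels to 0/1 indicator columns, one per label."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical coded values for a multi-category variable such as CHK_ACCT.
chk_acct = [0, 1, 3, 1, 2]
encoded = one_hot(chk_acct)
for category, indicator in encoded.items():
    print(category, indicator)
```

A binary variable such as TELEPHONE already has this 0/1 form, so only the variables with more than two categories would need expanding in this way.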
8. Divide the data randomly into training (60%) and testing (40%) partitions, and develop
classification models using the following data mining techniques: See which of the
models are predicting the outcome better and report. Provide the work flow.
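
The random 60%/40% partition can be sketched in plain Python as follows. This is a minimal illustration, not the Azure Split Data step itself; the function name, the seed and the placeholder row ids are assumptions.

```python
import random


def split_train_test(rows, train_fraction=0.6, seed=42):
    """Shuffle a copy of rows and split it into training and testing partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder row ids standing in for the cleaned records.
rows = list(range(1000))
train, test = split_train_test(rows)
print(len(train), len(test))
```

Fixing the seed keeps the partitions identical across runs, so that the three models are compared on the same testing data.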

8.1 Logistic regression

8.2 Boosted Decision tree

8.3 K-means Algorithm


Scored Data set
Scored data set to compare
Ten decision trees were created; two sample trees are shown below. All the root nodes and leaf
nodes can be seen clearly and visualised.
9. Identify the major predictor variables using “Filter based feature selection” in the
dataset and check if there is any improvement in the model variables and provide the
output for all the following methods (using 60% and 40%). Provide the work flow.
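
Filter-based feature selection scores each predictor against the target independently of any model and keeps the highest-scoring ones. The sketch below uses the absolute Pearson correlation as the score, which is an assumption, since the report does not say which scoring metric was chosen. The function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either list has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target, top_k):
    """Return the names of the top_k features by absolute correlation with target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


# Hypothetical predictor columns and response (1 = good credit rating).
features = {
    "DURATION": [6, 12, 24, 36, 48, 12],
    "AGE": [30, 45, 28, 50, 33, 41],
    "NUM_DEPENDENTS": [1, 1, 2, 1, 1, 2],
}
response = [1, 1, 0, 0, 0, 1]
selected = rank_features(features, response, top_k=2)
print(selected)
```

The selected columns would then be the only inputs passed to the three models, and their results compared with those obtained using all variables.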

9.1 Logistic regression

9.2 Decision trees

9.3 K-means Algorithm


Scored Data Set
Scored Data Set to compare
Ten decision trees were created; two sample trees are shown below. All the root nodes and leaf
nodes can be seen clearly and visualised.
10. Interpret the results in business terms for all the three above scenarios in both the
scenarios in Q 8 & Q 9.
Confusion matrix (rows: actual, columns: predicted)

                   Predicted Negative   Predicted Positive
Actual Negative    147                  85
Actual Positive    307                  333

Accuracy 55%
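
The reported accuracy can be checked directly from the four cell counts. The sketch below reads the rows as actual classes and the columns as predicted classes, and also derives precision and recall, which the report does not state. The variable names are illustrative.

```python
# Cell counts from the confusion matrix (rows: actual, columns: predicted).
tn, fp = 147, 85    # actual negative: predicted negative, predicted positive
fn, tp = 307, 333   # actual positive: predicted negative, predicted positive

total = tn + fp + fn + tp
accuracy = (tp + tn) / total   # share of all cases classified correctly
precision = tp / (tp + fp)     # share of predicted positives that are actual positives
recall = tp / (tp + fn)        # share of actual positives that were predicted positive

print(f"total={total} accuracy={accuracy:.2%} "
      f"precision={precision:.2%} recall={recall:.2%}")
```

The accuracy works out to 480 / 872, or about 55%, matching the figure reported above.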

The predictive modelling that allows us to determine the credit rating, i.e. good credit or bad
credit, has direct business implications.

• For banks and non-banking financial institutions loaning out money to people, a credit rating
helps distinguish good creditors from bad creditors based on the above-mentioned 30
variables. By doing so, the rate of defaulting is reduced, and the institutions lower their
risk and improve overall performance.

• From the applicant's perspective, by understanding the factors that contribute to the overall
credit rating, one can improve on certain aspects to raise the rating and so enhance the
chance of getting a loan in a time of need.
When a bank receives a loan application, it has to decide, based on the applicant's profile,
whether to accept or reject it. The risks associated with the decision are as follows:

1. If the applicant is likely to repay the loan, then not approving the loan results in a loss
of business to the bank (a false negative, when good credit is taken as the positive class, as
in the confusion matrix above).
2. If the applicant is not likely to repay the loan, then approving the loan results in a
financial loss to the bank (a false positive under the same convention).

It may be concluded that the second risk is greater than the first, as lending money to a
party that defaults has a larger effect than not extending credit. This model would greatly
help to evaluate or verify the company's credit decisions.

According to this model, 270 loan applicants (both accepted and rejected) have been correctly
identified as eligible or not. 52 loan applicants whose loans would be approved are at risk of
not repaying, and the bank should recheck their applications; here the bank is at risk of
financial loss in the future. The 27 loan applicants who have been rejected have the potential
to pay back the loan, so the bank loses out on some amount of profit.

The clustering exercise is done to identify similar observations, that is, to group the data
into clusters which have similar characteristics. The current model has divided the data into
two distinct and exhaustive clusters. By studying the characteristics of each cluster, the
company can categorize its prospective clients or loan applicants and pre-determine whether
they will have a good score or a bad score.

11. References

1. Measuring Credit Risk of Bank Customers Using Artificial Neural Network by Mohsen Nazari
Department of Business Management, Faculty of Management, University of Tehran &
Mojtaba Alidadi (Corresponding author) Department of Business Management, Faculty of
Management, University of Tehran, January 2012
2. Factors Affecting Credit Risk: An Empirical Study of the Jordanian Commercial Banks by Dr.
Abedalfattah Zuhair Al-abedallat Faculty of Business and Finance, The World Islamic Sciences
& Education University, Amman, Jordan, December 2016
3. The 5 Biggest Factors That Affect Your Credit by Amy Fontinelle, Jun 25, 2019
4. Key Factors Influencing Credit Risk Of Islamic Bank: A Malaysian Case by Nor Hayati Ahmad
And Shahrul Nizam Ahmad Faculty Of Banking And Finance University Utara Malaysia, January
2004
5. Determinants of Bank Credit Risk: Empirical Evidence from Jordanian Commercial Banks by
Buthiena Kharabsheh, Yarmouk University, 2013
6. Factors Affecting the Bank Credit: An Empirical Study on the Jordanian Commercial Banks by
Mwafag Rabab, April 25, 2015
7. Credit Risk Management: Development over the last 20 years – Edward I Altman, Anthony
Saunders, 1997
8. Analysis of Credit Risk Measurement Models in the Evaluation of Credit Demands - Mehmet
Ali Canbolat, 2015
9. Credit Risk Management – Ken Brown, Peter Moles, January 2016
10. Seven Things that Impact your Credit Score By Preeti Motiani, Dec 16, 2017

Attached Files:

Cleaned Data For Group 01_TIM.xls


Working.csv
