
Agenda

• How are Credit Decisions Taken in the Financial Industry?
• How is Machine Learning Being Adopted in Taking Those Decisions?
Why Credit Risk?
• Net Credit Loss: 25% of revenues and twice the net income in Q2 2011
Game One – Rank Them in Order
• You are Provided with the Financial Data on 10 Companies
• Rank-Order them in terms of the Companies' Future Prospects / Credit Quality
• Time Allotted is 30 Min
• Presentation
• Discussion
Example – Looking for Key Attributes
What Did We Learn?
1. If we are a Team of 4, we have 5 Rank Orderings!
2. We can Develop our own Rating Methodology based on our own Inferences and Logic
3. We can be Super-Successful if we Blend that Knowledge with Industry Wisdom
Agenda

PD Models – Statistical Models in Credit Risk Measurement
How do you Build a Default Model for India?
• Defining Default – Some Considerations
– It Should Work
– Easy to Define
– Easy to Apply and Measure
– Lends itself to Future Improvement
Pages in Indian ‘Default’ History
• Sick Industrial Companies Act (SICA, 1985)
– Registered for Five* Years
– Incurred Cash Losses for Two Consecutive Years
– Net Worth is Negative
• Eligible Companies were Mandatorily Referred to the BIFR (Board for Industrial and Financial Reconstruction)
Pages in Indian ‘Default’ History (Continued)
• Sick Industry Report (1992)*
– Evaluated the Criteria Independently
– Recommended using 2 Consecutive Years of Cash Losses as the Criterion
– Rationale: Early Intervention gives Sick Companies a Higher Chance of Survival
Does this Default Definition Work?
• Year 2007: PAT of a company was -60.7 crores
• Year 2008: PAT of the same company was -97.4 crores
• Shall we define this company as a Defaulter / Sick Company?
• The Net Worth of the company was, however, Positive in these two years
• Name of this company: TATA Advanced Material Ltd
• TATA Advanced Material Ltd sprang back to action with positive profits in the later years
Proposed Definition
• Companies which have Negative Net Worth for the First Time in the Time Window 1991–97
– Midway between the Two Definitions
– Is Measurable and can be Improved
PD Models – Linear Models
• Altman's Z-Score Model (1968) – a quick R sketch follows the definitions below
• Z (Public) = 1.2 X(1) + 1.4 X(2) + 3.3 X(3) + 0.6 X(4) + 1.0 X(5)
– Z > 2.99 is healthy
– Z < 1.81 is unhealthy
– 1.81 < Z < 2.99 is indeterminate
• X(1) = Working Capital / Total Assets
• X(2) = Retained Earnings / Total Assets
• X(3) = Earnings Before Interest and Taxes (EBIT) / Total Assets
• X(4) = Market Value of Equity / Book Value of Total Liabilities
• X(5) = Net Sales / Total Assets
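A minimal R sketch of the Z computation; the input ratios below are made-up illustrations, not data from the deck:

z_score <- function(wc_ta, re_ta, ebit_ta, mve_tl, sales_ta) {
  1.2 * wc_ta + 1.4 * re_ta + 3.3 * ebit_ta + 0.6 * mve_tl + 1.0 * sales_ta
}
z_score(0.15, 0.20, 0.10, 0.80, 1.30)  # 2.57 -> indeterminate zone (1.81 < Z < 2.99)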
Steps to Build a Good Function
• Selection of the ‘Default’ Sample
• Creation of the Non-Default Sample
– Choice of Industry, Firm Size & Time Period
• Appropriate Treatment of the Data
– Choose the Predictors Carefully
• Choice of a Good Out-of-Sample Validation Dataset – Stress Test the Data
Deep Dive on the Variables
• Clear Separation in the Mean Values of Variables between Defaulters and Non-Defaulters
Discriminant Function
• India Z-Score Model (2000)
• Z (Public) = 1.06 + 0.01 PBITINT - 3.12 TDTA + 0.48 QR + 4.61 NCATA
• PBITINT = Profits Before Interest & Taxes / Total Interest
• TDTA = Total Borrowings / Total Assets
• QR = (Current Assets - Inventories) / (Current Liabilities & Provisions)
• NCATA = (Profit After Tax + Depreciation) / Total Assets
Does it Work?

Challenges of the Discriminant Model
• Challenges
– Zone of Indifference (Indeterminate)
– Sensitivity to Industry
– Dealing with New Companies
– Loss of Predictive Power across Time – Use of Penalty Functions
– Multi-Collinearity of Variables
• Some variables provide the same information and are highly correlated
– Assumption of Multivariate Normality of the Variables
What if we use a Logistic Framework?
• We can also Build a Logistic Model using 2010-11 Data
• Take all companies with negative Net Worth in 2011
• Take the same number of companies with similar asset size and the same industry with positive Net Worth in 2011
• QC the data carefully, removing all outliers
• Compute all the predictors as of 2010
• Run the logistic regression and get the equation
Logistic Model – The New Equation
• Logit(score) = -6.9965
  + 4.8879 * Total Borrowings / Total Assets
  - 0.455 * Net Working Capital / Current Liabilities
  - 6.8605 * Net Cash Accruals / Total Assets
  + 1.6317 * Current Liabilities / Current Assets
  + 0.1978 * Total Borrowings / Total Liabilities
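To turn the fitted logit into a probability of default, apply the inverse logit; a minimal R sketch (the five ratio variable names are illustrative assumptions):

logit <- -6.9965 + 4.8879 * tb_ta - 0.455 * nwc_cl - 6.8605 * nca_ta +
  1.6317 * cl_ca + 0.1978 * tb_tl  # the five ratios for one company
pd <- 1 / (1 + exp(-logit))        # implied probability of default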

Logistic Models – Use in Credit Rating
• If we run the equation on N companies, we can rank-order the companies based on the score
• Create ratings based on the cutoffs chosen
How is Rating Determined in CRAs?
Developing a CRA's Rating Algorithm
• Developing Ordered-Logit Regressions with Market Information (market information incorporated into the logic)
Recap – What we Studied
• Discriminant Models, though Accurate, are ‘High Maintenance’
• Logit / Probit Regressions display more Stability than Discriminant Models
• The Strength of the Tool is a Function of the Quality of the Data used for the Analysis
• Credit Rating tries to resolve Information Asymmetry in assigning the Right Price of Debt
• Either Tool can be used to develop an Independent Rating Framework
Total Recall – Market Risk
• Developed Two Methods of VaR: the Parametric and the Distribution Method
• Single Stock
• Portfolio
• Developed the Mean-Variance Portfolio Approach of Weight Optimization
Industrial Wisdom
• If you want to do Something New, Know the Past – Spend 30% of the Allotted Time
• Spend 40% of the Time getting the ‘Right’ Data
• Building the Analytical Solution is 10% of the Time
• Stress Test the Solution for the Remaining 20%
• Document the Key Learnings and Opportunities for Future Enhancements
Total Recall – Credit Risk
• The Challenge of Defining Default in the Context of India
• Choosing Logistic over Discriminant Models to Predict Default
• The Need to Find ‘Independent’ Attributes / Features to Predict Default
• Understanding and Cleaning the Data is Essential
• Dividing the Sample into Development, Validation and Out-of-Sample Validation is a Must
Total Recall – Credit Risk (Cont.)
• Machine Learning to be used on the Same Sample to Develop Alternate Types of Models
• Having a Good Understanding of the Problem to be Solved and the Underlying Data is of Paramount Importance
Game 2: Building a Smart Default Model
• Define Default
• Do a Data Quality Check and Remove Outliers
• Divide the Data into Development and Validation Datasets
• Identify and Define Variables that have a High Correlation with Default
• Develop the Logistic Function
• Develop the Classification Table
• Test Performance on Out-of-Sample Data
• Rank-Order your 10 Companies – What do you Find?
Define Default
• Step 1: Define Default in your Dataset
– All companies with negative Net Worth should be defined as defaults
– If Net Worth <= 0, then Default = 1, else Default = 0 (add a new column in your dataset called Default which takes a value of 0/1)
– What is the default rate in your dataset? (See the sketch below)
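A minimal R sketch of this step, assuming the net-worth column is called Networth (the column name is an assumption):

data$Default <- ifelse(data$Networth <= 0, 1, 0)  # 1 = default, 0 = non-default
mean(data$Default)                                # the default rate in the dataset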

Quality Check
• Step 2: Quality Check of the Data
– Have a look at the data and remove the outliers
• These outliers will distort your equation if not removed
– After you are satisfied with the data, randomly divide it into a development dataset (70%) and a validation dataset (30%)
• Check the default rates of the development and validation datasets
– Since it is randomly split, the default rates should be similar
– You will now work on the development dataset to develop the model equation
Identification of Predictors
• Step 3: Identify the Predictors
– Look at the variables which give a good discrimination between the defaulters and non-defaulters in the dataset
• E.g. if you want to use variable X in your model, look at the mean value of X for non-defaulters and the mean value for defaulters. If there is good discrimination between the two mean values, then variable X should be used in your model (see the sketch below)
– Ratio variables MIGHT be more appropriate for the model equation!
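One way to run this check in R, assuming the 0/1 Default column from Step 1 and a candidate variable X (names are illustrative):

tapply(data$X, data$Default, mean, na.rm = TRUE)  # mean of X for non-defaulters (0) vs defaulters (1)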

Workshop Recap
• Defined Default
• Divided the Data into Development and Validation Datasets
• Did a Data Quality Check and Removed Outliers
• Identified and Defined Variables that had a High Correlation with Default
Develop the Logistic Equation
• Step 4: Develop the Logistic Equation
– Transfer the data to the SPSS sheet
– Look for the Logistic Regression field in the SPSS sheet (under the Analyze tab)
– The dependent variable is the 0/1 Default column
– The independent variables are the predictors that you have chosen
– Remove the variables that have a high significance level
– Fine-tune your final model by looking at the signs of the predictors
• The predictors' signs should make intuitive sense. E.g. a variable like CL/CA should have a positive sign
– Look at the efficiency rate of the model prediction
Out-of-Sample Validation
• Step 5: Your Model is now Ready
– Take the validation dataset
– Score each row with your equation – get the score for every company
– Count the number of defaulters in your dataset
• Let's assume that there are 30 defaulters in your dataset
• Based on your score, take the top 30 companies
• Look at the default rate of these top 30 companies – this gives you the efficiency of your model
• The higher the efficiency, the better the model (see the sketch below)
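A compact R sketch of this efficiency check (the column names score and DEF are illustrative assumptions):

n_def <- sum(validation$DEF)                                # e.g. 30 defaulters
top <- order(validation$score, decreasing = TRUE)[1:n_def]  # top-scored companies
mean(validation$DEF[top])                                   # default rate among them = efficiency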

First Set of Lessons
1. Understanding the Data is the First Step to a Successful Analytical Exercise
2. One can Build a Great Model for Credit Rating using Any Technique, Provided you Know its Limitations
3. We can be Super-Successful if we Blend that Knowledge with Industry Wisdom
Machine Learning in Credit Risk

Learning from 2 MM Models – Kaggle
Great Lakes
• Data Type
– Structured Data
– Unstructured Data
• Step 1: Understand the Data-Generation Process
– Explore the Data
Learning from 2 MM Models – Kaggle
• Step 2: Feature Engineering
– Structured Data
– Rank Plots / Hypothesis Testing
– Synthetic Variables
• Feature Engineering
– Not Relevant for Unstructured Data
Learning from 2 MM Models – Kaggle
• Step 3: Structured Data – Fitting the Right Algorithm
– Random Forest
– Support Vector Machine
– Gradient Boosting Machine
• Unstructured Data – Deep Learning
– CNN or RNN (image vs sequence data)
Learning from 2 MM Models – Kaggle
• Caution
– Overfitting
– Use Cross-Validation to Test Model Performance
– Poor Performance in the Out-of-Time Sample
• Participate in Kaggle
– Get the Real Experience
– Use the Cloud-Based Kernels on Kaggle
LOGISTIC REGRESSION

Logistic Regression
Logistic Regression builds a non-linear equation to predict a dichotomous variable. Despite its name, what it actually does is classification rather than regression!

Why not Linear?
• The Y variable is binary – 1 or 0
• The relationship between the dependent and independent variables is non-linear
• The usual linear regression generates values outside [0, 1]
• A linear fit to a binary variable becomes very sensitive to extreme values
• Other statistical complications!

Logistic Function – a Better Fit!
So we need a function that stays within the bounds of 0 and 1 and represents the data in a much better manner.
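The logistic (sigmoid) function is exactly such a function; in R:

sigmoid <- function(z) 1 / (1 + exp(-z))  # maps any real-valued score into (0, 1)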

How does Logistic Learn the Coefficients – An Example
Can you predict whether a person will buy a house with the given information?

How does Logistic Learn the Coefficients – An Example

Cost Function:
When Y = 1, Cost = -log(Prediction)
When Y = 0, Cost = -log(1 - Prediction)
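The two cases combine into the usual log-loss (cross-entropy), which is what the fit minimizes; in R:

log_loss <- function(y, p) -mean(y * log(p) + (1 - y) * log(1 - p))  # y is 0/1, p is the prediction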

How does Logistic Learn the Coefficients – An Example
Step 3: Adjust the coefficients and the predictions in an iterative fashion to move towards the global cost minimum (a sketch follows below)
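A minimal sketch of this iteration in R on made-up toy data (the values, learning rate and iteration count are illustrative assumptions):

x <- c(2, 3, 4, 5, 6)        # e.g. income in lakhs
y <- c(0, 1, 0, 1, 1)        # bought a house?
b0 <- 0; b1 <- 0; lr <- 0.1  # starting coefficients and learning rate
for (i in 1:1000) {
  p <- 1 / (1 + exp(-(b0 + b1 * x)))  # current predictions
  b0 <- b0 - lr * mean(p - y)         # gradient step for the intercept
  b1 <- b1 - lr * mean((p - y) * x)   # gradient step for the slope
}
c(b0, b1)  # compare against glm(y ~ x, family = "binomial")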

RANDOM FOREST

Why is it called a Forest?
• A predictive model based on a branching series of Boolean tests
• Boolean tests are less complex than one-stage classifiers
• A "forest", or an ensemble of trees, is required to address over-fitting
But why Random?
• Bootstrap aggregating (or bagging): the process through which samples are selected to build the trees; random samples are repetitively drawn (with or without replacement) from the training set
• Random feature selection: the split variable at every node of a tree is randomly selected from the full list of features (see the sketch below)
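A minimal R sketch of these two sources of randomness, assuming a data frame called training with a DEF target column (names are illustrative):

n <- nrow(training)
boot <- training[sample(n, n, replace = TRUE), ]    # bagging: bootstrap sample for one tree
vars <- sample(setdiff(names(training), "DEF"), 3)  # random subset of candidate split features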

Let’s take an example

Building Decision Trees
Subsequent trees are generated in a similar manner on different samples

To summarize the model
building process…

How best to use Random Forest? Parameter Tuning
• Larger number of trees = less chance of over-fitting; but also a more complex solution and higher runtime
How best to use Random Forest? Parameter Tuning
• More randomly selected variables = significant variables show up; but trees become repetitive and not all variables in the data are evaluated
How best to use Random Forest? Parameter Tuning
• Higher sampling ratio = enough data points to build the trees; but not enough data points left to test the stability of the trees
How best to use Random Forest? Parameter Tuning
• Sampling without replacement = trees covering different dimensions; but a limit on the maximum number of trees
GBM

Gradient Boosting!
Gradient boosting produces a strong prediction model by ensembling many weak prediction models, typically decision trees, built in a stage-wise fashion.

RF Vs GBM

GBM Introduction
• GBM builds decision trees in a stage-wise manner
• The first set of predictions is initialized to a constant value, and the first tree is built on the residual error from this constant value
• Successive trees use the residuals from the previous trees to reduce the prediction error
• The GBM score is a linear combination of the individual tree predictions (see the sketch below)
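A minimal hand-rolled sketch of this stage-wise process in R, using rpart stumps as the weak learners (the data, learning rate and tree count are illustrative assumptions):

library(rpart)
set.seed(1)
df <- data.frame(x = runif(100, 0, 10))
df$y <- sin(df$x) + rnorm(100, sd = 0.2)
lr <- 0.1                     # learning rate (shrinkage)
pred <- rep(mean(df$y), 100)  # initialize predictions to a constant
for (m in 1:50) {
  df$resid <- df$y - pred     # residual error from the current model
  tree <- rpart(resid ~ x, data = df, control = rpart.control(maxdepth = 2))
  pred <- pred + lr * predict(tree, df)  # add the shrunken tree prediction
}
mean((df$y - pred)^2)         # training error falls as trees are added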

Let's Look At An Example
Can you predict the value of the home of any person with the given information?

Let's Look At An Example

Let's Look At An Example

Learning Rate: 10%

Let's Look At An Example
Steps 6, 7, 8, ...: Repeat the process with the next trees until the errors are minimized!

The number of trees you build, the depth of each tree and the learning rate all decide how good a model you make!
Overview of GBM – Regression and Classification
How best to use GBM?

Keep in Mind

k-NN algorithm

Finding Lookalikes
How do you find people similar to Sushil Kumar among a group of sportspersons?

Concept of Distance and Similarity

k-NN (k-Nearest Neighbors) Algorithm
An algorithm to find the k most similar people, i.e., the k nearest neighbors

Example using k-NN
Can you predict whether Maitree has a car?

k-NN Algorithm: Mathematical Formulation
Parameters for k-NN Models
• Distance Metric: should satisfy the triangle inequality. For example: Euclidean distance, Chebyshev distance, Manhattan distance, Mahalanobis distance
• Dimensions: should be independent and identically distributed (IID). For example: Age, Income
• Value of k: typically selected through cross-validation
• Scoring Function: labels of the nearest neighbors can be weighed differently, e.g. by distance to the point or by rank of the neighbor (see the sketch below)
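A minimal R sketch of the core computation (Euclidean distance, equal weights; the function and input names are illustrative):

knn_score <- function(train_x, train_y, new_x, k = 7) {
  d <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))  # Euclidean distance to every training point
  nb <- order(d)[1:k]                             # indices of the k nearest neighbors
  mean(train_y[nb])                               # fraction of the neighbors labelled 1
}
# e.g. knn_score(as.matrix(training[, c("nTA", "nTI")]), training$DEF, c(0.4, 0.6))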

Steps to Build an ML Model – 1
• Step 1: Divide the Data into Development and Validation
• Step 2: Appropriately Floor and Cap the Variables
• Step 3: Execute the Different Models
• Step 4: Save the Results

Steps to Build an ML Model – 2
• Step 5: Score the Validation Datasets
• Step 6: Keep the Relevant Variables
• Step 7: Compute the GINI on the Out-of-Sample Data
• Step 8: Save the Datasets with the Predicted Variables and a MERGE Key
• Step 9: Compare the Results

Step 1a – Read the Data
• Read the Data in R
data <- read.csv("dev-data-1.csv")
• Check the Number of Observations
nrow(data)

Step 1b – Split the Data into Development and Validation
• Do a 75-25 Split of the Data – Training (Development) and Test (Validation)
splitdf <- function(dataframe, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)    # makes the split reproducible
  index <- 1:nrow(dataframe)
  trainindex <- sample(index, trunc(length(index) * 0.75))  # random 75% draw
  trainset <- dataframe[trainindex, ]
  testset <- dataframe[-trainindex, ]
  list(trainset = trainset, testset = testset)
}
splits <- splitdf(data, seed = nrow(data))
training_lg <- splits$trainset
testing_lg <- splits$testset
Step 2 – Floor and Cap the Variables
• Missing Values have been Floored to "1"
training_lg$TA[is.na(training_lg$TA)] <- 1
training_lg$TI[is.na(training_lg$TI)] <- 1
training_lg$TE[is.na(training_lg$TE)] <- 1
training_lg$PAT[is.na(training_lg$PAT)] <- 1
• Alternate Flooring / Capping can also be used...

Step 2a – Alternate Capping and Flooring of Variables
• Replacing Missing Values with Means
training$TLIAB[is.na(training$TLIAB)] <- round(mean(training$TLIAB, na.rm = TRUE))
• Replacing with Percentile Values
training$INVST1 <- ifelse(training$INVST <= 10, 10, training$INVST)               # floor at the 1st percentile (10)
training$INVST1 <- ifelse(training$INVST1 >= 1222.485, 1222.485, training$INVST1) # cap at the 99th percentile (1222.485)

Step 3 & 4 – Execute the Different Models – Logistic
• glm() ships with base R (the stats package loads automatically); glm2 is an optional alternative
library(glm2)
• Run the Relevant Equation
model <- glm(DEF ~ TA + TI + TE + PAT, data = training_lg, family = "binomial")
summary(model)
• Play Around till you get the ‘Best’ Equation
Tip: P1 <- fitted(model)   # fitted probabilities on the training data

Step 5 – Validate the Model on the Validation Data
• Use the Model Equation to Come Up with the Predicted Scores on the Validation Sample
predicted <- predict(model, newdata = testing_lg, type = "response")
• Transform the ‘predicted’ Temp Variable into a Data Frame (the column is named X_data)
d <- transform(predicted)
• Save the Temp d Variable to the predlg Variable in the testing_lg File
testing_lg$predlg <- d$X_data
• Check the Results – head(testing_lg)

Step 6: Keep the Relevant Variables
• Only Keep the Relevant Variables – 3 of them
testing1_lg <- subset(testing_lg, select = c(predlg, DEF, Num))
# predlg = predicted probability, DEF = default indicator, Num = merge key

Step 7 – Compute the GINI
• Load a New Library – library(Hmisc)
• Relevant Commands:
rcorr.cens(oot_lg$pred, oot_lg$DEF)
rcorr.cens(testing_lg$predlg, testing_lg$DEF)
• Dxy = GINI (Dxy = 2 * (C - 0.5))
• C Index = Concordance

Step 8 – Save the Results in a File
• Save the Results in a CSV File
write.csv(oot1_lg, "oot_logit_pred.csv")
• Check the Data in EXCEL

Step 3 & 4: Algorithm – Random Forest
• Load the Following Library: randomForest
library(randomForest)
• Run the Relevant Commands
training$DEF <- factor(training$DEF)   # classification needs a factor target
set.seed(71)
RF <- randomForest(DEF ~ TA + TI + PAT + PBDITA + PBT + CPFT + PBTI + PATI +
                     Sales + QR + CR + DE - Num, data = training,
                   ntree = 50, mtry = 3, importance = TRUE, na.action = na.omit,
                   keep.forest = TRUE, do.trace = 10)

Step 3 & 4: Random Forest – Model Diagnostics
• Result Summary – summary(RF)
• Variable Importance – importance(RF)
• Another Way of Viewing Variable Importance – round(importance(RF), 2)
• Classification Metrics – print(RF)
• Printing One of the Trees – getTree(RF, 1) prints the 1st Tree

Step 5 & Rest: Score Validation Data – RF
• Scoring the Validation Dataset
predicted <- predict(RF, testing, type = "prob")
• Only Keep the Probability Corresponding to the Second Column (the default class)
prob_rf <- predicted[, 2]
• Save it as a Variable
g <- transform(prob_rf)
testing$pred_rf <- g$X_data
head(testing)

Step 5 & Rest: Score Validation Data – RF
• Subset the Data for the Relevant Variables
testing1_rf <- subset(testing, select = c(pred_rf, DEF, Num))
• Compute the GINI
library(Hmisc)
rcorr.cens(testing1_rf$pred_rf, testing1_rf$DEF)

Step 5 & Rest: Score Validation Data – RF
• Merge the Relevant Files
Scored_testing1 <- merge(testing1_lg, testing1_rf, by = "Num")
• Save this as a Permanent Dataset
write.csv(Scored_testing1, "testing_lg_rf_pred.csv")
• Repeat this Action as you Append Other Probabilities using Different Algorithms

Step 3 & 4: Algorithm – Gradient Boosting Machine
• Load the Following Library: gbm
library(gbm)
• Run the Relevant Command
gbm_model <- gbm(DEF ~ TA + TI + TE + PAT + PBDITA + CPFT + PBDITAI + Sales +
                   SHF + NWC + QR + CR + DE + EPS + TLIAB, data = training,
                 distribution = "bernoulli", n.trees = 5, shrinkage = 0.1,
                 interaction.depth = 3)
Step 3 & 4: GBM – Model Diagnostics
• Variable Importance
summary(gbm_model,
        cBars = length(gbm_model$var.names),
        n.trees = gbm_model$n.trees,
        plotit = TRUE,
        order = TRUE,
        method = relative.influence,
        normalize = TRUE)
Step 5 & Rest: Score Validation Data – GBM
• Scoring the Validation Dataset
predict_gbm <- predict(gbm_model, testing, n.trees = 5, type = "response")
summary(predict_gbm)
• Save it as a Variable
b <- transform(predict_gbm)
testing$predgbm <- b$X_data
head(testing)
Step 5 & Rest: Score Validation Data – GBM
• Subset the Data for the Relevant Variables
testing1_gbm <- subset(testing, select = c(predgbm, DEF, Num))
• Compute the GINI
library(Hmisc)
rcorr.cens(testing1_gbm$predgbm, testing1_gbm$DEF)

Step 5 & Rest: Score Validation Data – GBM
• Merge the Relevant Files (merge() takes two data frames at a time, so chain the merges)
Scored_testing1 <- merge(merge(testing1_lg, testing1_rf, by = "Num"),
                         testing1_gbm, by = "Num")
• Save this as a Permanent Dataset
write.csv(Scored_testing1, "testing_lg_rf_gbm_pred.csv")
• Repeat this Action as you Append Other Probabilities using Different Algorithms

Step 2: Algorithm – KNN
• Load the Following Library: kknn
library(kknn)
• Standardize the Data (min-max scaling to [0, 1])
training$nTA <- (training$TA - min(training$TA)) / (max(training$TA) - min(training$TA))
training$nTI <- (training$TI - min(training$TI)) / (max(training$TI) - min(training$TI))
• Do the Same for the Validation Data as well
Step 3 & 4: Algorithm – KNN
• Run the Relevant Algorithm
library(kknn)
knn <- kknn(as.factor(DEF) ~ nTA + nTI + nPAT + nPBDITA + nPBT + nCPFT +
              nPBTI + nPATI + nSales + nQR + nCR + nDE,
            training, testing, k = 7, distance = 2)
k = Number of Points in the Neighbourhood
distance = 2 is the Minkowski Distance Parameter (2 = Euclidean)
Use Both the Training and Test Data in the Same Call
Step 5: Model Results & Diagnostics: Algorithm – KNN
• Check the Results
summary(knn)
fit <- fitted(knn)
plot(fit)
• Save the Results as a Variable
b <- transform(fit)
testing$pred_knn <- b$X_data
table(testing$DEF, testing$pred_knn)   # confusion table of actual vs predicted
Step 6 and Beyond: Algorithm – KNN
• Save the Relevant Variables and Merge with the Original Data
testing1_knn <- subset(testing, select = c(pred_knn, DEF, Num))
• Create the Final Dataset (chaining merge(), which takes two data frames at a time)
Scored_testing1 <- Reduce(function(a, b) merge(a, b, by = "Num"),
                          list(testing1_lg, testing1_gbm, testing1_rf, testing1_knn))
• Repeat this for Other Algorithms
References
• Altman, E. (1968), "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy", Journal of Finance, 23, No. 4, 589-609
• Altman, E. (1993), Corporate Financial Distress and Bankruptcy: A Complete Guide to Predicting & Avoiding Distress and Profiting from Bankruptcy, John Wiley, Second Edition
• Anant, T. C., Gangopadhyay, S. and Goswami, O. (1992), Industrial Sickness in India: Characteristics, Determinants and History, 1970-90, Report 2, Government of India, Ministry of Industry, Office of the Economic Advisor
• Raghunathan, V. and Varma, J. (1992), "CRISIL Ratings: When does AAA mean B?", Vikalpa, Vol 17, No 2, 35-42
• Emerging Market Score Model: http://pages.stern.nyu.edu/~ealtman/emerging_markets_review.pdf
