
Introduction: A loan is a debt evidenced by a note that specifies, among other things, the principal amount, the interest rate and the date of repayment. A loan entails the reallocation of assets between a lender and a borrower for a period of time. The terms of a loan depend substantially on its interest rate, the number of installments over which it is repaid (the loan length), the amount requested and the amount funded, and the analysis below is built around these key factors. Using exploratory analysis and standard multiple regression techniques, we show that there is a significant relationship between interest rate and loan length, even after adjusting for important confounders such as the amount requested and the amount funded.

Methods:

Data Collection: This analysis used a sample of 2,500 loans from the Lending Club, as provided by instructor Jeff Leek for the Data Analysis class on Coursera.org [2]. These data were downloaded from the course website on February 16, 2013, using the R programming language [3], from the following links: https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv and https://spark-public.s3.amazonaws.com/dataanalysis/loansData.rda

Exploratory Analysis: Exploratory analysis was performed by examining tables and plots of the observed data. We identified transformations to perform on the raw data on the basis of these plots and of our knowledge of the scale of the measured variables. Exploratory analysis was used to identify missing values, verify the quality of the data, and determine the terms used in the regression model relating interest rate to loan length.

Statistical Modelling: To relate interest rate to loan length, we fit a standard multiple linear regression model [4]. Model selection was based on our exploratory analysis and on prior knowledge of the relationship between the amount requested and the amount funded. Coefficients were estimated by ordinary least squares, and standard errors were calculated using standard asymptotic approximations.

Reproducibility: All analyses performed in this manuscript can be reproduced in R from the cached data file loansData.rda. To reproduce the exact results presented here, this cached version of the data must be used, because the data available from loansData.csv may change depending on the download date.
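The statistical modelling step can be sketched as follows. This is a minimal, hedged illustration in Python, not the R code used for the analysis. It fits a linear model by ordinary least squares, computes classical standard errors from the residual variance (RSS / (n - p)), and forms 95% confidence intervals with the large-sample normal approximation (estimate ± 1.96 standard errors). The data are synthetic and hypothetical, not the Lending Club sample. Amounts are assumed to be in $1000s, and loan length is assumed to be coded 0 for 36 months and 1 for 60 months. The helper names (`ols`, `conf_int`, `make_example_data`) are illustrative only.

```python
import math


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular (collinear predictors?)")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Ordinary least squares: beta = (X'X)^-1 X'y.

    X is a list of rows whose first entry is 1.0 (the intercept b0).
    Standard errors use sigma^2 = RSS / (n - p) and the diagonal of (X'X)^-1.
    """
    n, p = len(X), len(X[0])
    Xt = transpose(X)
    XtX_inv = invert(matmul(Xt, X))
    beta = [row[0] for row in matmul(XtX_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for yi, row in zip(y, X)]
    sigma2 = sum(r * r for r in resid) / (n - p)
    se = [math.sqrt(sigma2 * XtX_inv[j][j]) for j in range(p)]
    return beta, se


def conf_int(beta, se, z=1.96):
    """Large-sample (normal approximation) 95% confidence intervals."""
    return [(b - z * s, b + z * s) for b, s in zip(beta, se)]


def make_example_data(n=40):
    """Hypothetical synthetic loans (NOT the Lending Club data)."""
    X, y = [], []
    for i in range(n):
        ar = 5.0 + (i % 10)          # amount requested, $1000s
        af = ar - 0.5 * (i % 3)      # amount funded, $1000s (never above AR)
        ll = 1.0 if i % 2 else 0.0   # assumed coding: 1 = 60 months, 0 = 36 months
        noise = 0.1 if i % 4 < 2 else -0.1
        X.append([1.0, ar, af, ll])
        y.append(8.0 + 0.05 * ar + 0.2 * af + 3.0 * ll + noise)  # interest rate, %
    return X, y


X, y = make_example_data()
beta, se = ols(X, y)
for name, b, (lo, hi) in zip(["b0", "AR", "AF", "LL"], beta, conf_int(beta, se)):
    print(f"{name:>2}: {b:7.3f}  95% CI ({lo:.3f}, {hi:.3f})")
```

With a 0/1 coding, the loan-length coefficient is the estimated difference in interest rate between 60-month and 36-month loans, holding the two amounts fixed. In R, `lm()` followed by `confint()` is the usual route. Note that `confint()` uses t quantiles rather than the fixed 1.96 used here.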

Results: For each loan, the data used in this analysis record the amount requested, the amount funded, the interest rate, the loan purpose, the debt-to-income ratio, home ownership, the FICO range and the loan length. We identified no missing values in the variables used in the model, and all measured variables were observed to be inside their standard ranges. Loans in this data set also did not show major patterns over time in interest rate or FICO score. The exploratory analysis identified some missing or erroneous values elsewhere in the data set, but these observations were retained for the linear model because the missing values did not occur in the factors under consideration. We also identified some outliers. These were also retained, as the coefficients of the linear regression model were effectively the same whether or not they were included.

Early analyses suggested a relationship between applicant FICO scores and the interest rates of the loans (Figure 1): lower applicant FICO scores emerge as an indicator of higher interest rates. However, given the parameters of the analysis [2], interest rate is our outcome measure and applicant FICO score is held constant. To inspect the factors identified by our singular value decomposition (SVD) [4], we produced additional plots to analyze the relationships among those factors, our outcome (interest rate) and our constant (FICO score). We replotted the FICO score and interest rate data, coloring the points by amount requested (Figure 2), amount funded (Figure 3) and loan length (Figure 4). Although some stratification was suggested in the plots for amount requested and amount funded (Figures 2 and 3, respectively), a much stronger association was implied with loan length (Figure 4). We therefore fit a regression model with interest rate (_IR_) as the outcome and the amount requested, the amount funded and the loan length as predictors.
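To make the role of the SVD concrete, the sketch below shows what "accounting for the most variance" means. It is a hedged illustration, not the decomposition actually run for this report. It assumes the SVD was applied to standardized numeric columns. In place of a full SVD (such as R's `svd()`), it uses power iteration on X'X, which recovers only the first right singular vector and its squared singular value. The share of variance is σ₁² divided by the trace of X'X. The column names and toy values are hypothetical.

```python
import math


def standardize(col):
    """Center a column to mean 0 and scale it to unit sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def gram(cols):
    """Return X^T X for a matrix X stored as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def leading_component(cols, iters=200):
    """Power iteration on X^T X for the first right singular vector of X.

    Returns (share, v), where share = sigma_1^2 / sum_k sigma_k^2 is the fraction
    of total variance captured by the first singular vector and v holds one
    loading per column (overall sign is arbitrary).
    """
    g = gram(cols)
    p = len(g)
    v = [1.0 / math.sqrt(p)] * p  # start vector; must not be orthogonal to v_1
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' G v, which equals sigma_1^2 once v has converged.
    sigma1_sq = sum(v[i] * sum(g[i][j] * v[j] for j in range(p)) for i in range(p))
    total = sum(g[i][i] for i in range(p))  # trace(X'X) = sum of all sigma_k^2
    return sigma1_sq / total, v


# Hypothetical toy columns, not the Lending Club data: two nearly identical
# amount columns and one unrelated alternating column.
requested = [float(k) for k in range(1, 11)]
funded = [r + d for r, d in zip(requested, [0.3, -0.2, 0.1, 0.0, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1])]
other = [1.0 if k % 2 == 0 else -1.0 for k in range(10)]

share, loadings = leading_component([standardize(c) for c in (requested, funded, other)])
print(f"share of variance: {share:.2f}; loadings: {[round(x, 2) for x in loadings]}")
```

In this toy data the two amount columns move together, so they dominate the first singular vector. Nothing in the calculation involves the interest rate. A large loading therefore says nothing, by itself, about the size or significance of a variable's regression coefficient.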
Our final regression model was:

$$ IR = b_0 + b_1 (AR) + b_2 (AF) + b_3 (LL) + e $$

where _b<sub>0</sub>_ is an intercept term and _b<sub>1</sub>_, _b<sub>2</sub>_ and _b<sub>3</sub>_ represent the change in interest rate associated with the loan amount requested (_AR_), the amount actually funded by investors (_AF_) and the length of the loan (_LL_), respectively. The error term (_e_) represents all unmeasured and unmodeled sources of variance in the interest rate. We observed a highly statistically significant relationship between interest rate and loan length (_P_ < 0.001), as well as a statistically significant relationship between interest rate and amount funded (_P_ = 0.003). No significant relationship was found between interest rate and amount requested. Focusing on the most statistically significant relationship, a change in loan length corresponded to a change of _b<sub>3</sub>_ = 0.14% in the interest rate (95% confidence interval: 0.13, 0.16). A change of one unit (_i.e._, $1000) in the amount funded corresponded to a change of _b<sub>2</sub>_ = 1.17% in the interest rate (95% confidence interval: <0.0001, 0.0002).

Conclusions: These analyses suggest that, given the same applicant FICO score, a difference in interest rate between two loans can most likely be explained by the length of the loans and the amount funded by investors. Of the two significant factors, loan length appears to have the greater effect, and the stratification that comes from this effect is strongly illustrated in Figure 4. Despite the strong effect indicated by the linear regression model, there remains a possibility that other factors strongly influence the interest rate of a given applicant's loan. Although the singular value decomposition technique was used to identify the candidate factors, it is clearly an imperfect tool for this purpose. The SVD pointed to the amount requested as accounting for the most variance in the data, which suggested that the amount requested would also have the strongest effect in the final model. However, the amount requested ultimately did not have a statistically significant effect on the interest rate. This is less surprising once one recalls that the SVD summarizes how the variables vary together, not how strongly each one is associated with the interest rate after adjusting for the others. This outcome leads us to suspect that other variables in the data, which were eliminated on the basis of the SVD, may affect the interest rate in important ways. We did not analyze these other factors in the same depth and cannot comment on them, except to say that future analyses should probe them in more detail.

It is also important to mention that the loan length factor included only two levels (_i.e._, "36 months" and "60 months"), which may have contributed to the strength of its effect on the interest rate. It is unclear from the Lending Club's website [1] whether loans of any other length are available. Future analyses should incorporate loans with other lengths (_e.g._, 12 months or 24 months) and/or loans from other sources to see whether the effects and patterns indicated by the linear regression model persist.

References:
----------

1. The Lending Club home page. URL: <https://www.lendingclub.com/home.action>. Accessed 2/16/2013.
2. Coursera.org: Data Analysis Assignment #1. URL: <https://class.coursera.org/dataanalysis001/human_grading/view/courses/294/assessments/4/submissions>. Accessed 2/16/2013.
3. R Core Team (2013). "The R Project for Statistical Computing." URL: <http://www.r-project.org>. Accessed 2/16/2013.
4. Baker, Kirk. "Singular Value Decomposition Tutorial." URL: <http://www.ling.ohio-state.edu/~kbaker/pubs/Singular_Value_Decomposition_Tutorial.pdf>. Accessed 2/16/2013.
5. Howell, David C. *Fundamental Statistics for the Behavioral Sciences*. Wadsworth Cengage Learning, 2011.
