
R Programming

6 May 2018

Class Notes:

Rules of Accessing Data in R

1. DATAFRAME : SQUARE BRACKETS [ ]
a. df[ ROW, COL ] : select rows and columns
b. dataframename$colname : $ selects one column at a time
2. FUNCTIONS : ( )
3. LIST : [[ ]]
4. VECTOR : [ ]

Home Work
1. Can I get the memory location where a variable is stored?
a. A function similar to id() in Python
2. What are && and the other logical and bitwise operators in R?
3. Read the R file I have given.

Interview Questions
1. paste0()
2. %in%
3. Manipulating vectors of smaller and larger length (recycling)
4. Data frames with larger and smaller numbers of rows
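A rough Python emulation of the R behaviour behind questions 1 to 3, for illustration only (the names r_add, r_in and r_paste0 are made up; this is not how R implements them). In R, arithmetic on vectors of different lengths recycles the shorter one and warns when the longer length is not a multiple of the shorter; x %in% table returns one TRUE/FALSE per element of x; paste0() concatenates with no separator, also recycling.

```python
import warnings
from itertools import cycle


def r_add(a, b):
    """Emulate R's a + b for numeric vectors of different lengths (recycling)."""
    if not a or not b:
        return []  # in R, arithmetic with a zero-length vector gives a zero-length result
    n = max(len(a), len(b))
    if n % len(a) or n % len(b):
        warnings.warn("longer object length is not a multiple of shorter object length")
    # Repeat both vectors cyclically and stop at the longer length
    return [x + y for x, y, _ in zip(cycle(a), cycle(b), range(n))]


def r_in(x, table):
    """Emulate R's x %in% table: one True/False per element of x."""
    lookup = set(table)
    return [v in lookup for v in x]


def r_paste0(*vectors):
    """Emulate R's paste0(): join element-wise with no separator, recycling shorter vectors."""
    # Zero-length vectors are dropped (R treats them as "" by default)
    vectors = [[str(v) for v in vec] for vec in vectors if len(vec) > 0]
    if not vectors:
        return []
    n = max(len(v) for v in vectors)
    return ["".join(vec[i % len(vec)] for vec in vectors) for i in range(n)]
```

For example, r_add([1, 2, 3, 4], [10, 20]) mirrors R's c(1, 2, 3, 4) + c(10, 20), and r_paste0(["x"], [1, 2, 3]) mirrors paste0("x", 1:3).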

NOTE : Next class there will be a test, based on the R file only. If you can’t code, you will have to teach.

12 May 2018

Home Work:

1. Linear Regression
a. http://www.learnbymarketing.com/tutorials/linear-regression-by-hand-in-excel/
2. What is the hypothesis tested in linear regression?
3. Read about BETA0 and BETA1 expressed in terms of “r” (coefficient of correlation)
4. Understand degrees of freedom
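For simple linear regression the least-squares estimates can be written with r: BETA1 = r * (s_y / s_x) and BETA0 = y_bar - BETA1 * x_bar, where s_x and s_y are the sample standard deviations. The residual degrees of freedom are n - 2, because two betas are estimated. A small Python sketch of those formulas (the function name is hypothetical):

```python
from statistics import mean, stdev


def simple_ols_via_r(x, y):
    """Intercept and slope of y = b0 + b1*x, written in terms of Pearson's r."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    # Pearson correlation coefficient
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x       # BETA1 = r * (s_y / s_x)
    b0 = y_bar - b1 * x_bar  # BETA0 = y_bar - BETA1 * x_bar
    df_resid = n - 2         # residual degrees of freedom: n minus the 2 estimated betas
    return b0, b1, r, df_resid
```

With x = [1, 2, 3, 4] and y = [2, 3, 5, 4] this gives BETA0 = 1.5 and BETA1 = 0.8, the same as lm(y ~ x) would report.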

Regression Analysis

1. MAPE
2. RMSE
3. Log RMSE
4. MSE
5. MAE
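The five metrics above, computed by hand as a Python sketch. "Log RMSE" is assumed here to mean RMSE on log(1 + value), often called RMSLE; MAPE is returned in percent and is undefined when any actual value is 0.

```python
from math import log1p, sqrt


def regression_metrics(actual, predicted):
    """MAPE, RMSE, Log RMSE (assumed RMSLE), MSE and MAE for two equal-length lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(mse)
    # Percentage error relative to the actual value; fails if an actual value is 0
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # RMSE on log(1 + value); values must be greater than -1
    rmsle = sqrt(sum((log1p(p) - log1p(a)) ** 2 for a, p in zip(actual, predicted)) / n)
    return {"MAPE": mape, "RMSE": rmse, "Log RMSE": rmsle, "MSE": mse, "MAE": mae}
```

For actual = [100, 200, 300] and predicted = [110, 190, 300], MAPE is 5% and MSE is 200/3.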

Interview Questions

1. P value in linear regression: how to use it?
2. R Square
3. Adjusted R Square
4. Effect of the coefficients
a. If X increases by 1 (other X's held constant), Y changes by BETA on average
5. How to test the ASSUMPTIONS of linear regression
a. Linearity: Y is a linear combination of the X's
i. Don’t say: X is linear
ii. Plot of ERRORS vs FITTED values
1. Histogram
2. plot(model, which = 1)
b. Normality of errors: errors are normally distributed
i. Plot the errors
ii. Testing with a METHOD
1. ????
c. Homoscedasticity
i. Test for this ????
d. Errors are independent of each other
e. Multicollinearity
i. Check the VIF values: Variance Inflation Factor
ii. VIF = 1 / (1 - R Square)
iii. Rule of thumb: if VIF is greater than 10, remove that column
6. How do we get a VIF for every column when the formula uses R Square?
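One way to answer question 6: each column gets its own R Square from an auxiliary regression of that column on all the other X columns, so VIF_j = 1 / (1 - R_j^2). The Python sketch below (hypothetical helper names, standard library only, normal equations solved by Gauss-Jordan elimination) does exactly that. With perfectly collinear columns the system is singular and it will raise an error.

```python
def _ols_r_squared(y, xs):
    """R^2 of an OLS fit of y on the columns in xs, plus an intercept."""
    n = len(y)
    # Design matrix with a leading column of 1s for the intercept
    X = [[1.0] + [col[i] for col in xs] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y as an augmented matrix
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vif(columns):
    """VIF for every predictor: regress column j on all the others, then 1 / (1 - R_j^2)."""
    result = []
    for j, target in enumerate(columns):
        others = [col for k, col in enumerate(columns) if k != j]
        result.append(1 / (1 - _ols_r_squared(target, others)))
    return result
```

So one LR model still yields one VIF per column: the model's own R Square is not used, only the auxiliary regressions. With two predictors whose correlation is 0.8, both VIFs are 1 / (1 - 0.64), about 2.78.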

Home Work :

1. Understand the Beta values for simple linear regression
a. How to get the Beta values without the lm() function
2. Understand the output of lm() in detail
a. http://www.learnbymarketing.com/tutorials/explaining-the-lm-summary-in-r/
3. Durbin-Watson test
4. Test for homoscedasticity
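The Durbin-Watson statistic checks assumption (d), independence of errors: DW = sum over t of (e_t - e_(t-1))^2 divided by the sum of e_t^2. Values near 2 suggest no first-order autocorrelation; values towards 0 suggest positive and towards 4 negative autocorrelation. A Python sketch computing it from a list of residuals (function name hypothetical):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time/observation order."""
    # Squared differences between consecutive residuals
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Residual sum of squares
    den = sum(e * e for e in residuals)
    return num / den
```

Residuals that keep the same sign give a value near 0; residuals that flip sign every step push it towards 4.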

Linear Regression:

Interview Questions

1. When to use linear regression?
2. How many dummy variables are created if there are n categories?
3. What are the assumptions of linear regression?
4. What is multicollinearity and how do you remove it?
5. Which package is used for VIF?
6. How to check whether the linearity assumption holds?
7. How to check all the assumptions of linear regression?
8. How do we get multiple VIF values when we have only one linear regression model?
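Question 2, sketched: with an intercept in the model, n categories need n - 1 dummy columns. The dropped level is the baseline; including all n would make the dummies add up to the intercept column (perfect multicollinearity). R's lm() by default takes the first factor level as the baseline. The Python function below is only a hypothetical illustration of that treatment coding, using the alphabetically first level.

```python
def dummy_encode(values, baseline=None):
    """Treatment coding: n categories -> n - 1 dummy (0/1) columns."""
    levels = sorted(set(values))
    baseline = levels[0] if baseline is None else baseline
    kept = [lvl for lvl in levels if lvl != baseline]
    # One 0/1 column per non-baseline level; a baseline row is all zeros
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in kept}
```

For ["red", "green", "blue", "green"] the baseline is "blue" and only "green" and "red" get columns.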

19th May 2018

Logistic Regression

Homework: ( Logistic Regression)

1. Lift and Gain Chart
2. F1 Score, Gini, Lorenz Curve
3. https://www.analyticsvidhya.com/blog/2016/02/7-important-model-evaluation-error-metrics/
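A sketch of the numbers behind a gain and lift chart (hypothetical function; 10 bins, i.e. deciles, by default): sort records by predicted probability, highest first, cut them into equal-sized bins, and track the cumulative share of all positives captured (gain) and the gain divided by the share of records examined so far (lift). A random model has lift close to 1 in every bin.

```python
def gain_lift_table(y_true, scores, bins=10):
    """(bin number, cumulative gain, cumulative lift) after sorting by score, highest first."""
    ranked = [y for _, y in sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)]
    n, total_pos = len(ranked), sum(ranked)
    rows, captured = [], 0
    for b in range(1, bins + 1):
        start, end = (b - 1) * n // bins, b * n // bins
        captured += sum(ranked[start:end])
        gain = captured / total_pos  # share of all positives found so far
        population = end / n         # share of records examined so far
        rows.append((b, gain, gain / population))  # lift = gain / population share
    return rows
```

Plotting gain against the population share gives the gain chart; plotting the lift column gives the lift chart.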

Interview Questions:
1. What are cutoffs in Logistic
a. P = .5 : Balanced Data
b. KS Cutoff
2. Different Cutoff
a. ROC AUC Curve
b. Business
3. Model Measuring parameters in Logistics
a. AUC
b. ROC
c. Gini
d. F1 Score
e. Confusion Matrix
f. Accuracy
g. Recall / Precision
h. Concordance Ratio
i. Hosmer Lemeshow
j. Mac Faddens R Squre
4. How to handle imbalanced data in logistic regression
a. Analytics Vidya
b. https://www.analyticsvidhya.com/blog/2016/03/practical-guide-deal-imbalanced-
classification-problems/
c. https://www.analyticsvidhya.com/blog/2016/09/this-machine-learning-project-on-
imbalanced-data-can-add-value-to-your-resume/
5. How ROC curve is created?
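A Python sketch for questions 1, 3 and 5 (hypothetical function names). The ROC curve is built by using every distinct predicted probability as the cutoff and recording (FPR, TPR); AUC is the area under it (trapezoidal rule here); Gini = 2 * AUC - 1; the KS statistic is the largest TPR - FPR, and the cutoff where it occurs is one candidate "KS cutoff". The confusion-matrix metrics are computed at a single cutoff, 0.5 by default.

```python
def confusion_metrics(y_true, scores, cutoff=0.5):
    """Confusion-matrix counts and derived metrics at one probability cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}


def roc_points(y_true, scores):
    """(cutoff, FPR, TPR) for every distinct score used as the cutoff, highest first."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 0)
        points.append((cut, fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule, starting from (0, 0)."""
    curve = [(0.0, 0.0)] + [(fpr, tpr) for _, fpr, tpr in points]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


def gini_from_auc(auc_value):
    """Gini coefficient of the model: 2 * AUC - 1."""
    return 2 * auc_value - 1


def ks_cutoff(points):
    """KS statistic = max(TPR - FPR); returns (cutoff where it occurs, KS value)."""
    cut, fpr, tpr = max(points, key=lambda p: p[2] - p[1])
    return cut, tpr - fpr
```

For labels [1, 1, 0, 1, 0, 0] with scores [0.9, 0.8, 0.7, 0.6, 0.4, 0.2], AUC is 8/9 and KS is 2/3.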

Trees :

Date : 26 / 27 May

Homework
1. Reduction in variance in a tree?
2. Deviance in a tree?
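A Python sketch of both homework terms, as they are commonly defined (function names hypothetical). Reduction in variance, used for regression trees, is the variance of the parent node minus the size-weighted variance of the two children. Deviance of a node is the residual sum of squares for a regression tree and -2 * sum over classes of n_k * log(p_k) for a classification tree.

```python
from math import log
from statistics import pvariance


def variance_reduction(parent, left, right):
    """Parent variance minus size-weighted child variance (population variance)."""
    n = len(parent)
    weighted_child = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
    return pvariance(parent) - weighted_child


def regression_deviance(y):
    """Regression-tree node deviance: residual sum of squares around the node mean."""
    y_bar = sum(y) / len(y)
    return sum((v - y_bar) ** 2 for v in y)


def classification_deviance(labels):
    """Classification-tree node deviance: -2 * sum_k n_k * log(p_k)."""
    n = len(labels)
    counts = [labels.count(c) for c in set(labels)]
    return -2 * sum(k * log(k / n) for k in counts)
```

A split that separates [1, 1, 5, 5] into [1, 1] and [5, 5] removes all the variance (reduction of 4); a pure node has deviance 0.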

Interview Questions
1. How are the trees different from each other in a Random Forest?
2. What is the difference between BAGGING and Random Forest?
3. What is OOB (out-of-bag) in Random Forest?
4. What are the methods on which trees are built?
a. Gini, Chi-Square, Entropy
5. How are nodes split in trees?
6. When to use Linear Regression over trees?
7. How to use Random Forest results as input to LR
a. Use the important variables from RF in LR
b. R gives an important-variable plot
c. Variable Importance Plot
i. Mean Decrease Gini
8. When to use trees over Linear Regression?
9. How do you prune a tree?
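How a node is split (questions 4 and 5), sketched in Python for one numeric feature: try every threshold, compute the size-weighted impurity of the two child nodes, and keep the lowest. Gini and entropy are shown; chi-square is not covered. Function names are hypothetical.

```python
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def entropy(labels):
    """Entropy in bits: -sum p * log2(p) over classes."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))


def best_split(x, labels, impurity=gini):
    """Return (threshold, weighted child impurity) for the best split x <= threshold."""
    best = None
    n = len(labels)
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        left = [lab for xi, lab in zip(x, labels) if xi <= t]
        right = [lab for xi, lab in zip(x, labels) if xi > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

A real tree repeats this for every feature at every node and picks the overall best split.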

Random Forest Tuning :

https://www.hackerearth.com/practice/machine-learning/machine-learning-algorithms/tutorial-random-forest-parameter-tuning-r/tutorial/

Homework : Apply it to the test data

2nd June : Naive Bayes

Homework

1. How to predict with Naive Bayes
2. Repeat function in our code
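How a Naive Bayes prediction is made: multiply the class prior by the per-feature likelihoods (features assumed independent given the class) and pick the class with the largest product, done in logs to avoid underflow. The Python sketch assumes categorical features and Laplace smoothing with alpha = 1; both are assumptions, not from the notes, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_nb(rows, labels, alpha=1.0):
    """Count classes and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, c)][v] += 1
            feature_values[j].add(v)
    return class_counts, value_counts, feature_values, alpha


def predict_nb(model, row):
    """Class maximising log P(class) + sum_j log P(x_j | class)."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = log(n_c / total)
        for j, v in enumerate(row):
            k = len(feature_values[j])  # number of distinct values of feature j
            # Laplace smoothing keeps an unseen value from zeroing out the product
            score += log((value_counts[(j, c)][v] + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```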

How Clustering Works

https://www.youtube.com/watch?v=XJ3194AmH40&t=181s

Clustering:

Hierarchical Clustering

1. Types
2. Merge method (linkage)
a. Complete, Single, Average, Centroid, Ward
3. Dendrogram: how is it created?
4. Why is it not as good as K-means?
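A naive agglomerative clustering sketch in Python (hypothetical names; only single, complete and average linkage, no centroid or Ward). Every point starts as its own cluster and the closest pair is merged repeatedly; the recorded merge heights are what a dendrogram draws. Because this naive form recomputes all pairwise linkages at every merge, it needs roughly n cubed distance computations, one reason hierarchical clustering scales worse than K-means.

```python
from itertools import combinations
from math import dist


def linkage_distance(a, b, method):
    """Distance between clusters a and b under the chosen merge method."""
    pair = [dist(p, q) for p in a for q in b]
    if method == "single":
        return min(pair)  # closest pair of points
    if method == "complete":
        return max(pair)  # farthest pair of points
    if method == "average":
        return sum(pair) / len(pair)
    raise ValueError(f"unsupported method: {method}")


def agglomerative(points, method="complete"):
    """Return the merge history [(cluster A, cluster B, merge distance), ...]."""
    clusters = [[tuple(p)] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method))
        d = linkage_distance(clusters[i], clusters[j], method)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```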

Kmeans

1. How to find the optimal K
a. Calinski-Harabasz
b. Elbow
c. GAP statistic
2. Can K-means work with categorical data?
3. How does K-means work?
4. K-means is sensitive to the INITIAL SEEDS
a. Run it multiple times and keep the run with the least WSS (within-cluster sum of squares)
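A Python sketch of questions 1b, 3 and 4 (hypothetical names; Lloyd's algorithm with random initial seeds). Each run alternates between assigning points to the nearest centre and moving each centre to the mean of its points; best_of_n_runs keeps the run with the least WSS, and elbow_curve lists that WSS for k = 1, 2, ... so the elbow can be eyeballed.

```python
import random
from math import dist


def kmeans(points, k, n_iter=100, rng=None):
    """One K-means run; returns (centres, WSS)."""
    rng = rng or random.Random()
    centres = rng.sample(points, k)  # random initial seeds
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centres[c]))
            clusters[nearest].append(p)
        # Move each centre to its cluster mean (keep it if the cluster is empty)
        new_centres = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    wss = sum(min(dist(p, c) ** 2 for c in centres) for p in points)
    return centres, wss


def best_of_n_runs(points, k, n_runs=10, seed=0):
    """K-means depends on the seeds: run it n_runs times and keep the least WSS."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng=rng) for _ in range(n_runs)), key=lambda run: run[1])


def elbow_curve(points, max_k, seed=0):
    """Best WSS for k = 1..max_k; the elbow is where extra clusters stop helping much."""
    return [best_of_n_runs(points, k, seed=seed)[1] for k in range(1, max_k + 1)]
```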

How to validate Clusters


Calculate Distance
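Distances between observations drive both K-means and hierarchical clustering. Two common choices, sketched in Python (function names hypothetical):

```python
from math import sqrt


def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """City-block distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))
```

Between (0, 0) and (3, 4) the Euclidean distance is 5 and the Manhattan distance is 7.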
