Home Work / Interview Questions

R Programming
6 May 2018
Class Notes:
Rules of Accessing Data in R
1. DATAFRAME : SQUARE BRACKETS [ ]

a. df[ ROW, COL ]
b. dataframename$colname : $ is used for one column at a time
2. FUNCTIONS : ( )
3. LIST : [[ ]]
4. VECTOR : [ ]
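The four access rules above can be tried on a small example. This is only a sketch: the objects df, lst and v are made-up illustrations, not part of the class file.

```r
# Hypothetical example objects, for illustration only.
df  <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
lst <- list(first = 1:3, second = "text")
v   <- c(10, 20, 30)

cell      <- df[2, "name"]   # data frame: [ROW, COL] picks one cell
first_row <- df[1, ]         # leaving COL empty keeps every column
one_col   <- df$id           # $ extracts one column as a vector
n_rows    <- nrow(df)        # functions are called with ( )
element   <- lst[["first"]]  # [[ ]] extracts the element itself
sub_list  <- lst["first"]    # [ ] on a list returns a smaller list
second    <- v[2]            # vectors are indexed with [ ]
```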
Home Work
1. Can I get the location of Variable where it is stored?
a. Function Similar to id() in Python
2. What are && and the other logical operators in R, and how do they differ from the bitwise functions?
3. Read the R File which I have given.
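For homework question 2, a short sketch of how the operators behave may help. In R, & and | work element by element, && and || work on single values and short-circuit, and bitwise operations are done with functions such as bitwAnd(). The vectors x and y are made-up examples.

```r
# Hypothetical example vectors.
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

elementwise_and <- x & y        # compares element by element
elementwise_or  <- x | y
scalar_and      <- x[1] && y[1] # && is meant for single TRUE/FALSE values

# && short-circuits: the right-hand side is never evaluated here.
short_circuit <- FALSE && stop("never evaluated")

# Bitwise operations on integers use functions, not operators.
bit_and <- bitwAnd(12L, 10L)    # 1100 AND 1010
bit_or  <- bitwOr(12L, 10L)     # 1100 OR  1010
```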
Interview Questions
1. paste0()
2. %in%
3. Vector of smaller and larger length manipulation
4. Dataframe of larger and smaller number of rows
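A small sketch of the four topics above. The data is made up, and reading question 4 as a join between a larger and a smaller data frame is an assumption; the notes do not say which operation was meant.

```r
# paste0() concatenates without a separator; paste() inserts a space by default.
labels <- paste0("col", 1:3)
spaced <- paste("col", 1:3)

# %in% tests membership of each left-hand element in the right-hand vector.
found <- c(2, 5, 9) %in% c(1, 2, 3, 9)

# A shorter vector is recycled to match the longer one.
short_v  <- c(1, 2)
long_v   <- c(10, 20, 30, 40)
recycled <- long_v + short_v

# Assumption: question 4 is about joining data frames with different row counts.
big    <- data.frame(id = 1:4, x = c(5, 6, 7, 8))
small  <- data.frame(id = c(2, 4), y = c("b", "d"))
joined <- merge(big, small, by = "id", all.x = TRUE)  # keeps all rows of big
```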
NOTE : There will be a test next class, based on the R file only. If you can't code, you
will have to teach.
12 May 2018
Home Work:
1. Linear Regression
a. http://www.learnbymarketing.com/tutorials/linear-regression-by-hand-in-excel/
2. What is the hypothesis of Linear regression?
3. Read about the BETA0 and BETA1 values expressed in terms of "r" ( Coeff of Correlation )
4. Can we understand following Degrees of Freedom?
Regression Analysis
1. MAPE
2. RMSE
3. Log RMSE
4. MSE
5. MAE
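The five error measures above can be computed by hand in a few lines. The actual and predicted values are made up, and treating "Log RMSE" as the root mean squared logarithmic error (RMSLE) is an assumption.

```r
# Hypothetical actual and predicted values.
actual    <- c(100, 150, 200, 250)
predicted <- c(110, 140, 210, 240)
err       <- actual - predicted

mse  <- mean(err^2)                    # Mean Squared Error
rmse <- sqrt(mse)                      # Root Mean Squared Error
mae  <- mean(abs(err))                 # Mean Absolute Error
mape <- mean(abs(err / actual)) * 100  # Mean Absolute Percentage Error, in %

# Assumption: "Log RMSE" means RMSLE, the RMSE computed on log(1 + value).
rmsle <- sqrt(mean((log1p(predicted) - log1p(actual))^2))
```

MAPE is undefined when an actual value is 0, and RMSLE needs non-negative values.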
Interview Questions
1. P value of Linear Regression : How to use it?

2. R square
3. Adjusted R Square
4. Coefficients Effect
a. If X increases by 1 unit, Y changes by BETA on average ( holding the other variables constant )
5. How to test ASSUMPTIONS of Linear Regression
a. Linearity : Y is a linear combination of the X variables
i. Don't say : X is linear
ii. Plot of ERROR AND ACTUAL
1. Histogram
2. plot( model_name , which = 1 ) : gives the Residuals vs Fitted plot
b. Normality of Error : Errors are normally distributed
i. Plot the errors
ii. Testing with METHOD
1. ????
c. Homoscedasticity
i. Test for This????
d. Errors are independent of Each other
e. Multicollinearity
i. Check for VIF Values : Variance Inflation Factor
ii. VIF = 1 / ( 1 - R Square )
iii. Thumb Rule : VIF greater than 10 , remove that column
6. How do we get a VIF for every column when the formula uses R Square?
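Question 6 can be answered with a short sketch: each predictor gets its own auxiliary regression on the remaining predictors, and the R Square of that regression goes into 1 / (1 - R Square). The built-in mtcars data and the three columns chosen are assumptions made for illustration.

```r
# Assumption: three mtcars columns stand in for the model's predictors.
predictors <- mtcars[, c("disp", "hp", "wt")]

vif_by_hand <- sapply(names(predictors), function(col) {
  others <- setdiff(names(predictors), col)
  # Auxiliary model: regress this predictor on all the other predictors.
  aux <- lm(reformulate(others, response = col), data = predictors)
  1 / (1 - summary(aux)$r.squared)
})

vif_by_hand  # one VIF per column, even though there is only one main model
```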
Home Work :
1. Understanding the Beta values for the simple linear regression

a. How to get the Beta values without the lm() function
2. Understanding the output of lm() in detail
a. http://www.learnbymarketing.com/tutorials/explaining-the-lm-summary-in-r/
3. Durbin Watson
4. Test for Homoscedasticity
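For homework item 1, the Beta values of a simple linear regression can be computed without lm() and then compared with it. The sketch below uses the built-in mtcars data as an assumed example and also shows the slope written in terms of the correlation coefficient r.

```r
# Assumption: mpg explained by wt from the built-in mtcars data.
x <- mtcars$wt
y <- mtcars$mpg

beta1 <- cov(x, y) / var(x)            # slope
beta0 <- mean(y) - beta1 * mean(x)     # intercept

# The same slope in terms of r: BETA1 = r * sd(y) / sd(x).
beta1_from_r <- cor(x, y) * sd(y) / sd(x)

# Cross-check against lm().
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)
```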
Linear Regression:
Interview Questions
1. When to use linear regression?

2. How many dummy variables are created if there are n categories?
3. What are the assumptions of Linear Regression?
4. What is multicollinearity and how to remove it?
5. Package used for VIF?
6. How to check if the Linearity assumption holds true?
7. How to check all the assumptions of Linear Regression?
8. How do we get multiple VIF values when we have only one LR model?
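One possible way to check several of the assumptions using base R only is sketched below. The model on mtcars is an assumed example. The Shapiro-Wilk test and the hand-computed Durbin-Watson and Breusch-Pagan style statistics are common choices, not necessarily the methods used in class.

```r
# Assumption: a small example model on the built-in mtcars data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# Linearity and constant variance: look at the diagnostic plots.
if (interactive()) {
  plot(fit, which = 1)  # Residuals vs Fitted
  hist(res)             # rough look at the error distribution
}

# Normality of errors: Shapiro-Wilk test (null hypothesis: errors are normal).
normality <- shapiro.test(res)

# Independence of errors: Durbin-Watson statistic by hand (about 2 = no autocorrelation).
dw <- sum(diff(res)^2) / sum(res^2)

# Homoscedasticity: Breusch-Pagan style check (studentized form), by hand.
aux     <- lm(I(res^2) ~ wt + hp, data = mtcars)
bp_stat <- nrow(mtcars) * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 2, lower.tail = FALSE)  # df = number of predictors
```

A small p-value from the normality or homoscedasticity check suggests that assumption is violated.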
19th May 2018
Logistic Regression
Homework: ( Logistic Regression)
1. Lift and Gain Chart

2. F1 Score , Gini , Lorenz Curve
3. https://www.analyticsvidhya.com/blog/2016/02/7-important-model-evaluation-error-metrics/
Interview Questions:
1. What are cutoffs in Logistic Regression?
a. P = .5 : Balanced Data
b. KS Cutoff
2. Different Cutoff
a. ROC AUC Curve
b. Business
3. Model evaluation measures in Logistic Regression
a. AUC
b. ROC
c. Gini
d. F1 Score
e. Confusion Matrix
f. Accuracy
g. Recall / Precision
h. Concordance Ratio
i. Hosmer-Lemeshow Test
j. McFadden's R Square
4. How to handle imbalanced data in logistic regression
a. Analytics Vidhya
b. https://www.analyticsvidhya.com/blog/2016/03/practical-guide-deal-imbalanced-classification-problems/
c. https://www.analyticsvidhya.com/blog/2016/09/this-machine-learning-project-on-imbalanced-data-can-add-value-to-your-resume/
5. How is the ROC curve created?
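Several of the measures above, and the way a ROC curve is built, can be sketched by hand in base R. The labels and predicted probabilities are made up. The AUC uses the rank (Mann-Whitney) formula, and Gini is taken as 2 * AUC - 1.

```r
# Hypothetical true labels and predicted probabilities.
actual <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob   <- c(0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1)

# Confusion matrix at the P = .5 cutoff.
pred      <- as.integer(prob >= 0.5)
confusion <- table(Predicted = pred, Actual = actual)
tp <- sum(pred == 1 & actual == 1)
fp <- sum(pred == 1 & actual == 0)
fn <- sum(pred == 0 & actual == 1)
tn <- sum(pred == 0 & actual == 0)

accuracy  <- (tp + tn) / length(actual)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

# ROC curve: sweep the cutoff and record TPR and FPR at each value.
cutoffs <- sort(unique(prob), decreasing = TRUE)
tpr <- sapply(cutoffs, function(t) sum(prob >= t & actual == 1) / sum(actual == 1))
fpr <- sapply(cutoffs, function(t) sum(prob >= t & actual == 0) / sum(actual == 0))

# KS cutoff: the cutoff where TPR - FPR is largest.
ks_cutoff <- cutoffs[which.max(tpr - fpr)]

# AUC from ranks, then Gini.
n1   <- sum(actual == 1)
n0   <- sum(actual == 0)
auc  <- (sum(rank(prob)[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
gini <- 2 * auc - 1
```

Plotting fpr on the x axis against tpr on the y axis gives the ROC curve.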
Trees :
Homework: ( Logistic Regression)
1. Lift and Gain Chart

2. F1 Score , Gini
3. https://www.analyticsvidhya.com/blog/2016/02/7-important-model-evaluation-error-metrics/
Date : 26 / 27 May
Homework
1. Reduction in Variance in tree?
2. Deviance in tree?
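For homework question 1, reduction in variance for one candidate split of a regression tree can be worked out by hand. The data and the split point are made up for illustration.

```r
# Hypothetical data: y is the target, x is the variable being split on.
x <- c(1, 2, 3, 7, 8, 9)
y <- c(10, 12, 11, 30, 32, 31)

# Variance of a node (divides by n, not n - 1).
node_var <- function(v) mean((v - mean(v))^2)

split_at <- 5
left  <- y[x <  split_at]
right <- y[x >= split_at]

# Child variances weighted by the share of rows in each child.
weighted_child_var <- (length(left)  * node_var(left) +
                       length(right) * node_var(right)) / length(y)

reduction <- node_var(y) - weighted_child_var
```

The tree evaluates every candidate split this way and keeps the one with the largest reduction.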
Interview Questions
1. How are the trees different in a Random Forest?
2. What is the difference between BAGGING and Random Forest?
3. What is OOB in Random Forest?
4. What are the methods on which trees are built?
a. Gini, Chi Square , Entropy
5. How are nodes divided in trees?
6. When to use Linear Regression over trees?
7. How to use Random Forest output as input to LR
a. Use the important variables from RF as inputs to LR
b. We get the important variables plot in R
c. Variable Importance Plot
i. Mean Decrease Gini
8. When to use trees over Linear Regression?
9. How do you prune a tree?
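For questions 4 and 5, here is a sketch of two of the split measures: Gini impurity and entropy for a node, and the Gini gain of one split. The class labels are made up, and gini() and entropy() are helper names written for this example, not functions from a package.

```r
# Helper functions written for this example.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]          # drop empty classes so log2(0) never occurs
  -sum(p * log2(p))
}

# Hypothetical parent node and one candidate split.
parent <- c("yes", "yes", "yes", "no", "no", "no")
left   <- c("yes", "yes", "no")
right  <- c("yes", "no", "no")

weighted_gini <- (length(left) * gini(left) + length(right) * gini(right)) /
  length(parent)
gini_gain <- gini(parent) - weighted_gini  # a node is split where this is largest
```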
Random Forest Tuning :
https://www.hackerearth.com/practice/machine-learning/machine-learning-algorithms/tutorial-random-forest-parameter-tuning-r/tutorial/
Homework : Apply on Test data
2nd June : Naive Bayes
Homework
1. How to predict with Naive Bayes

2. Repeat function in our code
How Clustering Works
https://www.youtube.com/watch?v=XJ3194AmH40&t=181s
Clustering:
Hierarchical Clustering
1. Type
2. Merge Method
a. Complete ,Single , Average , Centroid , WARD
3. Dendrogram : How is this created?
4. Why is this not as good as KMeans?
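A minimal sketch of hierarchical clustering with two of the merge methods listed above, using base R's hclust(). The built-in iris measurements and the choice of 3 clusters are assumptions made for illustration.

```r
# Assumption: the four numeric iris columns, scaled so no variable dominates.
pts <- scale(iris[, 1:4])
d   <- dist(pts)                          # pairwise Euclidean distances

hc_complete <- hclust(d, method = "complete")
hc_ward     <- hclust(d, method = "ward.D2")

# The dendrogram shows the order in which observations were merged.
if (interactive()) plot(hc_ward)

# Cut the tree to get a fixed number of clusters.
clusters <- cutree(hc_ward, k = 3)
table(clusters)
```

hclust() needs the full distance matrix, which grows with the square of the number of rows. This is one common reason it scales worse than KMeans.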
Kmeans
1. How to find optimal K

a. Calinski ( Calinski-Harabasz index )
b. Elbow
c. GAP
2. Can K Means work with Categorical Data?
3. How does K means work?
4. K means is sensitive to INITIAL SEEDS
a. Run multiple times and keep the run with the least WSS
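A sketch of the elbow method and of handling seed sensitivity with base R's kmeans(). The scaled iris data and the range of K values are assumptions. The nstart argument runs several random starts and keeps the one with the least WSS.

```r
set.seed(42)                  # make the random starts reproducible
pts <- scale(iris[, 1:4])     # assumption: built-in iris data, scaled

# Total within-cluster sum of squares (WSS) for K = 1 to 6.
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(pts, centers = k, nstart = 25)$tot.withinss
})

# Elbow: pick K where the drop in WSS starts to flatten.
if (interactive()) plot(k_values, wss, type = "b", xlab = "K", ylab = "WSS")
```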
How to validate Clusters

Calculate Distance
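As a small illustration of distance calculation, base R's dist() can be compared with the Euclidean formula worked out by hand. The two points are made up.

```r
# Hypothetical points a = (0, 0) and b = (3, 4).
pts <- rbind(a = c(0, 0), b = c(3, 4))

euclid    <- dist(pts)                        # Euclidean is the default
manhattan <- dist(pts, method = "manhattan")

# Euclidean distance by hand: square root of the summed squared differences.
by_hand <- sqrt(sum((pts["a", ] - pts["b", ])^2))
```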
