The document contains questions about statistics, machine learning, data science and Python concepts. It asks about topics like types of data, machine learning algorithms and techniques, and Python programming.
1. … candidate in a forthcoming election by? Conduct a poll of a random sample from polling stations (TPS) in the country
2. Which one of the following is not an example of statistics? Gini Index
3. Which of the following is an example of time series data? Average batting average of a baseball player, 2 and 3
4. Which of the following is an example of multivariate data? Vital signs recorded for a newborn baby
5. Which of the following is not an example of big data? The number of football players in FIFA
6. Which of the following is an example of categorical data? Mode of fashion in a certain year
7. Which of the following is not an example of ordinal data? Number of trees in a park
8. A mean is meaningful for the following type of data? Ratio data
9. You have two datasets: the first has 1000 customers, the second 1500. The two datasets have 900 records in common. 900 records
10. Consider the dataframe "df". What does the command df.rename(columns={'a': 'b'}) change about dataframe "df"? Nothing, as you must set the parameter "inplace=True"
11. Consider the column of the dataframe df['a']. The column has been standardized. What is the standard deviation of the values, i.e. the result of applying the operation df['a'].std()? 1
12. What is the Pearson correlation between variables X and Y, if X = 0.9 * Y? 1
13. Consider the dataframe "df", with categorical column "categories". What would be the output of the command df['categories'].value_counts()[:20].index.tolist()? A Python list showing the top 20 categories that appear most often, without their number of occurrences
14. Based on the data frame sample below, write a Pandas program to create and display a DataFrame from a specified dictionary which has the index labels. Example DataFrame: exam_data = {'name': ['anastasia', 'dirna', 'katherine', 'james', 'emily' …} df = pd.DataFrame(exam_data, index=labels) print(df)
15. Based on the sample below, write a Pandas program to append a new row to the DataFrame with the given values for each column.
Now delete the new row and return the original data frame. Example DataFrame: exam_data = {'name': ['Anastasia', 'Dirna', 'katherine', 'james' … df.loc['k'] = [1, 'Suresh', 'yes', 15.5]
16. You surmise that the two arrays must have the same space allocated? Print the flags of both arrays with e.flags and f.flags and check the flag "OWNDATA". If one of them is False, then both arrays have the same space allocated
17. Suppose you want to join the train and test datasets (both are NumPy arrays, train_set and test_set) into a resulting array (resulting_set). resulting_set = np.vstack([train_set, test_set])
18. Which command is appropriate to fill missing values while reading the file with NumPy? filling_values = ("*", 0, 01/01/2010, 0) temp = np.genfromtxt(filename, filling_values=filling_values)
19. Which of the following is the preferred measure of central tendency when the data is severely skewed? Median
20. The median represents a value in the data set where: Half of the observations are above the median and the other half below it
21. The following is the right statement about NumPy: NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on the arrays
22. What is the result of the following operation in Python: 3 + 2 * 2? 7
23. In Python, if you executed var = '01234567', what would be the result of print(var[::2])? 0246
24. In Python, what is the result of the following operation: '1' + '2'? '12'
25. Given myWord = 'hello', how would you convert myWord into uppercase? myWord.upper()
26. After applying the following method, l.append(('a', 'b')), the list will only be one element longer. True
27. What is an important difference between lists and tuples? Lists are mutable, tuples are not
28. Dict = {"a": 1, "b": "2", "c": [3, 3, 3], "D": (4, 4, 4) … What is the result of the following operation: Dict["D"]? (4, 4, 4)
29. 
What is the correct way to sort the list 'myData' using a method? The result should not return a new list, just change the list 'myData'. myData.sort()
30. What are the keys of the following dictionary: {'a': 1, 'b': 2}? 'a', 'b'
31. Which one of the following statements best describes the Python scikit library? A collection of algorithms and tools for machine learning
32. Suppose a media content website wants to improve its customer experience by providing a recommendation system that tells users what content is popular among their neighbours and that they might also like. Collaborative filtering recommender system
33. In comparison to supervised learning, unsupervised learning has? Fewer tests (evaluation approaches)
34. Which one of the following statements is the most accurate? Machine learning is the branch of AI that covers the statistical and learning part of AI
35. What would be the result of list2 from the following code: array([1, 12, 3, 4])
36. What do the following lines of code do? Read the file "exercise.txt"
37. What is the result of applying the method df.head() … Prints the first 5 rows of the dataframe
38. Consider the dataframe "df", with categorical column "categories". A Python list showing the top 20 categories that appear most often, without their number of occurrences
39. Consider the following dataframe: the average price for each body style
40. You want to predict a field named CHURN … CHURN as target field; AGE, GENDER, and HOUSEHOLD SIZE as input fields
41. An insurance company has a dataset in place storing information about claims. One field in the dataset flags whether the claim was fraudulent or not … A classification model
42. Kevin is the head of the spare-parts inventory warehouse. Association
43. Imagine you are solving a classification problem with highly imbalanced classes. The accuracy metric is not a good idea for imbalanced class problems & precision and recall metrics are good for imbalanced class problems
44. 
When you have a very high-bias model and you have tried many algorithms with their parameters. Feature engineering
45. Imagine you run a binary classification using random forest. SHAP
46. Imagine you have many features with different value ranges, and you want to make predictions using lasso linear regression. Data normalization
47. Which of the following sentences is TRUE about Decision Tree? It can easily overfit as the tree goes deeper
48. Which of the following sentences is TRUE about Random Forest? Each tree has the same amount of say in determining the final result
49. Decision tree has been regarded for its simplicity and popularity in machine learning. Creating the tree based on information gain
50. In the course of her holiday, she is very pumped to try having a coffee in the café as she never had that experience before. Better interpretation compared with decision tree
51. Ensemble learning can work well on condition that the requirement(s) below is/are fulfilled. Independent models
52. Which of the following correctly describes the relationship between f's complexity and f's bias and variance terms? As the complexity of f increases, the bias term decreases while the variance term increases
53. XGBoost is a powerful library that scales very well to many samples and works for variety. Predicting the likelihood that a given user will click an ad from a very large clickstream log with millions of users and their web interactions
54. For many cases, an imbalanced dataset is nearly inevitable. Overfitting
55. An online retailer wants to identify groups of customers. Segmentation
56. A mysterious disease causing uncontrollable. 90%
57. Given the type node shown below, what specifications are necessary … Specify values lower 1 and upper 15 and check action nullify
58. Which of the following statements is true about data science?
59. Which of the following statements is true about the difference between data science and data analysis?
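
Several of the Python, pandas and NumPy answers above (questions 10 to 13, 15, 17 and 22 to 26) can be checked by running them. The snippet below is a minimal sketch of such a check, not the original exercise code: the small DataFrames and arrays are made-up sample data (an assumption, since the quiz does not show its data), and only the calls quoted in the questions are exercised.

```python
import numpy as np
import pandas as pd

# Q10: rename() returns a new DataFrame; df itself is unchanged
# unless inplace=True is passed (or the result is assigned back).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "c": [10, 20, 30, 40]})  # hypothetical data
renamed = df.rename(columns={"a": "b"})
original_columns = list(df.columns)      # ['a', 'c']
renamed_columns = list(renamed.columns)  # ['b', 'c']

# Q11: after z-score standardization the standard deviation is 1
# (up to floating-point error). pandas .std() uses ddof=1 by default.
standardized = (df["a"] - df["a"].mean()) / df["a"].std()
std_after = standardized.std()

# Q12: X = 0.9 * Y is a positive linear relationship, so Pearson r is 1.
y = pd.Series([1.0, 2.0, 4.0, 7.0])  # hypothetical data
x = 0.9 * y
pearson = x.corr(y)

# Q13: most frequent categories as a plain list, without their counts.
cats = pd.DataFrame({"categories": ["b", "a", "b", "c", "b", "a"]})  # hypothetical data
top = cats["categories"].value_counts()[:20].index.tolist()  # ['b', 'a', 'c']

# Q15: append a row with .loc, then drop it to recover the original rows.
exam = pd.DataFrame(
    {"name": ["Anastasia", "James"], "score": [12.5, 9.0]},  # hypothetical subset
    index=["a", "b"],
)
exam.loc["k"] = ["Suresh", 15.5]
restored = exam.drop("k")

# Q17: stack two 2-D arrays row-wise.
train_set = np.array([[1, 2], [3, 4]])  # hypothetical data
test_set = np.array([[5, 6]])
resulting_set = np.vstack([train_set, test_set])  # shape (3, 2)

# Q22 to Q26: plain Python behaviour.
precedence = 3 + 2 * 2            # multiplication binds tighter: 7
every_second = "01234567"[::2]    # '0246'
concatenated = "1" + "2"          # '12'
upper = "hello".upper()           # 'HELLO'
l = [1]
l.append(("a", "b"))              # the tuple is added as a single element
```

Note that for question 10, assigning the result back (`df = df.rename(columns={'a': 'b'})`) has the same effect as passing `inplace=True`.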