## Machine Learning Set 2

Question 1 |

Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments. What kind of learning problem is this?

A | Supervised learning |

B | Unsupervised learning |

C | Both (a) and (b) |

D | Neither (a) nor (b) |

Question 2 |

Which of the following defines information?

A | Data that has been interpreted and manipulated and now has some meaningful inference for the users |

B | Data is any unprocessed fact, value, text, sound, or picture that has not been interpreted or analyzed |

C | A combination of inferred information, experiences, learning, and insights that results in awareness or concept building for an individual or organization |

D | All of the above |

Question 3 |

Properties of data include _____

A | Volume, Variety |

B | Variety, Veracity |

C | Value |

D | All of the above |

Question 4 |

Regarding bias and variance, which of the following statements are true? (Here ‘high’ and ‘low’ are relative to the ideal model.)

A | Models which underfit have a low variance |

B | Models which overfit have a high bias. |

C | Models which overfit have a low bias. |

D | Models which underfit have a high variance. |

Question 5 |

How do we split data in machine learning?

A | Training Data |

B | Validation Data |

C | Testing Data |

D | All of the above |
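
The three subsets named in the options can be produced with a simple shuffle-and-slice. This is a minimal sketch using only the standard library; the function name `split_dataset` and the 70/15/15 fractions are illustrative assumptions, not something the question prescribes.

```python
import random

def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle rows and cut them into training, validation, and test subsets.

    The fractions are assumed defaults; whatever is left after the
    training and validation slices becomes the test subset.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The slices are disjoint by construction, so no example appears in more than one subset.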

Question 6 |

What is meant by Validation Data?

A | Validation data is the part of the data used to frequently evaluate a model fit on the training dataset while tuning its hyperparameters |

B | Validation Data is the portion of the dataset used to train the model. |

C | Validation Data is the portion of the dataset used to test the trained model |

D | Both (a) and (b) |

Question 7 |

Compared to the variance of the Maximum Likelihood Estimate (MLE), the variance of the Maximum A Posteriori (MAP) estimate is ______

A | higher |

B | same |

C | lower |

D | it could be any of the above |

Question 8 |

Which of the following methods can achieve zero training error on any linearly separable dataset?

A | Decision tree |

B | Perceptron |

C | 15-nearest neighbors |

D | Logistic regression |
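
To make the notion of "zero training error on linearly separable data" concrete, here is a small sketch of the classic perceptron update rule run on a toy separable dataset. The dataset (logical OR), the helper names, and the `max_epochs` cap are assumptions chosen for illustration only.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Classic perceptron rule; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no updates: every point is correct
            break
    return w, b

def training_errors(points, labels, w, b):
    """Count training points that the learned hyperplane gets wrong."""
    return sum(
        1 for x, y in zip(points, labels)
        if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0
    )

# Toy linearly separable data (logical OR), chosen only for illustration.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, 1]
w, b = train_perceptron(points, labels)
print(training_errors(points, labels, w, b))
```

The loop stops as soon as one full pass makes no update, which the perceptron convergence theorem guarantees will happen after finitely many mistakes when the data are linearly separable.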

Question 9 |

What is meant by Training set?

A | Training set is the portion of the dataset used to test the trained model |

B | Training set is the portion of the dataset used to train the model. |

C | Training set is the part of the data used to frequently evaluate a model fit on the training dataset while tuning its hyperparameters |

D | Both (a) and (b) |

Question 10 |

What is meant by Test Set?

A | Testing set is the portion of the dataset used to train the model. |

B | Testing set is the portion of the dataset used to test the trained model |

C | Testing set is the part of the data used to frequently evaluate a model fit on the training dataset while tuning its hyperparameters |

D | Both (a) and (b) |

Question 11 |

As the number of training examples goes to infinity, your model trained on that data will have:

A | Lower variance |

B | Higher variance |

C | Same variance |

D | None of the above |

Question 12 |

The numerical output of a sigmoid node in a neural network:

A | Is unbounded, encompassing all real numbers. |

B | Is unbounded, encompassing all integers. |

C | Is bounded between 0 and 1. |

D | Is bounded between -1 and 1. |
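
The range of a sigmoid node can be checked numerically. The sketch below assumes the standard logistic function 1 / (1 + e^(-z)); the two-branch form is only there to avoid floating-point overflow for large negative inputs.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, sigmoid(z))
```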

Question 13 |

High entropy means that the partitions in classification are

A | pure |

B | not pure |

C | useful |

D | useless |
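
Entropy can be computed directly from the class labels that fall into a partition. This is a minimal sketch of Shannon entropy in bits; the function name and the example labels are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one partition."""
    counts = Counter(labels)
    total = len(labels)
    # p * log2(1/p) summed over the classes present in the partition
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(entropy(["yes", "yes", "yes", "yes"]))  # a single class in the partition
print(entropy(["yes", "yes", "no", "no"]))    # two classes, evenly mixed
```

Comparing the two printed values shows how the measure responds as a partition goes from one class to an even mix of two.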

Question 14 |

Which of the following learning algorithms will return a classifier if the training data is not linearly separable?

A | Hard margin SVM |

B | Soft margin SVM |

C | Perceptron |

D | Naïve Bayes |

Question 15 |

A machine learning problem involves four attributes plus a class. The attributes have 3, 2, 2, and 2 possible values each. The class has 3 possible values. What is the maximum number of different possible examples?

A | 12 |

B | 24 |

C | 48 |

D | 72 |
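
The counting argument here is a product rule: an example is one value per attribute plus one class label. A short sketch of that arithmetic, using only the numbers given in the question (the variable names are illustrative):

```python
import math

attribute_values = [3, 2, 2, 2]  # possible values per attribute, from the question
class_values = 3                 # possible class labels, from the question

# Independent choices multiply: one value per attribute, then one class label.
n_examples = math.prod(attribute_values) * class_values
print(n_examples)  # prints 72
```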

Question 16 |

What would you do in PCA to get the same projection as SVD?

A | Transform data to zero mean |

B | Transform data to zero median |

C | Not possible |

D | None of these |
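
The link between PCA and SVD can be written out in a few lines. The notation is an assumption introduced here: $X$ is an $n \times d$ data matrix whose columns have already been centered (each feature has zero mean), and $X = U \Sigma V^\top$ is its thin SVD.

$$
C = \frac{1}{n-1} X^\top X
  = \frac{1}{n-1} V \Sigma U^\top U \Sigma V^\top
  = V \, \frac{\Sigma^2}{n-1} \, V^\top
$$

Under that centering assumption, the columns of $V$ are the eigenvectors of the covariance matrix $C$, so the right singular vectors of $X$ coincide with the principal directions. If $X$ is not centered, $X^\top X / (n-1)$ is no longer the covariance matrix and the two projections generally differ.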

Question 17 |

Which of the following defines knowledge?

A | A combination of inferred information, experiences, learning, and insights that results in awareness or concept building for an individual or organization |

B | Data that has been interpreted and manipulated and now has some meaningful inference for the users |

C | Data is any unprocessed fact, value, text, sound, or picture that has not been interpreted or analyzed |

D | All of the above |

Question 18 |

Predicting whether or not it will rain tomorrow evening at a particular time is a type of _________ problem.

A | Classification |

B | Regression |

C | Unsupervised learning |

D | All of the above |

Question 19 |

Which of the following best describes what discriminative approaches try to model? (w are the parameters in the model)

A | p(y|x, w) |

B | p(y, x) |

C | p(w|x, w) |

D | None of the above |

Question 20 |

Suppose we would like to calculate P(H|E, F) and we have no conditional independence information. Which of the following sets of numbers are sufficient for the calculation?

A | P(E, F), P(H), P(E|H), P(F|H) |

B | P(E, F), P(H), P(E, F|H) |

C | P(H), P(E|H), P(F|H) |

D | P(E, F), P(E|H), P(F|H) |
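
One way to reason about this question is to write Bayes' rule for the joint evidence. The identity below holds with no independence assumption:

$$
P(H \mid E, F) = \frac{P(E, F \mid H)\, P(H)}{P(E, F)}
$$

The factorization $P(E, F \mid H) = P(E \mid H)\, P(F \mid H)$ is valid only when $E$ and $F$ are conditionally independent given $H$, which the question explicitly rules out as known information.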

Question 21 |

Which of the following is/are true regarding an SVM?

A | For two dimensional data points, the separating hyperplane learnt by a linear SVM will be a straight line. |

B | In theory, a Gaussian kernel SVM cannot model any complex separating hyperplane. |

C | For every kernel function used in an SVM, one can obtain an equivalent closed form basis expansion. |

D | Overfitting in an SVM is not a function of the number of support vectors. |

Question 22 |

We split the given data set into ____ different sections

A | two |

B | three |

C | four |

D | five |

Question 23 |

In terms of the bias-variance trade-off, which of the following is substantially more harmful to the test error than the training error?

A | Bias |

B | Loss |

C | Variance |

D | Risk |

Question 24 |

Which of the following best describes the joint probability distribution P(X, Y, Z) for the given Bayes net?

A | P(X, Y, Z) = P(Y) * P(X|Y) * P(Z|Y) |

B | P(X, Y, Z) = P(X) * P(Y|X) * P(Z|Y) |

C | P(X, Y, Z) = P(Z) * P(X|Z) * P(Y|Z) |

D | P(X, Y, Z) = P(X) * P(Y) * P(Z) |

Question 25 |

The number of test examples needed to get statistically significant results should be ________

A | Larger if the error rate is larger. |

B | Larger if the error rate is smaller. |

C | Smaller if the error rate is smaller. |

D | It does not matter. |

