Preparing for an interview? This ML online assessment lets you test your ML skills.

This is a data science online assessment to help you get a sense of your level in basic data science interview questions.

## Part 1: General ML Online Assessment

The first part of the Data Science Online Assessment focuses on traditional data science.

#### 1. In which situation are you more likely to overfit your model?

- A) Too much data
- B) Too little data

#### 2. Which of the following is not an assumption of linear regression?

- A) Linear relationship: there is a linear relationship between each independent variable and the dependent variable
- B) Gaussian error: the residuals of the model are normally distributed
- C) Homoscedasticity: the residuals have constant variance across the full range of the independent variables
- D) Independence: the residuals are independently distributed
- E) Collinearity: the x variables have a linear relationship with each other

#### 3. What is NoSQL?

- A) A Query Language that does not use SQL
- B) A kind of database that does not support SQL queries
- C) A product by MongoDB
- D) A document datastore
- E) Not Only SQL
- F) Not SQL
- G) A kind of distributed database

#### 4. Which statements are true about type 1 error?

- A) Also known as False Positive
- B) Also known as False Negative
- C) Can be reduced to 0 by adjusting the decision threshold
- D) There is a trade-off between type 1 error and recall
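
The effect of the decision threshold on false positives and recall can be explored with a short sketch. The labels and scores below are made up for illustration, and `confusion_counts` is a hypothetical helper rather than a library function.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, fn, tn) when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1  # type 1 error (false positive)
        elif not predicted_positive and y == 1:
            fn += 1  # type 2 error (false negative)
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up labels (1 = positive) and model scores, for illustration only.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.45, 0.6, 0.7, 0.9]

for threshold in (0.3, 0.5, 0.95):
    tp, fp, fn, tn = confusion_counts(labels, scores, threshold)
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: FP={fp}, FN={fn}, recall={recall:.2f}")
```

Running the loop shows how the false positive count and recall move as the threshold changes on this toy data.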

#### 5. The confusion matrix is symmetric.

- A) True
- B) False

#### 6. Which of the following is a regularization technique?

- A) LASSO
- B) ANOVA
- C) Gradient Descent
- D) L2
- E) Elastic Net
- F) PCA

#### 7. If two feature variables (x1 and x2) are correlated, then you cannot use them for linear regression because of collinearity.

- A) True
- B) False

#### 8. A medical test is testing whether a patient has cancer. If the null hypothesis is “the patient does not have cancer”, then which is a Type 2 error?

- A) The test predicts cancer while the patient does not have cancer
- B) The test predicts no cancer while the patient has cancer

#### 9. A correlation coefficient of 100% between x and y means that x has a causal relationship with y.

- A) True
- B) False

#### 10. Which model has the fastest training time?

- A) K-Nearest Neighbor
- B) Logistic Regression
- C) Support Vector Machine
- D) Linear Regression
- E) Naive Bayes

#### 11. Why do you need to scale numeric features to the range [-1, 1]?

- A) For speed of convergence by gradient descent
- B) To balance L1 & L2 regularization among features
- C) Achieve highest floating point precision
- D) Tree models operate more efficiently at this range
- E) A K-NN model would not behave well with Euclidean distance

#### 12. Which of the following is not a method of feature scaling?

- A) Min-max scaling
- B) Clipping
- C) Z-score normalization
- D) Winsorizing
- E) Outlier dropping
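
For reference, two of the transformations named above can be sketched in plain Python. This is a minimal illustration, assuming a non-constant list of floats; `min_max_scale` and `z_score` are hypothetical helper names, not functions from a particular library.

```python
import statistics


def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly map values so the minimum becomes `low` and the maximum `high`."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min  # assumes the values are not all identical
    return [low + (high - low) * (v - v_min) / span for v in values]


def z_score(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumes std > 0
    return [(v - mean) / std for v in values]


values = [2.0, 4.0, 6.0, 8.0]  # made-up feature column
print(min_max_scale(values))
print(z_score(values))
```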

#### 13. When would you NOT engineer a hashed feature for categorical input?

- A) Unknown vocabulary
- B) High cardinality of the categorical input
- C) The feature is the gender of the person
- D) The feature is the hospital_id of a medical sample
- E) Cold-start problem when new data comes during serving
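
As background, a hashed feature maps each category to one of a fixed number of buckets. The sketch below is one possible way to do this with the standard library; `hashed_bucket`, the bucket count, and the hospital IDs are all assumptions made for illustration.

```python
import hashlib


def hashed_bucket(value, num_buckets):
    """Map a categorical string to a bucket index in [0, num_buckets)."""
    # A stable hash is used so the mapping is identical across runs and machines.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


# Hypothetical hospital IDs; "hosp_9999" stands in for an ID unseen at training time.
for hospital_id in ("hosp_0001", "hosp_0002", "hosp_9999"):
    print(hospital_id, "->", hashed_bucket(hospital_id, num_buckets=16))
```

No vocabulary is needed, so unseen values still receive a bucket, but distinct categories can collide in the same bucket when the number of buckets is small.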

#### 14. Which of the following does NOT describe unsupervised learning?

- A) the data is not labeled
- B) PCA is a type of unsupervised learning method
- C) predict the price of a house based on the area, number of bedrooms, etc.
- D) finding customer segments from a population of customers

## Part 2: Deep Learning Online Assessment

This part of the Data Science Online Assessment is about deep learning.

#### 1. What is stochastic about Stochastic Gradient Descent?

- A) The data is random in each batch
- B) The gradient takes a random direction each step
- C) The magnitude of the descent is randomized
- D) The dimension of the descent is randomized
- E) The momentum of the descent is stochastic

#### 2. Gradient descent searches for the global minimum

- A) True
- B) False
- C) Depends

#### 3. What is the loss function for a multi-class classification problem?

- A) Mean square error
- B) Soft-max error
- C) Cross-entropy loss
- D) Logistic loss
- E) Mean absolute error
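
To make the terms above concrete, here is a minimal sketch of softmax followed by cross-entropy for a single example. The function names and the sample logits are assumptions for illustration, not any framework's API.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[true_class])


logits = [2.0, 0.5, -1.0]  # made-up scores for three classes
print(softmax(logits))
print(cross_entropy(logits, true_class=0))
```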

#### 4. You have an RGB image from the previous layer with dimensions (3, 100, 100). Which of the following is the correct output size (number of elements) after a 2D convolutional layer with 32 channels, a 5×5 filter, and same padding?

- A) 320000
- B) 250000
- C) 2400
- D) 960000

#### 5. Why do we use non-linear activation functions (such as tanh or ReLU) in neural networks?

- A) Non-linear optimization is faster for convergence
- B) Most real world datasets are non-linear
- C) Stacking linear layers can only model linear relationships in the data, which results in limited modeling capability
- D) The error measure MSE (mean squared error) is a non-linear metric
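
What happens when layers are stacked with and without an activation can be checked numerically. The sketch below uses small made-up weight matrices and no bias terms; `matvec`, `matmul`, and `relu` are hypothetical helpers written for this illustration.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relu(v):
    return [max(0.0, vi) for vi in v]


# Made-up weights: W1 maps 2 -> 3 features, W2 maps 3 -> 2 features.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]]
x = [0.5, -2.0]

two_layers = matvec(W2, matvec(W1, x))        # two linear layers, no activation
one_layer = matvec(matmul(W2, W1), x)         # a single layer with weights W2 @ W1
with_relu = matvec(W2, relu(matvec(W1, x)))   # ReLU between the two layers

print(two_layers, one_layer, with_relu)
```

Comparing the three results shows whether the two-layer stack can be replaced by a single matrix in each case.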

#### 6. A convolutional layer with 32 filters, kernel size 3×3, stride 1, and no zero padding ('VALID' padding) is applied to a 10×10 grayscale image, followed by a 2×2 pooling layer. What are the dimensions of the output?

- A) 5×5×32
- B) 4×4×32
- C) 4×4
- D) 8×8×32
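
The spatial-size arithmetic behind questions like this can be sketched as a small helper. It follows the usual 'VALID' and 'SAME' padding conventions along one spatial dimension; `conv_output_size` is a hypothetical name, and the 28-pixel input is a different example chosen for illustration.

```python
import math


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution or pooling layer."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


# Example: 28x28 input, 5x5 'valid' convolution, then 2x2 pooling with stride 2.
after_conv = conv_output_size(28, kernel=5)
after_pool = conv_output_size(after_conv, kernel=2, stride=2)
print(after_conv, after_pool)  # prints: 24 12
```

The channel dimension of the output equals the number of filters, and a pooling layer leaves it unchanged.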

#### 7. How does increasing the stride affect the output of a convolutional layer?

- A) The output becomes smaller
- B) The output becomes bigger
- C) The output does not change

#### 8. What does it mean when a dense layer of a neural network has no activation?

- A) The output of the layer is null value
- B) The network does not compile
- C) The output of the layer is disabled
- D) The output of the layer is linear
- E) The layer can be only used for classification by default

#### 9. How many trainable parameters are in a flatten layer?

- A) Depends on the previous layer’s output dimension
- B) Depends on the input layer’s dimension
- C) Depends on the next layer’s input dimension
- D) Defined by the user when creating this layer
- E) No trainable parameters

#### 10. In TensorFlow, for the layer `Conv2D(16, (3,3), activation='relu', input_shape=(28,28,1))`, why do you need the “1” in the last dimension of the input_shape?

- A) It does not matter, you can remove it since 28×28 is the same as 28×28×1
- B) It is for the color dimension of an image
- C) TensorFlow’s Conv2D performs convolution in 2D, but over a 3D volume
- D) It does not compile if you leave out the “1”

#### 11. Why are neural networks rarely optimized with batch gradient descent (BGD) compared to stochastic gradient descent (SGD)?

- A) It takes too much memory because of the dataset size
- B) It does not converge to the global optimum
- C) It needs to be stochastic because data is stochastic
- D) It is too slow to converge compared to SGD

#### 12. Given a dropout rate of 0.3 on a layer with 10 neurons, what is the probability that only one of the neurons is dropped out?

- A) 0.3
- B) 0.7
- C) 10*0.3*(0.7^9)
- D) 10*0.7*(0.3^9)
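
If each neuron is assumed to be dropped independently with the same probability, the number of dropped neurons follows a binomial distribution. The sketch below computes that probability for a smaller, different example; `prob_exactly_k_dropped` is a hypothetical helper name.

```python
from math import comb


def prob_exactly_k_dropped(n, rate, k):
    """Binomial probability that exactly k of n neurons are dropped.

    Assumes each neuron is dropped independently with probability `rate`.
    """
    return comb(n, k) * rate**k * (1 - rate) ** (n - k)


# Example with different numbers: 4 neurons, dropout rate 0.5, exactly 1 dropped.
print(prob_exactly_k_dropped(n=4, rate=0.5, k=1))  # prints: 0.25
```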

#### 13. When should you use the Keras Functional API over the Sequential API?

- A) When the model is simple to express as a sequence of layers
- B) When you have multiple input layers
- C) When you have multiple output layers
- D) When your network topology requires shared layers

#### 14. What is transfer learning?

- A) To reuse knowledge learned by a pretrained model on a related task
- B) To compute gradients of the loss function by transferring parameters from another model
- C) To adapt a regression model to a classification model
- D) To create a smaller model from a large model but still retaining most capabilities

#### 15. What are some properties of RNN (recurrent neural network)?

- A) Size of the network increases with the sequence length
- B) Only works for time series data
- C) It can process sequential data of arbitrary length
- D) It carries hidden state from one time step to the next

#### 16. What problem in vanilla RNNs is LSTM trying to address?

- A) Allow RNN to process longer time series
- B) Solve the exploding gradient problem in RNN
- C) Solve the vanishing gradient problem in RNN
- D) To allow bidirectional learning on sequential data

#### 17. Why is the output size of an RNN layer fixed while the input size can be flexible?

- A) Because the RNN sums up the hidden state from all time steps
- B) Because the RNN averages the hidden state from all time steps
- C) Because the RNN only returns the hidden state from the last time step as output
- D) Not possible, the output size of an RNN varies with the input size
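
How a recurrent layer consumes a sequence can be seen in a toy implementation. The sketch below is a bare-bones tanh RNN over scalar inputs, with arbitrary weights and no bias; it is an illustration under those assumptions, not how any particular framework implements its RNN layer.

```python
import math


def rnn_last_hidden(sequence, w_in, w_rec):
    """Run a tiny tanh RNN over scalar inputs and return the final hidden state."""
    hidden_size = len(w_in)
    h = [0.0] * hidden_size  # initial hidden state
    for x in sequence:
        # The same weights are reused at every time step.
        h = [
            math.tanh(w_in[i] * x + sum(w_rec[i][j] * h[j] for j in range(hidden_size)))
            for i in range(hidden_size)
        ]
    return h


# Toy weights chosen arbitrarily for illustration (hidden size 2).
w_in = [0.5, -0.3]
w_rec = [[0.1, 0.2], [-0.4, 0.3]]

short = rnn_last_hidden([1.0, 2.0, 3.0], w_in, w_rec)
long = rnn_last_hidden([0.5] * 7, w_in, w_rec)
print(len(short), len(long))  # prints: 2 2
```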

#### 18. What is the output dimension of the following Keras model?

```python
Sequential([Embedding(1000, 16, input_length=128), LSTM(32)])
```

- A) (None, 128, 16)
- B) (None, 16)
- C) (None, 32)
- D) (None, None, 32)

#### 19. What are the ways to build a neural network model in Keras?

- A) using the Keras Sequential API
- B) using the Keras Functional API
- C) using Model Subclassing
- D) using Layer Subclassing

#### 20. What is the value of dy_dx below?

```python
import tensorflow as tf

x = tf.constant([0, 1], dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.math.exp(x)
    y = 2 * tf.reduce_sum(y)

dy_dx = tape.gradient(y, x)
```

- A) tf.Tensor([0], shape=(), dtype=float32)
- B) tf.Tensor([0.5, 2], shape=(2,), dtype=float32)
- C) tf.Tensor([2, 5.4365635], shape=(2,), dtype=float32)
- D) tf.Tensor([5.4365635], shape=(), dtype=float32)
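
One way to check an answer without TensorFlow is a central finite-difference approximation of the same gradient. The sketch below re-expresses the function from the question in plain Python; `f` and `numerical_gradient` are hypothetical names, and the step size `eps` is an arbitrary choice.

```python
import math


def f(x):
    """Scalar function from the question: 2 * sum(exp(x_i))."""
    return 2 * sum(math.exp(v) for v in x)


def numerical_gradient(func, x, eps=1e-6):
    """Approximate d func / d x_i with central differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((func(up) - func(down)) / (2 * eps))
    return grad


print(numerical_gradient(f, [0.0, 1.0]))
```

The printed values are approximations, so they should match the exact gradient only to several decimal places.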

#### 21. What is aleatoric uncertainty?

- A) Data errors from sensor malfunction
- B) Object occlusion in an image segmentation task
- C) Using a model that underfits the problem