This is a machine learning online assessment to help you prepare for interviews.
It is meant to give you a sense of your level on basic data science interview questions.
Part 1: General ML Online Assessment
The first part of the Data Science Online Assessment is more about traditional data science.
1. In which situation are you more likely to overfit your model?
- A) Too much data
- B) Too little data
2. Which of the following is not an assumption of linear regression?
- A) Linear relationship: there is a linear relationship between each independent variable and the dependent variable
- B) Gaussian error: the residuals of the model are normally distributed
- C) Homoscedasticity: the residuals have constant variance across the full range of the independent variables
- D) Independence: the residuals are independently distributed
- E) Collinearity: the x variables have a linear relationship with each other
3. What is NoSQL?
- A) A Query Language that does not use SQL
- B) A kind of database that does not support SQL queries
- C) A product by MongoDB
- D) A document datastore
- E) Not Only SQL
- F) Not SQL
- G) A kind of distributed database
4. Which statements are true about type 1 error?
- A) Also known as False Positive
- B) Also known as False Negative
- C) Can be reduced to 0 by adjusting the decision threshold
- D) There is a trade-off between type 1 error and recall
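As a rough illustration of how the decision threshold affects Type 1 errors (false positives) and recall, the sketch below counts both on a small toy dataset. The scores, the labels, and the helper name `fp_and_recall` are invented for illustration and are not part of the assessment.

```python
# Toy scores and ground-truth labels (1 = positive); invented for illustration.
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

def fp_and_recall(threshold):
    """Return (false positives, recall) when predicting positive for score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    recall = tp / sum(labels)
    return fp, recall

for t in (0.1, 0.5, 0.75, 0.99):
    fp, recall = fp_and_recall(t)
    print(f"threshold={t:.2f}  false_positives={fp}  recall={recall:.2f}")
```

On this toy data, raising the threshold removes false positives but also lowers recall.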
5. The confusion matrix is symmetric.
- A) True
- B) False
6. Which of the following is a regularization technique?
- A) LASSO
- B) ANOVA
- C) Gradient Descent
- D) L2
- E) Elastic Net
- F) PCA
7. If two feature variables (x1 and x2) are correlated, then you cannot use them for linear regression because of collinearity.
- A) True
- B) False
8. A medical test is testing whether a patient has cancer. If the null hypothesis is “the patient does not have cancer”, then which is a Type 2 error?
- A) The test predicts cancer while the patient does not have cancer
- B) The test predicts no cancer while the patient has cancer
9. A correlation coefficient of 100% between x and y means that x has a causal relationship with y.
- A) True
- B) False
10. Which model has the fastest training time?
- A) K-Nearest Neighbor
- B) Logistic Regression
- C) Support Vector Machine
- D) Linear Regression
- E) Naive Bayes
11. Why do you need to scale numeric features to the range [-1, 1]?
- A) For speed of convergence by gradient descent
- B) To balance L1 & L2 regularization among features
- C) Achieve highest floating point precision
- D) Tree models operate more efficiently at this range
- E) A K-NN model would not behave well with Euclidean distance otherwise
12. Which of the following is not a method of feature scaling?
- A) Min-max scaling
- B) Clipping
- C) Z-score normalization
- D) Winsorizing
- E) Outlier dropping
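For reference, min-max scaling and z-score normalization can be sketched in a few lines of plain Python. The function names and the `ages` column are hypothetical, and the sketch assumes the column is not constant (otherwise both functions would divide by zero).

```python
import statistics

def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly map values onto [low, high]; assumes values are not all equal."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]

def z_score(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

ages = [20, 30, 40, 50, 60]  # hypothetical feature column
print(min_max_scale(ages))
print(z_score(ages))
```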
13. When would you NOT engineer a hashed feature for categorical input?
- A) Unknown vocabulary
- B) High cardinality of the categorical input
- C) The feature is the gender of the person
- D) The feature is the hospital_id of a medical sample
- E) Cold-start problem when new data comes during serving
14. Which of the following does NOT describe unsupervised learning?
- A) the data is not labeled
- B) PCA is a type of unsupervised learning method
- C) predict the price of a house based on the area, number of bedrooms, etc.
- D) finding customer segments from a population of customers
Part 2: Deep Learning Online Assessment
This part of the Data Science Online Assessment is about deep learning.
1. What is stochastic about Stochastic Gradient Descent?
- A) The data is random in each batch
- B) The gradient takes a random direction each step
- C) The magnitude of the descent is randomized
- D) The dimension of the descent is randomized
- E) The momentum of the descent is stochastic
2. Gradient descent searches for the global minimum
- A) True
- B) False
- C) Depends
3. What is the loss function for a multi-class classification problem?
- A) Mean square error
- B) Soft-max error
- C) Cross-entropy loss
- D) Logistic loss
- E) Mean absolute error
4. You have an RGB image from the previous layer with dimensions (3, 100, 100). Which of the following is the right output dimension after a 32-channel 2D convolutional layer with filter size (5×5) and same padding?
- A) 320000
- B) 250000
- C) 2400
- D) 960000
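One way to reason about questions like this is to compute the output shape directly. The sketch below assumes the common convention (used by TensorFlow, for example) that "same" padding yields a spatial size of ceil(input / stride); the function name is hypothetical.

```python
import math

def conv2d_same_output_shape(height, width, out_channels, stride=1):
    """Output (channels, height, width) of a 2D convolution with 'same' padding.

    With 'same' padding the spatial size is ceil(input / stride), independent
    of the kernel size; the channel count equals the number of filters.
    """
    return (out_channels, math.ceil(height / stride), math.ceil(width / stride))

shape = conv2d_same_output_shape(100, 100, out_channels=32)
print(shape, math.prod(shape))  # shape and total number of output values
```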
5. Why do we use non-linear activation functions (such as tanh or ReLU) in neural networks?
- A) Non-linear optimization is faster for convergence
- B) Most real world datasets are non-linear
- C) Stacking linear layers can only model linear relationships in the data, which results in limited modeling capability
- D) The error measure MSE (mean squared error) is a non-linear metric
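To see what happens when layers are stacked without any activation, the sketch below applies two bias-free linear layers in sequence and compares the result with a single layer whose weight matrix is their product. The weights and input are made-up integers, and the helpers `matmul` and `matvec` are written here only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Apply matrix m to vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Two bias-free "layers" with made-up integer weights.
w1 = [[1, 2], [0, -1], [3, 1]]   # maps 2 inputs to 3 hidden units
w2 = [[2, 0, 1], [-1, 1, 0]]     # maps 3 hidden units to 2 outputs
x = [4, -3]

stacked = matvec(w2, matvec(w1, x))    # layer 2 applied after layer 1
collapsed = matvec(matmul(w2, w1), x)  # one equivalent linear layer
print(stacked, collapsed)
```

Both computations give the same vector, so without a non-linearity the extra layer adds no expressive power in this example.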
6. Given a 10×10 grayscale image, we apply a convolutional layer with 32 filters, kernel size 3×3, stride 1 and no zero padding (‘VALID’ padding), followed by a 2×2 pooling layer. What are the dimensions of the output?
- A) 5×5×32
- B) 4×4×32
- C) 4×4
- D) 8×8×32
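The spatial size after an unpadded convolution or a pooling window can be worked out with the usual formula floor((n + 2p - k) / s) + 1. The sketch below applies it to the setup in this question, assuming the pooling layer uses a stride of 2 (the question does not state the stride; this matches the default in Keras pooling layers, where the stride equals the pool size). The function name is hypothetical.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (n + 2 * padding - kernel) // stride + 1

after_conv = conv_output_size(10, kernel=3, stride=1)          # 'VALID' convolution
after_pool = conv_output_size(after_conv, kernel=2, stride=2)  # 2x2 pooling, stride 2 assumed
print((after_conv, after_conv, 32), (after_pool, after_pool, 32))
```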
7. How does increasing the stride affect the output of a convolutional layer?
- A) The output becomes smaller
- B) The output becomes bigger
- C) The output does not change
8. What does it mean when a dense layer of a neural network has no activation?
- A) The output of the layer is null value
- B) The network does not compile
- C) The output of the layer is disabled
- D) The output of the layer is linear
- E) The layer can be only used for classification by default
9. How many trainable parameters are in a flatten layer?
- A) Depends on the previous layer’s output dimension
- B) Depends on the input layer’s dimension
- C) Depends on the next layer’s input dimension
- D) Defined by the user when creating this layer
- E) No trainable parameters
10. In TensorFlow, given the layer “Conv2D(16, (3,3), activation='relu', input_shape=(28,28,1))”, why do you need the “1” in the last dimension of the input_shape?
- A) It does not matter, you can remove it since 28×28 is the same as 28×28×1
- B) It is for the color dimension of an image
- C) TensorFlow’s Conv2D expects to perform convolution in 2D but over a 3D volume
- D) It does not compile if you leave out the “1”
11. Why are neural networks rarely optimized with batch gradient descent (BGD) compared to stochastic gradient descent (SGD)?
- A) It takes too much memory because of the dataset size
- B) It does not converge to the global optimum
- C) It needs to be stochastic because data is stochastic
- D) It is too slow to converge compared to SGD
12. Given a dropout rate of 0.3 on a layer with 10 neurons, what is the probability that only one of the neurons is dropped out?
- A) 0.3
- B) 0.7
- C) 10*0.3*(0.7^9)
- D) 10*0.7*(0.3^9)
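If each neuron is assumed to be dropped independently with the given rate, the number of dropped neurons follows a binomial distribution. A minimal sketch of that calculation, with a hypothetical helper name, looks like this:

```python
from math import comb

def prob_exactly_k_dropped(n, k, rate):
    """Binomial probability that exactly k of n units are dropped,
    assuming each unit is dropped independently with probability `rate`."""
    return comb(n, k) * rate**k * (1 - rate) ** (n - k)

p = prob_exactly_k_dropped(n=10, k=1, rate=0.3)
print(round(p, 4))
```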
13. When should you use the Keras Functional API over the Sequential API?
- A) When the model is simple to express as a sequence of layers
- B) When you have multiple input layers
- C) When you have multiple output layers
- D) When your network topology requires shared layers
14. What is transfer learning?
- A) To reuse knowledge learned by a pretrained model on a related task
- B) To compute gradients of the loss function by transferring parameters from another model
- C) To adapt a regression model to a classification model
- D) To create a smaller model from a large model but still retaining most capabilities
15. What are some properties of RNN (recurrent neural network)?
- A) Size of the network increases with the sequence length
- B) Only works for time series data
- C) It can process sequential data of arbitrary length
- D) It carries hidden state from one time step to the next
16. What is LSTM trying to address from vanilla RNN?
- A) Allow RNN to process longer time series
- B) Solve the exploding gradient problem in RNN
- C) Solve the vanishing gradient problem in RNN
- D) To allow bidirectional learning on sequential data
17. Why is the output size of an RNN layer fixed while the input size can be flexible?
- A) Because the RNN sums up the hidden state from all time steps
- B) Because the RNN averages the hidden state from all time steps
- C) Because the RNN only returns the hidden state from the last step as output
- D) Not possible, the output size of an RNN varies as the input size
18. What is the output dimension of the following Keras model?
Sequential([Embedding(1000, 16, input_length=128), LSTM(32)])
- A) (None, 128, 16)
- B) (None, 16)
- C) (None, 32)
- D) (None, None, 32)
19. What are the ways to build a neural network model in Keras?
- A) using the Keras Sequential API
- B) using the Keras Functional API
- C) using Model Subclassing
- D) using Layer Subclassing
20. What is the value of dy_dx below?
import tensorflow as tf
x = tf.constant([0, 1], dtype=tf.float32)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.math.exp(x)
    y = 2 * tf.reduce_sum(y)
dy_dx = tape.gradient(y, x)
- A) tf.Tensor([0], shape=(), dtype=float32)
- B) tf.Tensor([0.5, 2], shape=(2,), dtype=float32)
- C) tf.Tensor([2, 5.4365635], shape=(2,), dtype=float32)
- D) tf.Tensor([5.4365635], shape=(), dtype=float32)
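A gradient like the one in this snippet can be sanity-checked without TensorFlow. The sketch below re-implements the same function, y = 2 * sum(exp(x_i)), in plain Python and compares a central-difference estimate with the analytic derivative 2 * exp(x_i). The helper names are hypothetical, and this is only a numerical check, not how `tf.GradientTape` computes gradients.

```python
import math

def f(x):
    """y = 2 * sum(exp(x_i)), mirroring the TensorFlow snippet."""
    return 2 * sum(math.exp(v) for v in x)

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference estimate of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((func(up) - func(down)) / (2 * eps))
    return grad

x = [0.0, 1.0]
analytic = [2 * math.exp(v) for v in x]  # d/dx_i of 2 * sum(exp(x)) is 2 * exp(x_i)
print(numerical_gradient(f, x), analytic)
```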
21. What is aleatoric uncertainty?
- A) Data errors from sensor malfunction
- B) Object occlusion in an image segmentation task
- C) Using a model that underfits the problem