# ML Online Assessment

Preparing for an interview? This ML online assessment tests your ML skills.

It is a data science online assessment that helps you gauge your level on basic data science interview questions.

## Part 1: General ML Online Assessment

The first part of the Data Science Online Assessment focuses on traditional data science.

#### 1. In which situation are you more likely to overfit your model?

• A) Too much data
• B) Too little data

#### 2. Which of the following is not an assumption of linear regression?

• A) Linear relationship: there is a linear relationship between each independent variable and the dependent variable
• B) Gaussian error: the residuals of the model are normally distributed
• C) Homoscedasticity: the residuals have constant variance across the full range of the independent variables
• D) Independence: the residuals are independently distributed
• E) Collinearity: the x variables have a linear relationship with each other

#### 3. What is NoSQL?

• A) A Query Language that does not use SQL
• B) A kind of database that does not support SQL queries
• C) A product by MongoDB
• D) A document datastore
• E) Not Only SQL
• F) Not SQL
• G) A kind of distributed database

#### 4. Which statements are true about type 1 error?

• A) Also known as False Positive
• B) Also known as False Negative
• C) Can be reduced to 0 by adjusting the decision threshold
• D) There is a trade off between type 1 error and recall
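
To make the terms in the options above concrete, here is a minimal sketch that counts false positives (type 1 errors) and false negatives (type 2 errors) at a few decision thresholds. The labels, scores, and the helper name `confusion_counts` are made up for illustration; they are not part of the assessment.

```python
def confusion_counts(y_true, scores, threshold):
    """Count outcomes when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1  # type 1 error: false positive
        elif not predicted and label:
            fn += 1  # type 2 error: false negative
        else:
            tn += 1
    return tp, fp, tn, fn


# Hypothetical labels and model scores, for illustration only.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.6, 0.5, 0.7, 0.9]

for threshold in (0.3, 0.55, 0.8):
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: FP={fp}, FN={fn}, recall={recall:.2f}")
```

Running it for several thresholds shows how the two error counts and recall move as the threshold changes.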

• A) True
• B) False

#### 6. Which of the following is a regularization technique?

• A) LASSO
• B) ANOVA
• D) L2
• E) Elastic Net
• F) PCA
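
For reference, the penalty terms usually meant by L1, L2, and a mix of the two can be sketched as below. This is one common parameterization; libraries differ in how they scale these terms, and the function names and the `l1_ratio` parameter are illustrative assumptions.

```python
def l1_penalty(weights, lam):
    """Sum of absolute weights, scaled by lam."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Sum of squared weights, scaled by lam."""
    return lam * sum(w * w for w in weights)


def mixed_penalty(weights, lam, l1_ratio):
    """Weighted mix of the L1 and L2 penalties (l1_ratio in [0, 1])."""
    return (l1_ratio * l1_penalty(weights, lam)
            + (1 - l1_ratio) * l2_penalty(weights, lam))


weights = [0.5, -2.0, 0.0]  # hypothetical model weights
print(l1_penalty(weights, lam=0.1))
print(l2_penalty(weights, lam=0.1))
print(mixed_penalty(weights, lam=0.1, l1_ratio=0.5))
```

Each penalty is added to the training loss, so larger weights cost more.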

• A) True
• B) False

#### 8. A medical test is testing whether a patient has cancer. If the null hypothesis is “the patient does not have cancer”, then which is a Type 2 error?

• A) The test predicts cancer while the patient does not have cancer
• B) The test predicts no cancer while the patient has cancer

• A) True
• B) False

#### 10. Which model has the fastest training time?

• A) K-Nearest Neighbor
• B) Logistic Regression
• C) Support Vector Machine
• D) Linear Regression
• E) Naive Bayes

#### 11. Why do you need to scale numeric features to the range [-1, 1]?

• A) For speed of convergence by gradient descent
• B) To balance L1 & L2 regularization among features
• C) Achieve highest floating point precision
• D) Tree models operate more efficiently at this range
• E) K-NN model would not behave well with Euclidean distance

#### 12. Which of the following is not a method of feature scaling?

• A) Min-max scaling
• B) Clipping
• C) Z-score normalization
• D) Winsorizing
• E) Outlier dropping
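
As a reference for two of the terms above, here is a minimal sketch of min-max scaling (including to the range [-1, 1]) and z-score normalization, using only the standard library. The function names and sample data are assumptions for illustration.

```python
import statistics


def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values so the minimum becomes low and the maximum becomes high."""
    lo, hi = min(values), max(values)
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


data = [0.0, 5.0, 10.0]  # hypothetical feature column
print(min_max_scale(data))             # scaled to [0, 1]
print(min_max_scale(data, -1.0, 1.0))  # scaled to [-1, 1]
print(z_score(data))
```

Both functions assume the input has at least two distinct values; a constant column would cause a division by zero.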

#### 13. When would you NOT engineer a hashed feature for categorical input?

• A) Unknown vocabulary
• B) High cardinality of the categorical input
• C) The feature is the gender of the person
• D) The feature is the hospital_id of a medical sample
• E) Cold-start problem when new data comes during serving
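
For readers unfamiliar with the term, a hashed feature maps each category string to one of a fixed number of buckets. The sketch below uses `hashlib.md5` for a deterministic hash; the helper name `hash_bucket`, the bucket count, and the sample IDs are assumptions for illustration.

```python
import hashlib


def hash_bucket(value, num_buckets):
    """Map a category string to a bucket index in [0, num_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


# Hypothetical IDs; unseen values still map to a valid bucket at serving time.
for hospital_id in ["hospital_001", "hospital_002", "never_seen_before"]:
    print(hospital_id, "->", hash_bucket(hospital_id, num_buckets=10))
```

Different categories can collide in the same bucket, which is the price paid for not needing a vocabulary.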

#### 14. Which of the following does NOT describe unsupervised learning?

• A) the data is not labeled
• B) PCA is a type of unsupervised learning method
• C) predict the price of a house based on the area, number of bedrooms, etc.
• D) finding customer segments from a population of customers

## Part 2: Deep Learning Online Assessment

This part of the Data Science Online Assessment is about deep learning.

• A) The data is random in each batch
• B) The gradient takes a random direction each step
• C) The magnitude of the descent is randomized
• D) The dimension of the descent is randomized
• E) The momentum of the descent is stochastic

• A) True
• B) False
• C) Depends

#### 3. What is the loss function for a multi-classification problem?

• A) Mean square error
• B) Soft-max error
• C) Cross-entropy loss
• D) Logistic loss
• E) Mean absolute error
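
Two of the terms above, softmax and cross-entropy, are often used together. A minimal sketch of how they are computed for a single example follows; the logits and the function names are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_class])


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(logits)
loss = cross_entropy(probs, true_class=0)
print(probs, loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1.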

• A) 320000
• B) 250000
• C) 2400
• D) 960000

#### 5. Why do we use non-linear activation functions (such as tanh or ReLU) in neural networks?

• A) Non-linear optimization is faster for convergence
• B) Most real world datasets are non-linear
• C) Stacking linear layers can only model linear relationships in the data, which results in limited modeling capability
• D) The error measure MSE (mean squared error) is a non-linear metric
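
The claim about stacked linear layers can be checked numerically. In this minimal sketch (bias terms omitted, weights made up for illustration), applying two linear layers in sequence gives the same result as applying the single layer whose matrix is their product.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical weights for two layers without activation or bias.
W1 = [[1.0, 2.0], [0.0, 1.0]]
W2 = [[3.0, 0.0], [1.0, 1.0]]
x = [[1.0], [2.0]]  # input as a column vector

two_layers = matmul(W2, matmul(W1, x))   # layer 1, then layer 2
one_layer = matmul(matmul(W2, W1), x)    # single combined layer
print(two_layers, one_layer)
```

Inserting a non-linear function between the two multiplications breaks this equivalence.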

• A) 5x5x32
• B) 4x4x32
• C) 4×4
• D) 8x8x32

#### 7. How does increasing the stride affect the output of a convolutional layer?

• A) The output becomes smaller
• B) The output becomes bigger
• C) The output does not change
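
To experiment with this, the standard output-size formula for one spatial dimension of a convolution can be sketched as below. The function name is an assumption, and the formula assumes no dilation.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


for stride in (1, 2, 3):
    print(f"stride={stride}: output size = {conv_output_size(28, 3, stride=stride)}")
```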

#### 8. What does it mean when a dense layer of a neural network has no activation?

• A) The output of the layer is null value
• B) The network does not compile
• C) The output of the layer is disabled
• D) The output of the layer is linear
• E) The layer can be only used for classification by default

#### 9. How many trainable parameters are in a flatten layer?

• A) Depends on the previous layer’s output dimension
• B) Depends on the input layer’s dimension
• C) Depends on the next layer’s input dimension
• D) Defined by the user when creating this layer
• E) No trainable parameters

#### 10. In TensorFlow, for the layer “Conv2D(16, (3,3), activation='relu', input_shape=(28,28,1))”, why do you need the “1” in the last dimension of the input_shape?

• A) It does not matter, you can remove it since 28×28 is same as 28x28x1
• B) It is for the color dimension of an image
• C) TensorFlow’s Conv2D expects to perform convolution in 2D but over a 3D volume
• D) It does not compile if you leave out the “1”

#### 11. Why are neural networks rarely optimized with batch gradient descent (BGD) compared to stochastic gradient descent (SGD)?

• A) It takes too much memory because of the dataset size
• B) It does not converge to the global optimum
• C) It needs to be stochastic because data is stochastic
• D) It is too slow to converge compared to SGD

#### 12. Given a dropout rate of 0.3 on a layer with 10 neurons, what is the probability that only one of the neurons is dropped out?

• A) 0.3
• B) 0.7
• C) 10*0.3*(0.7^9)
• D) 10*0.7*(0.3^9)
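
To check an answer numerically, the number of dropped neurons can be modeled with a binomial distribution, assuming each neuron is dropped independently with the given rate. The function name below is illustrative.

```python
from math import comb


def prob_exactly_k_dropped(n, k, rate):
    """Binomial probability that exactly k of n neurons are dropped."""
    return comb(n, k) * rate**k * (1 - rate) ** (n - k)


print(prob_exactly_k_dropped(n=10, k=1, rate=0.3))
```

Summing the function over k = 0..10 should give 1, which is a quick sanity check on the formula.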

#### 13. When should you use the Keras Functional API over the Sequential API?

• A) When the model is simple to express as a sequence of layers
• B) When you have multiple input layers
• C) When you have multiple output layers
• D) When your network topology requires shared layers

#### 14. What is transfer learning?

• A) To reuse knowledge learned by a pretrained model on a related task
• B) To compute gradients of the loss function by transferring parameters from another model
• C) To adapt a regression model to a classification model
• D) To create a smaller model from a large model while still retaining most capabilities

#### 15. What are some properties of RNN (recurrent neural network)?

• A) Size of the network increases with the sequence length
• B) Only works for time series data
• C) It can process sequential data of arbitrary length
• D) It carries hidden state from one time step to the next
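
As a toy illustration of the recurrence behind an RNN, the sketch below uses scalar weights and a scalar hidden state, with the update h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). The weights are arbitrary and nothing is trained; it only shows the same parameters being reused at every time step.

```python
import math


def rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar toy RNN over a sequence and return the final hidden state."""
    h = 0.0  # initial hidden state
    for x_t in sequence:
        # The same three parameters are applied at every time step.
        h = math.tanh(w_x * x_t + w_h * h + b)
    return h


print(rnn_forward([1.0, 2.0, 3.0]))
print(rnn_forward([1.0] * 50))
```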

#### 16. What is LSTM trying to address from vanilla RNN?

• A) Allow RNN to process longer time series
• B) Solve the exploding gradient problem in RNN
• C) Solve the vanishing gradient problem in RNN
• D) To allow bidirectional learning on sequential data

#### 17. Why is the output size of an RNN layer fixed while the input size can be flexible?

• A) Because the RNN sums up the hidden state from all time steps
• B) Because the RNN averages the hidden state from all time steps
• C) Because the RNN only returns the hidden state from the last time step as output
• D) Not possible, the output size of an RNN varies with the input size

#### 18. What is the output dimension of the following Keras model?

Sequential([Embedding(1000, 16, input_length=128), LSTM(32)])

• A) (None, 128, 16)
• B) (None, 16)
• C) (None, 32)
• D) (None, None, 32)

#### 19. What are the ways to build a neural network model in Keras?

• A) using the Keras Sequential API
• B) using the Keras Functional API
• C) using Model Subclassing
• D) using Layer Subclassing

#### 20. What is the value of dy_dx below?

import tensorflow as tf

x = tf.constant([0, 1], dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.math.exp(x)
    y = 2 * tf.reduce_sum(y)

dy_dx = tape.gradient(y, x)

• A) tf.Tensor([0], shape=(), dtype=float32)
• B) tf.Tensor([0.5, 2], shape=(2,), dtype=float32)
• C) tf.Tensor([2, 5.4365635], shape=(2,), dtype=float32)
• D) tf.Tensor([5.4365635], shape=(), dtype=float32)
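
One way to check an answer without TensorFlow is a finite-difference approximation of the gradient of y = 2 * sum(exp(x)). The sketch below uses only the standard library; the helper names `f` and `numerical_gradient` are assumptions, and the result is an approximation rather than the exact value automatic differentiation would return.

```python
import math


def f(x):
    """Same function as in the question: y = 2 * sum(exp(x))."""
    return 2 * sum(math.exp(v) for v in x)


def numerical_gradient(func, x, eps=1e-6):
    """Central-difference estimate of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((func(up) - func(down)) / (2 * eps))
    return grad


point = [0.0, 1.0]
approx = numerical_gradient(f, point)
analytic = [2 * math.exp(v) for v in point]  # derivative of 2*exp(v) is 2*exp(v)
print(approx, analytic)
```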

#### 21. What is aleatoric uncertainty?

• A) Data errors from sensor malfunction
• B) Object occlusion in an image segmentation task
• C) Using a model that underfits the problem