Machine Learning Interview Questions And Answers

Q.1.What is Machine learning?
A. Machine learning is a branch of computer science concerned with building systems that automatically learn and improve with experience, rather than being explicitly programmed for each task. For example, robots can be programmed to perform tasks based on the data they gather from sensors, so that their behaviour is learned from data.
Q.2.Mention the difference between Data Mining and Machine learning?
A. Machine learning is the study, design and development of algorithms that give computers the capability to learn without being explicitly programmed. Data mining, by contrast, is the process of extracting knowledge or previously unknown, interesting patterns from data, including unstructured data. Machine learning algorithms are often used during this process.
Q.3.What is ‘Overfitting’ in Machine learning?
A. In machine learning, ‘overfitting’ occurs when a statistical model describes random error or noise instead of the underlying relationship. It is normally observed when a model is excessively complex, that is, when it has too many parameters relative to the number of training examples. An overfit model fits the training data well but performs poorly on new data.
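The effect can be illustrated with a minimal sketch. The data (noisy samples of y = 2x), the "memorizer" model and the one-parameter line are all assumptions made up for illustration; they are not part of the answer above.

```python
import random

random.seed(0)

def make_data(n):
    """Noisy samples of an assumed underlying relationship y = 2x."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + random.gauss(0, 1)) for x in xs]

def memorizer(train):
    """Overly flexible model: predict the y of the closest training x."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def line_through_origin(train):
    """Simple one-parameter model: least-squares fit of y = w * x."""
    w = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x: w * x

def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train, test = make_data(30), make_data(30)
flexible, simple = memorizer(train), line_through_origin(train)

print("memorizer train/test MSE:", mse(flexible, train), mse(flexible, test))
print("line      train/test MSE:", mse(simple, train), mse(simple, test))
```

The memorizer reproduces the training noise exactly, so its training error is zero while its test error is not. The simple line typically shows a much smaller gap between the two errors.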
Q.4.What is inductive machine learning?
A. Inductive machine learning is the process of learning by example: from a set of observed instances, the system tries to induce a general rule.
Q.5.What are the different algorithm techniques in Machine Learning?
A. The different types of techniques in Machine Learning are:

  • Supervised Learning
  • Unsupervised Learning
  • Semi-supervised Learning
  • Reinforcement Learning
  • Transduction
  • Learning to Learn
Q.6.What are the three stages to build the hypotheses or model in machine learning?
A. The three stages are:

  • Model building
  • Model testing
  • Applying the model
Q.7. What is ‘Training set’ and ‘Test set’?
A. In various areas of information science, such as machine learning, the set of data used to discover a potentially predictive relationship is known as the ‘Training Set’. The training set is the set of examples given to the learner, while the ‘Test Set’ is the set of examples held back from the learner and used to test the accuracy of the hypotheses it generates. The training set is distinct from the test set.
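A minimal sketch of holding back a test set is given below. The function name, the 25% test fraction and the fixed seed are illustrative assumptions, not a prescribed recipe.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold back a fraction as the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # the first n_test shuffled examples are held back; the rest go to the learner
    return shuffled[n_test:], shuffled[:n_test]

examples = list(range(20))
train_set, test_set = train_test_split(examples)
print(len(train_set), "training examples,", len(test_set), "test examples")
```

Shuffling before splitting avoids a test set that consists only of the last examples collected.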
Q.8. List the various approaches to machine learning.
A.  The different approaches in Machine Learning are

  • Concept Vs Classification Learning
  • Symbolic Vs Statistical Learning
  • Inductive Vs Analytical Learning
Q.9.What is the function of ‘Unsupervised Learning’?
A. Unsupervised learning is used to:

  • Find clusters of the data
  • Find low-dimensional representations of the data
  • Find interesting directions in data
  • Find interesting coordinates and correlations
  • Find novel observations (database cleaning)
Q.10.What is algorithm independent machine learning?
A. Algorithm independent machine learning refers to machine learning whose mathematical foundations are independent of any particular classifier or learning algorithm.
Q.11.What is the difference between artificial intelligence and machine learning?
A. Machine learning is the design and development of algorithms whose behaviour is based on empirical data. Artificial intelligence, in addition to machine learning, also covers other aspects such as knowledge representation, natural language processing, planning and robotics.
Q.12. What is Model Selection in Machine Learning?
A. Model selection is the process of choosing among different mathematical models that are used to describe the same data set. It is applied in the fields of statistics, machine learning and data mining.
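One common way to carry out model selection is to compare candidate models on held-out validation data. The sketch below assumes two hypothetical candidates (a constant model and a straight line) and a made-up data set; none of these names come from the answer above.

```python
def fit_constant(train):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_linear(train):
    """Candidate 2: ordinary least-squares straight line."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + slope * (x - mx)

def mse(model, data):
    """Mean squared error of a model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def select_model(candidates, train, validation):
    """Fit every candidate on train and keep the one with the lowest validation error."""
    scores = {name: mse(fit(train), validation) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

data = [(x, 3 * x + 1) for x in range(10)]     # illustrative data set
train, validation = data[::2], data[1::2]      # alternate points go to validation
best, scores = select_model({"constant": fit_constant, "linear": fit_linear},
                            train, validation)
print(best, scores)
```

Both candidates describe the same data set; the validation error is what decides between them.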
Q.13. What is the difference between heuristics for rule learning and heuristics for decision trees?
A. Heuristics for decision trees evaluate the average quality of a number of disjoint sets, whereas rule learners evaluate only the quality of the set of instances covered by the candidate rule.
Q.14. Explain the two components of Bayesian logic program?
A. A Bayesian logic program consists of two components. The first component is a logical one; it consists of a set of Bayesian clauses, which capture the qualitative structure of the domain. The second component is a quantitative one; it encodes the quantitative information about the domain.
Q.15.What are the two paradigms of ensemble methods?
A. The two paradigms of ensemble methods are

  • Sequential ensemble methods
  • Parallel ensemble methods
Q.16.What is the general principle of an ensemble method and what is bagging and boosting in ensemble method?
A. The general principle of an ensemble method is to combine the predictions of several models built with a given learning algorithm in order to improve robustness over a single model. Bagging (bootstrap aggregating) trains models independently on bootstrap samples of the training data and combines their predictions; it improves unstable estimation or classification schemes, mainly by reducing variance. Boosting trains models sequentially, each one concentrating on the errors of its predecessors, and mainly reduces the bias of the combined model, although it can reduce variance as well.
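As a rough illustration of bagging, the sketch below trains several one-threshold classifiers ("stumps") on bootstrap samples and combines them by majority vote. The stump learner, the toy data and all function names are assumptions chosen for brevity.

```python
import random
from collections import Counter

def fit_stump(data):
    """Pick the threshold t with the fewest training errors for the rule 'x > t means class 1'."""
    best_t, best_err = None, None
    for t, _ in data:
        err = sum((x > t) != bool(y) for x, y in data)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(data, n_models=25, seed=0):
    """Train each stump on a bootstrap sample: same size as data, drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def bagging_predict(thresholds, x):
    """Combine the individual stump predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# toy data: x in 0..9 belongs to class 0, x in 10..19 to class 1
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(10, 20)]
thresholds = bagging_fit(data)
print(bagging_predict(thresholds, 3), bagging_predict(thresholds, 16))
```

Each stump sees slightly different data and so picks a slightly different threshold; the vote averages out these differences. Boosting would instead fit the models one after another, reweighting the examples the previous models got wrong.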
Q.17.What is an Incremental Learning algorithm in ensemble?
A. Incremental learning is the ability of an algorithm to learn from new data that becomes available after a classifier has already been built from the existing dataset, without retraining from scratch.
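A minimal sketch of the idea, assuming a hypothetical nearest-centroid classifier on one-dimensional inputs: each class mean is updated in place when new data arrives, so earlier examples never need to be revisited.

```python
class IncrementalCentroidClassifier:
    """Nearest-centroid classifier whose class means are updated incrementally."""

    def __init__(self):
        self.count = {}   # examples seen so far, per class
        self.mean = {}    # running mean of the inputs, per class

    def partial_fit(self, xs, ys):
        """Fold a new batch into the existing model without storing old data."""
        for x, y in zip(xs, ys):
            n = self.count.get(y, 0) + 1
            m = self.mean.get(y, 0.0)
            self.count[y] = n
            self.mean[y] = m + (x - m) / n   # running-mean update

    def predict(self, x):
        """Return the class whose running mean is closest to x."""
        return min(self.mean, key=lambda label: abs(x - self.mean[label]))

clf = IncrementalCentroidClassifier()
clf.partial_fit([1.0, 2.0, 9.0], ["low", "low", "high"])   # initial dataset
clf.partial_fit([3.0, 11.0], ["low", "high"])              # new data arrives later
print(clf.mean, clf.predict(4.0), clf.predict(8.0))
```

The method name `partial_fit` is borrowed here only as a familiar convention for this kind of update.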
Q.18.What is PCA, KPCA and ICA used for?
A. PCA (Principal Component Analysis), KPCA (Kernel Principal Component Analysis) and ICA (Independent Component Analysis) are important feature extraction techniques used for dimensionality reduction.
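To make the PCA case concrete, here is a small sketch restricted to two features, where the principal axis has a closed form. The data points and function names are illustrative assumptions; the general case (and KPCA or ICA) would normally rely on a numerical library.

```python
import math

def principal_axis_2d(points):
    """Return the mean and the unit direction of largest variance for 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    var_x = sum((x - mx) ** 2 for x, _ in points) / n
    var_y = sum((y - my) ** 2 for _, y in points) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in points) / n
    # closed-form angle of the leading eigenvector of the 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return (mx, my), (math.cos(theta), math.sin(theta))

def project(points, mean, direction):
    """Feature extraction: replace each 2-D point by one coordinate along the axis."""
    return [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1]
            for x, y in points]

points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]   # made-up, strongly correlated
mean, direction = principal_axis_2d(points)
reduced = project(points, mean, direction)
print(direction, reduced)
```

Two correlated features are reduced to a single extracted feature that keeps most of the variance.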
Q.20.Into what categories can the sequence learning process be divided?
A. The sequence learning process can be categorized into:

  • Sequence prediction
  • Sequence generation
  • Sequence recognition
  • Sequential decision
Q.21.What is dimension reduction in Machine Learning?
A. In machine learning and statistics, dimension reduction is the process of reducing the number of random variables under consideration. It can be divided into feature selection and feature extraction.
Q.22.Explain the difference between KNN and k-means clustering?
A. K-Nearest Neighbours (KNN) is a supervised machine learning algorithm: we provide labelled data, and the model classifies a new point according to the labels of the nearest labelled points.
K-means clustering, on the other hand, is an unsupervised machine learning algorithm: we provide unlabelled data, and the algorithm groups the points into k clusters by assigning each point to the cluster with the nearest mean (centroid).
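The contrast can be sketched on one-dimensional data. The toy points, the initial centroids and the function names below are assumptions for illustration only.

```python
from collections import Counter

def knn_predict(labelled, x, k=3):
    """Supervised: majority label among the k labelled points closest to x."""
    nearest = sorted(labelled, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_means(points, centroids, n_iter=10):
    """Unsupervised: assign points to the nearest centroid, then recompute the means."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # an empty cluster keeps its previous centroid
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

labelled = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (8.0, "b"), (8.5, "b"), (9.0, "b")]
print(knn_predict(labelled, 2.5))          # uses the labels

points = [x for x, _ in labelled]          # the same data with labels thrown away
centroids, clusters = k_means(points, centroids=[0.0, 10.0])
print(centroids, clusters)
```

KNN needs the labels to make a prediction, while k-means only ever sees the raw points and returns groups without names.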
Q.23.Explain the difference between classification and regression?
A. Classification is used to produce discrete results, that is, to assign data to specific categories, for example classifying e-mails as spam or non-spam.
Regression, on the other hand, is used when we are dealing with continuous data, for example predicting stock prices at a certain point in time.
Q.24.How to ensure that your model is not overfitting?
A. Keep the design of the model simple. Try to reduce the variance of the model by considering fewer variables and parameters.
Cross-validation techniques such as k-fold cross-validation help detect overfitting and keep it under control.
Regularization techniques such as LASSO help avoid overfitting by penalizing large coefficients, shrinking some of them to zero.
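The k-fold idea can be sketched as an index generator: each example is used for validation exactly once and for training in the remaining folds. The function name and the choice of 10 samples and 3 folds are illustrative assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

A model would be fitted on each training part and scored on the matching validation part; a large gap between training and validation scores is a sign of overfitting.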
Q.25.List the main advantage of Naive Bayes?
A. A Naive Bayes classifier converges very quickly compared to other models such as logistic regression, provided its independence assumption roughly holds. As a result, a Naive Bayes classifier needs less training data.
Q.26.What should you do when your model is suffering from low bias and high variance?
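A. Low bias means that the model’s predictions are, on average, very close to the actual values; high variance means that the predictions change a lot with the particular training set, which is typical of overfitting. In this situation we can use bagging algorithms such as a random forest, which average many models to reduce variance.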
Q.27.Explain differences between random forest and gradient boosting algorithm.
A. Random forest uses bagging, whereas a gradient boosting machine (GBM) uses boosting.
Random forests mainly try to reduce variance, while GBM mainly reduces bias and can also reduce the variance of a model.
Q.28.How Do You Handle Missing or Corrupted Data in a Dataset?
A. One of the easiest ways to handle missing or corrupted data is to drop the affected rows or columns, or to replace the values with some other value. Pandas provides useful methods for this:

  • isnull() and dropna() help to find the rows/columns with missing data and drop them
  • fillna() replaces the missing values with a placeholder value
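A short sketch of these Pandas methods on a made-up table follows. The column names and the placeholder choices (column mean, the string "unknown") are assumptions for illustration.

```python
import pandas as pd

# small illustrative table with one missing value in each column
df = pd.DataFrame({"age": [25, None, 31], "city": ["Oslo", "Lima", None]})

missing_per_column = df.isnull().sum()   # count the missing values in each column
dropped = df.dropna()                    # keep only the rows with no missing values
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping is simplest but discards data; filling keeps every row at the cost of introducing placeholder values.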
Q.29.How Can You Choose a Classifier Based on a Training Set Data Size?
A. When the training set is small, a model with high bias and low variance tends to work better because it is less likely to overfit. For example, Naive Bayes works well in this case. When the training set is large, models with low bias and high variance tend to perform better because they can capture complex relationships.
Q.30.What Is Deep Learning?
A. Deep learning is a subset of machine learning based on artificial neural networks, which are loosely inspired by the way the human brain learns. The term ‘deep’ refers to the many layers such networks can have. One of the primary differences between classical machine learning and deep learning is that feature engineering is typically done manually in the former, whereas a deep neural network learns which features to use (and which to ignore) from the data.
Q.31.What Are the Applications of Supervised Machine Learning in Modern Businesses?
A. Applications of supervised machine learning include:

  • Email Spam Detection

    Here we train the model using historical data that consists of emails categorized as spam or not spam. This labeled information is fed as input to the model.

  • Healthcare Diagnosis

    By providing labeled images of a disease, a model can be trained to detect whether or not a person is suffering from it.

  • Sentiment Analysis

    This refers to the process of using algorithms to mine documents and determine whether they’re positive, neutral, or negative in sentiment.

  • Fraud Detection

    By training the model to identify suspicious patterns, we can detect instances of possible fraud.

Q.32.What Is Semi-supervised Machine Learning?
A. Supervised learning uses training data that is completely labeled, whereas unsupervised learning uses training data with no labels. In semi-supervised learning, the training data contains a small amount of labeled data and a large amount of unlabeled data.
Q.33.What Is the Difference Between Supervised and Unsupervised Machine Learning?
A. The two differ in the data they learn from:

  • Supervised learning – The model learns from labeled data and predicts the output for new inputs.
  • Unsupervised learning – The model uses unlabeled input data and lets the algorithm find structure in that information without guidance.
Q.34.What Is ‘naive’ in the Naive Bayes Classifier?
A. The classifier is called ‘naive’ because it makes an assumption that may or may not turn out to be correct. The algorithm assumes that, given the class variable, the presence of one feature is unrelated to the presence of any other feature (conditional independence of features).

For instance, a fruit may be considered to be a cherry if it is red in color and round in shape, regardless of other features. This assumption may or may not be right (as an apple also matches the description).
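The fruit example can be turned into a small sketch. The training rows, the two yes/no features and the add-one smoothing are all assumptions for illustration; the point to notice is the line where per-feature probabilities are simply multiplied.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows):
    """rows: list of (features_dict, label). Returns the counts needed for prediction."""
    class_counts = Counter(label for _, label in rows)
    feature_counts = defaultdict(Counter)   # (label, feature name) -> value counts
    for features, label in rows:
        for name, value in features.items():
            feature_counts[(label, name)][value] += 1
    return class_counts, feature_counts

def predict(model, features):
    """Return the label with the highest (unnormalized) posterior score."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total   # prior P(class)
        for name, value in features.items():
            # the 'naive' step: multiply per-feature likelihoods as if independent;
            # add-one smoothing, assuming every feature has two possible values
            score *= (feature_counts[(label, name)][value] + 1) / (n + 2)
        scores[label] = score
    return max(scores, key=scores.get)

rows = [   # hypothetical training data
    ({"red": True, "round": True}, "cherry"),
    ({"red": True, "round": True}, "cherry"),
    ({"red": True, "round": True}, "apple"),
    ({"red": False, "round": True}, "apple"),
    ({"red": False, "round": False}, "banana"),
    ({"red": False, "round": False}, "banana"),
]
model = train_naive_bayes(rows)
print(predict(model, {"red": True, "round": True}))
```

A red, round fruit is labelled a cherry here even though one of the apples matches the same description, which mirrors the caveat in the answer above.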

Q.35.How Will You Know Which Machine Learning Algorithm to Choose for Your Classification Problem?
A. While there is no fixed rule to choose an algorithm for a classification problem, you can follow these guidelines:

  • If accuracy is a concern, test different algorithms and cross-validate them
  • If the training dataset is small, use models that have low variance and high bias
  • If the training dataset is large, use models that have high variance and little bias