Data Science Interview Questions

By ganpati | Interview Corner

Jan 20

Here are some other frequently asked Data Science Questions:


  • How will you compare two or more algorithms and decide which one is better?
  • Have you optimized an algorithm for speed? How, and by how much?
  • How will you choose between parallel processing and/or faster algorithms? Explain with examples.
  • How can you verify that an improvement you’ve brought to an algorithm is really an improvement?
  • How will you define a good clustering algorithm?
  • How would you improve a spam detection algorithm that uses naive Bayes?
  • What is Gradient Descent Method (the intuition is mostly enough)?
  • Which clustering methods are you familiar with?

Statistics

  • You are given a data set. The data set has missing values that are spread along 2 standard deviations from the median. What percentage of the data would remain unaffected? Why?
  • What is the difference between covariance and correlation?
  • Is it possible to capture the correlation between a continuous and a categorical variable? If yes, how?

Naïve Bayes

  • Explain prior probability, likelihood, and marginal likelihood in the context of the naive Bayes algorithm.
  • You came to know that your model is suffering from low bias and high variance. Which algorithm should you use to tackle it? Why?
  • How is kNN different from k-means clustering?
  • How are True Positive Rate and Recall related? Write the equation.
  • You were told that your regression model is suffering from multicollinearity. How would you check if that’s true? Without losing any information, can you still build a better model?
  • When is Ridge regression favorable over Lasso regression?
  • How would you choose between two tree-based algorithms? How is a random forest different from a gradient boosting machine (GBM)?
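
For the True Positive Rate question above, the two names refer to the same quantity: TPR = Recall = TP / (TP + FN). A minimal sketch of that equation follows, assuming binary labels where 1 marks the positive class. The function name and the sample labels are made up for illustration.

```python
def true_positive_rate(y_true, y_pred):
    """Return TP / (TP + FN), which is also the definition of recall."""
    # True positives: actual positives that the model also predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False negatives: actual positives that the model missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


# Hypothetical labels: three actual positives, two of them caught.
example_recall = true_positive_rate([1, 1, 0, 1], [1, 0, 0, 1])
```

Here `example_recall` is 2/3, because two of the three actual positives were predicted correctly.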


  • How would you train and deploy a logistic regression model? A recommender system?
  • Which technique is used to predict categorical responses?

  • How would you monitor that the performance of a model you trained does not degrade over time?
  • What is the curse of dimensionality and how should one deal with it when building machine-learning models?
  • What’s more important: predictive power or interpretability of a model?
  • Explain to the company management what model lift is and why it is important.
  • Explain the statement: “Algorithm can be universal but not the model”.
  • What are Recommender Systems?
  • Why does data cleaning play a vital role in analysis?
  • Differentiate between univariate, bivariate and multivariate analysis.
  • What is power analysis?
  • What is Collaborative filtering?
  • What is the difference between Cluster and Systematic Sampling?
  • How can you assess a good logistic model?
  • How can you iterate over a list and also retrieve element indices at the same time?
  • Explain the Box-Cox transformation in regression models.
  • Write a function that takes in two sorted lists and outputs a sorted list that is their union.
  • What is the difference between Bayesian Inference and Maximum Likelihood Estimation (MLE)?
  • What is Regularization and what kind of problems does regularization solve?
  • What is multicollinearity and how can you overcome it?
  • What is the curse of dimensionality?
  • How do you decide whether your linear regression model fits the data?
  • What is the difference between squared error and absolute error?
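
Two of the questions above are coding exercises, so here is one possible answer to each, written in Python. This is a sketch rather than a model answer. It assumes both input lists are sorted in ascending order and that "union" means each value appears once in the output. If the interviewer wants duplicates kept, drop the last `if` check. The function names are made up for illustration.

```python
def indexed_items(items):
    """Pair every element with its index using the built-in enumerate()."""
    pairs = []
    for index, item in enumerate(items):
        pairs.append((index, item))
    return pairs


def sorted_union(a, b):
    """Merge two ascending lists into one ascending list without duplicates."""
    result = []
    i = j = 0
    while i < len(a) or j < len(b):
        # Take from `a` when `b` is exhausted or when a[i] is the smaller value.
        if j >= len(b) or (i < len(a) and a[i] <= b[j]):
            value = a[i]
            i += 1
        else:
            value = b[j]
            j += 1
        # Assumption: "union" is a set union, so skip repeated values.
        if not result or result[-1] != value:
            result.append(value)
    return result
```

The merge walks each list once, so it runs in time proportional to the combined length of the inputs instead of re-sorting the concatenated list.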

Other Questions

  • Python or R – Which one would you prefer for text analytics?
  • What is a p-value?
  • What is Regularization? Which problem does Regularization try to solve?
  • How can you fit a non-linear relationship between X (say, Age) and Y (say, Income) into a linear model? – Show mathematically the marginal effect of X on Y based on your proposed solution.
  • What is the probability of getting a sum of 2 when rolling two fair dice? What about a sum of 4? Of 7?
  • Which Python libraries for analytics/data science are you familiar with?
  • Describe a data science project that you led or participated in.
  • What is an eigenvalue? (linear algebra)
  • Time Series: if you have a data set with 100 observations for each Xi, and 3 lag-effect variables of X1, how many predictions will you have if you run a simple linear regression?
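
The dice question above can be checked by brute force. The sketch below assumes two fair six-sided dice and simply counts the outcomes that add up to the target. The function name is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def sum_probability(target, sides=6):
    """Probability that two fair dice with `sides` faces add up to `target`."""
    # Enumerate all equally likely (first die, second die) outcomes.
    outcomes = list(product(range(1, sides + 1), repeat=2))
    favorable = sum(1 for first, second in outcomes if first + second == target)
    return Fraction(favorable, len(outcomes))


answers = {target: sum_probability(target) for target in (2, 4, 7)}
```

Out of the 36 outcomes, one gives a sum of 2, three give a sum of 4, and six give a sum of 7, so `answers` holds 1/36, 1/12, and 1/6.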

Some Important Resources for Data Science Interviews

Now that you have spent days preparing for the interviews, you’re ready to put yourself on the job market. Thankfully, a ton of folks have written about their experiences interviewing for data science roles.

  • Crushed it! Landing a data science job (Erin Shellman)
  • How to Land a High-Paying Data Science Job (Even If You Have the Wrong Background) (Minda Zetlin)
  • What it’s like to be on the data science job market (Trey Causey)
  • Building a data science portfolio: Storytelling with data (
  • 5 secrets for writing the perfect data scientist resume (The Data Incubator)
  • [VIDEO] What it’s Like to Interview as a Data Scientist (Dose of Data)
  • [VIDEO] Lessons Learned the Hard Way: Hacking the Data Science Interview (Galvanize)
