Are you passionate about gleaning insights from databases, and about organizing and analyzing large datasets to produce recommendations that improve product and website experiences? Strong communication and technical skills are essential for generating the right data and accessing information from a variety of data sources. Here are some questions commonly asked in data analyst interviews to assess a candidate's basic skills.

## Questions on how to perform data analysis

List the steps you follow when designing a data-driven model to address a business problem.

What steps will you take to check the integrity of the data that you are analyzing?

Describe different pre-processing steps that you might carry out on data before using them to train a model.

What makes some models simple and others complex? What are the relative strengths and weaknesses of choosing a more complex model over a simpler one?

How do you define ensemble models and what are some advantages of combining models?

What is dimensionality reduction? What are some ways to perform this and why is it important?

How can you overcome overfitting?

Differentiate between wide and tall data formats.
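To make the wide versus tall distinction concrete, here is a minimal sketch in plain Python. The student and subject data, and the helper name `wide_to_tall`, are made up for illustration. In practice a candidate would more likely reach for a library reshaping function.

```python
# Hypothetical wide data: one row per student, one column per subject.
wide = [
    {"student": "A", "math": 90, "physics": 85},
    {"student": "B", "math": 72, "physics": 64},
]

def wide_to_tall(rows, id_key):
    """Turn every non-id column into its own (id, variable, value) row."""
    tall = []
    for row in rows:
        for key, value in row.items():
            if key != id_key:
                tall.append({id_key: row[id_key], "variable": key, "value": value})
    return tall

# Tall data: one row per (student, subject) observation.
tall = wide_to_tall(wide, "student")
for record in tall:
    print(record)
```

The wide table has two rows and three columns, while the tall table has four rows, one per measurement.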

## Questions on basic ideas in statistics, probability and machine learning

What is a confidence interval and why is it useful?
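As one possible illustration, the sketch below computes a normal-approximation confidence interval for a sample mean using only the standard library. The sample values and the function name `mean_confidence_interval` are assumptions for the example. For a sample this small, a t-distribution would usually be the better choice.

```python
import math
import statistics

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation interval: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # z is the two-sided critical value, about 1.96 for a 95% level.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_err, mean + z * std_err

sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]  # made-up measurements
low, high = mean_confidence_interval(sample)
print(f"95% interval for the mean: ({low:.3f}, {high:.3f})")
```

Raising `level` widens the interval, which is the trade-off between confidence and precision that the question is probing.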

What is the difference between statistical independence and correlation?

What is conditional probability? What is Bayes’ Theorem? Why is it useful in practice?
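A small worked example can help when answering this one. The sketch below applies Bayes' Theorem to a hypothetical screening test. The prevalence, sensitivity, and false positive rate are invented numbers, and `posterior` is an illustrative helper name.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' Theorem."""
    # Total probability of a positive test, over both cases.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Assumed numbers: 1% prevalence, 95% sensitivity, 5% false positive rate.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")
```

With these assumed numbers the posterior is only about 0.16, which shows why the prior matters so much when the condition is rare.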

Suppose we are training a model using a particular optimization procedure such as stochastic gradient descent. How do we know if we are converging to a solution? If a training procedure converges will it always result in the best possible solution?

How do we know if we have collected enough data to train a model?

Explain why we have training, validation and test data sets and how they are used effectively.

What is clustering? Give an example algorithm that performs clustering. How can we know whether we obtained decent clusters? How might we estimate a good number of clusters to use with our data?

We often say that correlation does not imply causation. What does this mean?

What is the difference between unsupervised and supervised learning?

Describe the concepts of regression and classification and the situations in which they are used.

Explain the bias-variance trade-off in statistical models. How does it affect our data analysis?

What is overfitting? How is this related to the bias-variance trade-off? What is regularization? Give some examples of regularization in models.

Suppose we want to train a binary classifier and one class is very rare. Give an example of such a problem. How should we train this model? What metrics should we use to measure performance?
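The metrics part of this question can be illustrated with a short sketch. It assumes a made-up data set in which 5 of 100 examples are positive, and scores a trivial classifier that always predicts the majority class. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1] * 5 + [0] * 95      # assumed data: 5% positive class
always_negative = [0] * 100      # trivial classifier: never predicts the rare class
metrics = classification_metrics(y_true, always_negative)
print(metrics)
```

The trivial classifier reaches 95% accuracy while its recall is zero, which is why precision, recall, and F1 tend to be more informative than accuracy for rare-class problems.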

How many unique subsets of n different objects can we make?
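One way to check an answer here is to enumerate the subsets for a small n. The sketch below assumes the empty set counts as a subset, in which case the count is 2^n. The helper name `all_subsets` is illustrative.

```python
from itertools import chain, combinations

def all_subsets(items):
    """Return every subset of items, from the empty set up to the full set."""
    items = list(items)
    return list(
        chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    )

subsets = all_subsets(["a", "b", "c"])
print(len(subsets))  # 8 subsets for n = 3, i.e. 2 ** 3
```

Each object is either in or out of a given subset, so n independent binary choices give 2^n subsets in total.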

How would you build a data-driven recommender system? What are the limitations of this approach?

## Questions on tools, visualization and presentation

In which environment(s) do you usually run your analyses?

Describe your experience in working with data from databases. Are you familiar with SQL?

What visualization tools (Tableau, D3.js, R and so on) have you used?

Do you have a presentation you can show us, such as on SlideShare?

Do you have experience presenting reports and findings directly to senior management in your previous roles?

Are you comfortable speaking in public? Have you ever presented a technical topic to a large audience?