Here are 5 steps beginners should follow to become a data analytics maestro:
First, decide what sort of problems interest you most. Analytics is not just about crunching numbers; it is about capturing what the numbers mean. A rising stock price, for example, is good for a trader who is long and bad for one who is short. There are no absolutes in this field.
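To make the long/short point concrete, here is a minimal sketch in Python. The `profit` function and the prices are made up for illustration, and it ignores fees, margin and borrowing costs.

```python
def profit(entry_price, exit_price, shares, side):
    """Profit of a position; side is 'long' or 'short' (hypothetical helper)."""
    change = (exit_price - entry_price) * shares
    # A long position gains when the price rises; a short position loses.
    return change if side == "long" else -change

# Same price move (100 -> 110), opposite outcomes for the two traders.
long_pnl = profit(100.0, 110.0, 10, "long")
short_pnl = profit(100.0, 110.0, 10, "short")
print("long:", long_pnl, "short:", short_pnl)
```

The number is the same in both cases; only the question being asked of it changes.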
So it’s important to pick a subject that interests you, start exploring the data and ask questions. The deeper you go into the data, the more insights you’ll derive from it. Keep jotting down your thoughts and keep revisiting them. Before jumping into any model, get familiar with as much of the data as you can.
Data is everywhere, in everything we think, do or don’t do. And data can be deceptive. Data is a good servant but a bad master. Be in command of your data; don’t let it drive you or take control of your decisions. Develop your perspective, create a model based on that perspective, fit the data to that model and then draw your conclusion.
Here is an example that demonstrates how the same data can be used to derive diametrically opposite conclusions. It also shows how having a clear objective can save you from getting lost in the ocean of data. Here is how you may start with the data analysis.
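As one small illustration of the same data supporting opposite conclusions, consider the sketch below. The campaigns, segments and counts are entirely hypothetical, chosen so that campaign A wins inside every segment while campaign B wins on the pooled totals (an effect usually called Simpson's paradox).

```python
# Hypothetical numbers: (conversions, visitors) per campaign and segment.
data = {
    "A": {"new": (40, 400), "returning": (45, 100)},
    "B": {"new": (8, 100), "returning": (160, 400)},
}

def rate(conversions, visitors):
    """Conversion rate for a single segment."""
    return conversions / visitors

def overall_rate(segments):
    """Conversion rate after pooling all segments of one campaign."""
    conversions = sum(c for c, _ in segments.values())
    visitors = sum(v for _, v in segments.values())
    return conversions / visitors

# Segment by segment, A has the higher rate.
for segment in ("new", "returning"):
    a = rate(*data["A"][segment])
    b = rate(*data["B"][segment])
    print(f"{segment}: A={a:.1%} B={b:.1%}")

# Pooled, B has the higher rate, because the segment mix differs.
print(f"overall: A={overall_rate(data['A']):.1%} B={overall_rate(data['B']):.1%}")
```

Which answer is "right" depends on the objective: if you care about how a campaign performs for a given kind of visitor, look at the segments; if you only care about total conversions from the traffic you actually got, look at the pooled figure.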
Think of as many ways of breaking up or combining different fields as possible. Think of creative ways to interpret and apply the data, and share your problems in a community or discussion forum. That’s how your ideas will evolve. See this example that describes how predictions can be made with limited data by being creative with it. Don’t worry that your model is too simple. The best solutions are often the simplest ones.
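Breaking up and combining fields can be as simple as the sketch below. The order records and field names (`order_date`, `unit_price`, `quantity`) are assumptions made up for this example, not taken from any real dataset.

```python
from datetime import date

# Hypothetical order records; the field names are illustrative only.
orders = [
    {"order_date": "2024-03-01", "unit_price": 4.5, "quantity": 2},
    {"order_date": "2024-03-02", "unit_price": 3.0, "quantity": 5},
]

def add_derived_fields(order):
    """Return a copy of the order with a few derived fields added."""
    parsed = date.fromisoformat(order["order_date"])
    enriched = dict(order)
    # Breaking one field apart: a date becomes a month and a weekend flag.
    enriched["month"] = parsed.month
    enriched["is_weekend"] = parsed.weekday() >= 5  # Saturday=5, Sunday=6
    # Combining two fields: price and quantity become revenue.
    enriched["revenue"] = order["unit_price"] * order["quantity"]
    return enriched

enriched_orders = [add_derived_fields(o) for o in orders]
for row in enriched_orders:
    print(row)
```

Each derived field is a new question you can ask of the same raw data, for instance whether weekend orders bring in more revenue.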
No model is perfect. Even world-class companies with strong analytics teams look for solutions in forums like Kaggle in order to get better at their game. Once you have shortlisted 4-5 approaches for making predictions, select the one you think is best and start refining it.
Often the models that produce the best results are ensemble models, which are nothing but combinations of multiple models.
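A minimal sketch of the idea, assuming the simplest kind of ensemble (an unweighted average of predictions). The three "models" here are toy forecasting rules invented for illustration; real ensembles usually combine trained models and may weight them.

```python
from statistics import mean

# Three toy "models" that each predict the next value of a series.
def model_last_value(history):
    return history[-1]

def model_mean(history):
    return mean(history)

def model_trend(history):
    # Extend the most recent step by one more step.
    return history[-1] + (history[-1] - history[-2])

def ensemble_predict(history, models):
    """Average the predictions of all models (unweighted ensemble)."""
    return mean(m(history) for m in models)

history = [10.0, 12.0, 14.0]  # hypothetical observations
models = [model_last_value, model_mean, model_trend]
prediction = ensemble_predict(history, models)
print("ensemble prediction:", prediction)
```

This only shows the mechanics of combining models. Whether the combination beats its best member depends on the data and on how different the models' errors are.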
The best model is the one that generalizes best. A simple way to avoid overfitting is to keep asking questions. Don’t expect your models to produce answers. Use them as a tool to create more questions and you’ll be able to avoid overfitting. Here’s how we overfit the data in our quest to squeeze the maximum out of it, and the steps to avoid it.
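One question worth asking of any model is "how does it do on data it has not seen?" The sketch below is a common complementary check, not something described in the text above: hold some data back and compare errors. The data points are made up (roughly y = 2x with alternating noise), and the "memorizer" is a deliberately overfit model that simply returns the nearest training value.

```python
# Hypothetical data: y is roughly 2*x, with +1 / -1 noise in the training set.
train = [(0, 1), (2, 3), (4, 9), (6, 11), (8, 17)]
test = [(1.5, 3), (3.5, 7), (5.5, 11), (7.5, 15)]

def fit_memorizer(points):
    """Overly flexible model: return the y of the nearest training x."""
    def predict(x):
        nearest = min(points, key=lambda p: abs(p[0] - x))
        return nearest[1]
    return predict

def fit_line(points):
    """Simple model: ordinary least-squares straight line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mse(predict, points):
    """Mean squared error of a model on a set of (x, y) points."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)

memorizer = fit_memorizer(train)
line = fit_line(train)

for name, model in (("memorizer", memorizer), ("line", line)):
    print(name, "train MSE:", mse(model, train), "test MSE:", mse(model, test))
```

With these made-up numbers the memorizer is perfect on the training points yet worse than the plain line on the held-out points, which is the signature of overfitting.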