4 Steps in Predictive Analytics with Kaggle

By ganpati | Getting Started

Dec 01

What is Kaggle?

Kaggle is a platform for predictive modeling and analytics competitions. Companies and researchers post their data, and statisticians and data miners from all over the world compete to produce the best models.

For me, Kaggle is not just about competitions but also about learning. Every competition has some new insight to offer. Below are some advantages of participating in Kaggle:

You get to know about newer ways to analyze the data.

There is a vibrant forum of Kagglers who share ideas about each problem. If you have a query specific to a problem, you can get it answered there.

It also challenges you to refine your predictions continuously. Even when you think you have found the best model, someone out there will up the ante, and you'll strive to achieve more.

How to start?

Choose your problem(s)

There are always multiple problems available for users to attempt. Each problem lasts for a few months. I'd suggest you jump in and attempt multiple problems that you find interesting.

Some people may find finance-related problems more interesting, while others may find physics or retail sales problems more exciting. There are also different types of data available. Some problems challenge you to analyze images, while others present text or raw data.

Whatever stimulates you, it's better to try a few problems in that area, read the ongoing discussions, and shortlist the one or two problems that suit you best.

Find a team

The next challenge is to find team members who complement your skills. At Analytics Cosm, we help you find teammates after going through your background and interests. Feel free to write to us.

Basic set up:

There are a few basic things you need to do with every data set available on Kaggle. The checklist below is a good start, but given the diverse nature of the problems Kaggle offers, it is not always comprehensive.

Domain research

We are in the process of preparing domain tutorials that will give you a good starting point for all predictive analytics problems (not just Kaggle). Meanwhile, Kaggle offers a good introduction to each problem that can get you going.

Submit a simple solution and check your score

Most of the time, a sample solution template is provided with a problem. Create your own initial solution and submit it. Its score will give you an initial direction.
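As a rough illustration of what a first submission can look like, here is a minimal sketch in Python that predicts the training mean for every test row and formats the result as CSV. The column names "id" and "target", the toy rows, and the helper names are all assumptions made for this example; each competition defines its own file layout, so check the sample submission file before adapting it.

```python
import csv
import io
from statistics import mean


def build_baseline_submission(train_rows, test_ids, target="target", id_col="id"):
    """Predict the training mean of `target` for every test row.

    The column names are hypothetical; use the ones from your competition.
    """
    baseline = mean(float(row[target]) for row in train_rows)
    return [{id_col: test_id, target: baseline} for test_id in test_ids]


def to_csv(rows, fieldnames):
    """Serialise rows to CSV text, ready to be written to a submission file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy data standing in for a competition's train and test files.
train_rows = [
    {"id": "1", "target": "10"},
    {"id": "2", "target": "20"},
    {"id": "3", "target": "30"},
]
submission = build_baseline_submission(train_rows, test_ids=["4", "5"])
submission_csv = to_csv(submission, fieldnames=["id", "target"])
print(submission_csv)
```

A baseline this simple will not score well, but it confirms that your file format is accepted and gives you a number that every later model has to beat.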

Explore, explore and explore

In predictive analytics there is no end to how much you can explore. The more you do, the more you’ll learn. And don’t forget to post your experience in this forum, so that others can learn from it.

Some pointers to what you can explore

    Analyze the data with graphs to find the patterns

    Modify the data

    Try different features: combine or break down the variables

    Try different models

    Tune your models
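To make a couple of these pointers concrete, here is a small sketch in Python that compares three candidates on a held-out validation split: a mean baseline, a straight-line fit on a single raw variable, and the same fit on a combined feature. The data is synthetic (price is defined as 3 * width * height + 5), and the variable names, the split, and the helper functions are assumptions chosen for illustration only, not a recipe for any particular competition.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


# Synthetic rows (width, height, price); the price formula is made up.
rows = [(w, h, 3 * w * h + 5) for w in range(1, 7) for h in range(1, 5)]

# Hold out every second row so each model is scored on unseen data.
train, valid = rows[::2], rows[1::2]
valid_prices = [r[2] for r in valid]


def evaluate(feature):
    """Fit a line on one feature of the training rows, score it on validation."""
    slope, intercept = fit_line([feature(r) for r in train], [r[2] for r in train])
    predictions = [slope * feature(r) + intercept for r in valid]
    return rmse(valid_prices, predictions)


train_mean = mean(r[2] for r in train)
scores = {
    "mean": rmse(valid_prices, [train_mean] * len(valid)),  # baseline
    "width": evaluate(lambda r: r[0]),                      # one raw variable
    "area": evaluate(lambda r: r[0] * r[1]),                # combined feature
}

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: validation RMSE = {score:.2f}")
```

On this toy data the combined feature wins because the target was built from it. On real data you will not know that in advance, which is why scoring each idea on a held-out split matters.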


