As more high-quality material on big data and machine learning is published, finding the latest and best of it is becoming increasingly challenging. And since this is a highly competitive field, staying up to date matters: things you were told two years ago may not matter at all today.

I know many professionals who prefer to spend an hour or two on Google or Reddit to discover the best newly published material. Fortunately, that really isn’t necessary if you have an indexed, properly curated and constantly updated source that gets regular feedback from its readers.

In this regularly updated post, I’ll give you resources you can use to learn about big data and machine learning and stay on top of the job market, including some free tools that’ll be useful for getting the job done. Enjoy!

## 1000+ Most Popular Resources on Big Data/ML/Data Science and Visualization Across the Web


## About Data Science

__Doing Data Science at Twitter__ This post describes how machine learning has come to play an increasingly prominent role across many core Twitter products that were previously not ML-driven, and how the data science landscape at Twitter has changed in the recent past.

__Data Science Salary Survey 2015__ The 2015 version of the Data Science Salary Survey explores patterns in tools, tasks, and compensation through the lens of clustering and linear models. The research is based on data collected through an online 32-question survey, including demographic information.

__Some Real World Machine Learning Examples__ This post covers real-world applications of machine learning, ranging from computational biology and drug discovery/design to web search, recommendation engines and finance.

## Programming for Data Science

__25 Java Machine Learning Tools & Libraries__ Lists 25 Java machine learning tools and libraries, such as Weka, Meka, ADAMS, Mallet and Encog.

__R vs Python__ In the battle of “best” data science tools, Python and R both have their pros and cons. Selecting one over the other will depend on the use cases, the cost of learning, and the other common tools required. Here’s an analysis.

__How to Learn R__ R-bloggers and DataCamp have worked together to bring you a learning path for R. Each section points you to relevant resources and tools to get you started and keep you engaged to continue learning. It’s a mix of materials ranging from documentation, online courses, books, and more.

__A two-hour introduction to data analysis in R__ If you’re looking for a non-diamonds or non-nycflights13 introduction to R / ggplot2 / dplyr feel free to use materials from this workshop.

__Intro to Python for Data Science__ Unlike other Python tutorials, this course focuses on Python specifically for data science. In this Intro to Python class, you will learn about powerful ways to store and manipulate data as well as cool data science tools to start your own analyses.

__Introduction to machine learning in Python with scikit-learn (video series)__ Scikit-learn is a popular Python library for machine learning. Here’s a series of nine video tutorials, totaling four hours, produced in partnership with Kaggle.

__Intro to Julia__– Julia aims to address the “two language problem” that is all too common in technical computing. Visit this post for a fresh approach to numerical computing and data science using Julia.

__Cheat sheets on various data science tools__ Here’s a good starting point. You can find many additional references here (Python, Excel, Spark, R, Deep Learning, AI, SQL, NoSQL, Graph Databases, Visualization, etc.)

__Top 10 R Packages to be a Kaggle Champion__ Across major surveys, R has consistently ranked as one of the top programming choices for data scientists, so knowing the important R packages can be a real advantage in Kaggle competitions. Here’s a list of 10 R packages that played a key role in getting a top 10 ranking in more than 15 Kaggle competitions.

__Integrating Python and R into a Data Analysis Pipeline__ The first in a series of blog posts that outline the basic strategy for integrating Python and R, run through the different steps involved in this process, and give a real example of how and why you would want to do this.

## Machine Learning

Steps in Machine Learning

__Data Exploration with Python__– Here is a cheat sheet to help you with the code and steps involved in exploratory data analysis in Python. There is also a PDF version of the sheet so that you can easily copy and paste the code.

__Data Exploration with SAS__ Exploring data sets and developing a deep understanding of the data is one of the most important skills every data scientist should possess. By some estimates, these activities can take up as much as 80% of project time. This guide uses NumPy, Matplotlib, Seaborn and Pandas to perform data exploration.

__Data Exploration with R__– Detailed tutorial on Data Exploration using R

__Types of algorithms__– This post gives you two ways to think about and categorize the algorithms you may come across in the field. The first is a grouping of algorithms by learning style. The second is a grouping of algorithms by similarity in form or function (like grouping similar animals together).

Arriving at an Algorithm

List of Statistical Data Mining Tutorials by Andrew Moore

## Model Selection

Performance Estimation: Generalization Performance Vs. Model Selection

Predictive model selection – quick tricks

Frequentism and Bayesianism V: Model Selection

Survival of Fitness: How Model Selection Happens In The Natural Order of Data Science

Machine learning for model selection in population genomics

## Feature Engineering

The Data Science Machine, or ‘How To Engineer Feature Engineering’

Feature Engineering versus Feature Extraction: Game On!

Feature Engineering for Fraud Detection Models

## Boosting, Bagging and Stacking

How to Build an Ensemble Of Machine Learning Algorithms in R (ready to use boosting, bagging and stacking)

Quick Introduction to Boosting Algorithms in Machine Learning

Learn Gradient Boosting Algorithm for better predictions (with codes in R)

What’s the similarities and differences between these 3 methods: bagging, boosting, stacking?

Model ensembling for Kaggle
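To make the idea behind bagging concrete before diving into the articles above, here is a minimal sketch in plain Python. It assumes a toy one-dimensional, two-class data set and uses a simple threshold rule (a "decision stump") as the base model; the function names `fit_stump`, `fit_bagging` and `predict_bagging`, the data, and the choice of 25 models are all made up for illustration and are not taken from any of the linked posts.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D threshold rule: predict 1 when x >= threshold, else 0."""
    # float("inf") as a threshold means "always predict 0".
    candidates = sorted(set(xs)) + [float("inf")]
    best_threshold, best_correct = candidates[0], -1
    for threshold in candidates:
        correct = sum(1 for x, y in zip(xs, ys) if int(x >= threshold) == y)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def fit_bagging(xs, ys, n_models=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict_bagging(thresholds, x):
    """Combine the stumps by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: small values are class 0, large values are class 1.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_bagging(xs, ys)
print(predict_bagging(model, 0), predict_bagging(model, 10))
```

Boosting and stacking differ in how the base models are combined: roughly, boosting trains models sequentially so that each one focuses on the previous ones’ mistakes, while stacking trains a second model on the base models’ predictions. For real work you would normally reach for a library implementation rather than a hand-rolled one like this.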

## Evaluating Machine Learning Models

How to Evaluate Machine Learning Models: Classification Metrics
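As a quick taste of what classification metrics involve, here is a minimal sketch that computes accuracy, precision, recall and F1 for a binary problem from scratch. The function name `classification_metrics` and the labels are hypothetical and made up for illustration; they do not come from the linked article.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics come out to 0.75. On imbalanced data they usually diverge, which is why accuracy alone can be misleading.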

## Overfitting

Overfitting or generalized? Comparison of ML classifiers – a series of articles

Data Science 101: Preventing Overfitting in Neural Networks

Decision Trees – Handling Overfitting using Forests

## Dealing With Unstructured Data

Unlocking The Value Of Unstructured Data

5 Easy Steps to Structure Highly Unstructured Big Data, via Automated Indexation

The Applications of Machine Learning Through Unstructured Text Data

## Recommender System

How to Build a Recommender System

Building the Next New York Times Recommendation Engine

Collaborative filtering recommendation engine implementation in python

Basic recommendation engine using R

Building a Real-Time Geospatial-Aware Recommendation Engine

How to build your own recommendation engine using machine learning on Google Compute Engine

Apache Mahout The Recommender System for Big Data

Recommender System with Mahout and Elasticsearch

The Netflix Recommender System: Algorithms, Business Value, and Innovation

Building a Recommendation Engine with Spark ML on Amazon EMR using Zeppelin
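Several of the resources above build on collaborative filtering. As a rough orientation, here is a minimal sketch of the user-based variant in plain Python: it scores items a user has not rated by the similarity-weighted average of other users’ ratings. The user names, items, ratings, and the function names `cosine_similarity` and `recommend` are all hypothetical, and production systems such as those described in the links are considerably more involved.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=3):
    """Rank unseen items by similarity-weighted average rating of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


# Hypothetical ratings on a 1-5 scale, made up for illustration only.
ratings = {
    "ana": {"A": 5, "B": 1},
    "ben": {"A": 5, "B": 1, "C": 5, "D": 1},
    "cho": {"A": 1, "B": 5, "C": 1, "D": 5},
}
print(recommend(ratings, "ana"))
```

Here "ana" rates the shared items exactly like "ben", so ben’s favourite unseen item "C" is ranked ahead of "D". Note that cosine similarity on raw positive ratings is a deliberately simple choice; mean-centred ratings or matrix factorisation are common alternatives.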

## Text Mining

Overview of Text Mining

10 Common NLP Terms Explained for the Text Mining Novice

Hacks to perform faster Text Mining in R

Text Mining Analysis: some theory and practice in R

Text Mining Shakespeare with MATLAB

Use case: Text analytics vs survey analysis

## Deep Learning

Microsoft Neural Net Shows Deep Learning Can Get Way Deeper

Deep learning – Convolutional neural networks and feature extraction with Python

__Google’s New AI System Could Be ‘Machine Learning’ Breakthrough__ The article describes TensorFlow as the first serious implementation of a framework for ‘deep learning,’ backed by a very experienced and very capable team at Google. It serves as an introduction to TensorFlow.

## Other Topics

Applied Spatial Data Science

Distributed Data-structures

## Popular Books

10 Big Data Books To Boost Your Career – InformationWeek

Quick Reviews: 3 Books on Visualizing Data

15 Must Read Books for Entrepreneurs in Data Science

Data at work: a data visualization book for Excel users

60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more

16 Free Data Science Books

Free Must Read Books on Statistics & Mathematics for Data Science

15 Books every Data Scientist Should Read

## Lessons from Kaggle

Anthony Goldbloom gives you the secret to winning Kaggle competitions

Learning from the OTTO Group Kaggle competition

Doing Data Science: A Kaggle Walkthrough – Cleaning Data

Understanding Text Mining Using Kaggle

## Visualization

7 tools for data visualization in R, Python and Julia

## Databases

SQL vs. NoSQL: What You Need to Know

## Statistics

Becoming a Full-Stack Statistician in 6 Easy Steps

Advice for applying Machine Learning by Andrew Ng