Category Archives for "Analytics Tools"

Sep 06

Refer to a Guide

Get a copy of ‘The Little SAS Book: A Primer’ (you can purchase it here: The Little SAS Book: A Primer, Fifth Edition, ISBN 9781612903439).

Tutorials on SAS website

SAS itself provides free tutorials that cover almost the entire Base SAS syllabus, so you can learn from there as well. Here is the link: SAS Tutorials | SAS Training. Use ‘The Little SAS Book’ alongside those tutorials to get more insight into SAS.

Explore Other Tutorials

Here is a complete set of Base SAS tutorials that should give you a good grounding in SAS: SAS Tutorial

If you can afford to spend a little more money, I’d suggest this online course: SAS Training | SAS Certification | Online Course – Simplilearn

Footnote: Whether you go for coaching or follow other tutorials, don’t forget to get a copy of ‘The Little SAS Book: A Primer’.

Note: If you are new to data science, please don’t hesitate to message me if you have any doubts or need any help.

Inspired by Akash Dugam’s answer on Quora

Aug 07

Statistical Terms for Data Science Explained in Simple English

True Value– The actual population value, obtained by counting or measuring (e.g. the number of people in a town whose age is above 60).

Experiment or Trial– Any procedure that can, in principle, be repeated infinitely many times and has a well-defined set of possible outcomes, known as the sample space.

Outcome– A possible result of an experiment. Each possible outcome of a particular experiment is unique, and different outcomes are mutually exclusive (only one outcome occurs on each trial of the experiment).

Sample space– The set of all possible outcomes or results of an experiment.

Event– A set of outcomes of an experiment (a subset of the sample space) to which a probability is assigned.
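To make these last definitions concrete, here is a minimal Python sketch. It assumes two fair six-sided dice, an illustrative example that is not part of the original post: the sample space is every ordered pair of faces, and the event "the faces sum to 7" is a subset of it.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered outcome of rolling two six-sided dice
sample_space = list(product(range(1, 7), repeat=2))

# Event: the subset of outcomes whose faces sum to 7
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# Assuming all outcomes are equally likely, P(event) = |event| / |sample space|
probability = Fraction(len(event), len(sample_space))

print(len(sample_space))  # 36
print(len(event))         # 6
print(probability)        # 1/6
```

The equal-likelihood assumption is what lets the probability be computed by counting; with loaded dice, each outcome would need its own weight.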

Random experiment– An experiment is said to be random if it has more than one possible outcome.

Deterministic trial– An experiment/trial is said to be deterministic if it has only one possible outcome.

Bernoulli trial– A random experiment that has exactly two (mutually exclusive) possible outcomes is known as a Bernoulli trial.

Random variable/ Random quantity/ Aleatory variable/ Stochastic variable– A variable that can take on a set of different possible values, like an ordinary mathematical variable, except that each value has an associated probability.

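Discrete random variable– A random variable that can take only a countable number of values (e.g. the number of heads in 10 coin tosses).

Continuous random variable– A random variable that can take an uncountably infinite number of values, such as any real number in an interval (e.g. a person’s height); the probability of it taking any single exact value is zero.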

Probability Distribution– A mathematical description of a random phenomenon in terms of the probabilities of events.

Population– A statistical population can be a group of actually existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hypothetical and potentially infinite group of objects conceived as a generalization from experience (e.g. the set of all possible hands in a game of poker).

Sample– A set of data collected and/or selected from a statistical population by a defined procedure.

Observations/ Sample points– The elements of a sample are known as sample points, sampling units or observations.

Mathematical Model– A mathematical equation or graph that is used to describe a real-life situation.

Statistical Model– A statistical model is usually specified by mathematical equations that relate one or more random variables and possibly other non-random variables.

Discrete Probability Distribution

Binomial distribution– The discrete probability distribution, with parameters n and p, of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.

Bernoulli experiment or Bernoulli trial– A success/failure experiment. A binomial distribution with n = 1 is a Bernoulli distribution.
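The binomial probability of exactly k successes is C(n, k) · p^k · (1 − p)^(n − k). Here is a minimal Python sketch of that formula using only the standard library; the function name `binomial_pmf` and the coin-toss numbers are illustrative choices, not from the original post. It also shows the n = 1 case reducing to a Bernoulli distribution.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 tosses of a fair coin: C(5, 3) / 2^5 = 10 / 32
print(binomial_pmf(3, 5, 0.5))  # 0.3125

# With n = 1 the binomial reduces to a Bernoulli distribution:
# P(X = 1) = p and P(X = 0) = 1 - p
print(binomial_pmf(1, 1, 0.3))  # 0.3
print(binomial_pmf(0, 1, 0.3))  # 0.7
```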

Cumulative Distribution Function (CDF)– The CDF of a real-valued random variable K, evaluated at x, is the probability that K will take a value less than or equal to x, i.e. F(x) = P(K ≤ x).

Empirical distribution function– For a sample of n data points, the empirical distribution function is a step function that jumps up by 1/n at each of the data points. It estimates the cumulative distribution function underlying the points in the sample and converges to it with probability 1 as the sample size grows, as shown in the figure below.

Image Source: Wikipedia
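The empirical distribution function is easy to compute: F_n(x) is the fraction of observations less than or equal to x. Below is a minimal Python sketch using the standard `bisect` module on a sorted copy of the data; the helper name `ecdf` and the four sample values are hypothetical.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F_n, the empirical distribution function of `sample`.

    F_n(x) = (number of observations <= x) / n, a step function that
    jumps by 1/n at each data point (by k/n where k values are tied).
    """
    data = sorted(sample)
    n = len(data)

    def F_n(x):
        # bisect_right gives how many sorted observations are <= x
        return bisect_right(data, x) / n

    return F_n

F = ecdf([3.1, 0.5, 2.2, 1.7])
print(F(0.0))  # 0.0  (below every observation)
print(F(1.7))  # 0.5  (two of the four observations are <= 1.7)
print(F(5.0))  # 1.0  (above every observation)
```

Sorting once and then binary-searching keeps each evaluation at O(log n), which matters when the function is evaluated at many points, for example to draw a plot like the one above.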

Poisson distribution– It expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
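For a Poisson distribution with average rate λ per interval, the probability of exactly k events is λ^k · e^(−λ) / k!. A minimal Python sketch follows; the help-desk scenario (an average of 4 calls per hour) is a made-up illustration, not data from the post.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical example: a help desk receives on average 4 calls per hour.
# Probability of exactly 2 calls in a given hour:
print(round(poisson_pmf(2, 4.0), 4))  # 0.1465
```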

Statistical inference– It is the process of deducing properties of an underlying distribution by analysis of data. Inferential statistical analysis infers properties about a population: this includes testing hypotheses and deriving estimates. The population is assumed to be larger than the observed data set; in other words, the observed data is assumed to be sampled from a larger population.

Frequentist inference– A type of statistical inference that draws conclusions from sample data by emphasizing the frequency or proportion of the data. This is the inference framework on which the well-established methodologies of statistical hypothesis testing and confidence intervals are based.

Bayesian inference– Bayesian inference is a method of statistical inference in which Bayes’ theorem is used to update the probability for a hypothesis as more evidence or information becomes available.
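To see Bayes’ theorem updating a probability as evidence arrives, here is a minimal Python sketch over a discrete set of hypotheses. The setup is an assumption for illustration only: a coin is either fair (P(heads) = 0.5) or biased (P(heads) = 0.8), we start undecided, and we observe the flips H, H, H, T, H one at a time.

```python
def bayes_update(prior, likelihood, observation):
    """One application of Bayes' theorem over a discrete set of hypotheses.

    posterior(h) = P(obs | h) * prior(h) / sum over h' of P(obs | h') * prior(h')
    """
    unnormalized = {h: likelihood(observation, h) * p for h, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(obs), the normalizing constant
    return {h: u / evidence for h, u in unnormalized.items()}

# Hypothetical hypotheses: the coin is fair or biased towards heads
heads_prob = {"fair": 0.5, "biased": 0.8}

def coin_likelihood(obs, h):
    """P(obs | h) for a single flip."""
    return heads_prob[h] if obs == "H" else 1 - heads_prob[h]

belief = {"fair": 0.5, "biased": 0.5}  # prior: start undecided
for flip in "HHHTH":                   # evidence arrives one flip at a time
    belief = bayes_update(belief, coin_likelihood, flip)
    print(flip, {h: round(p, 3) for h, p in belief.items()})
```

Each posterior becomes the prior for the next flip; updating one flip at a time gives the same final answer as applying Bayes’ theorem once to all five flips.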

Statistical hypothesis– In statistical hypothesis testing, two statistical data sets are compared, or a data set obtained by sampling is compared against a synthetic data set from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis that proposes no relationship between the two data sets. The comparison is deemed statistically significant if the observed relationship between the data sets would be an unlikely realization of the null hypothesis according to a threshold probability, the significance level. Hypothesis tests are used to determine which outcomes of a study would lead to a rejection of the null hypothesis for a pre-specified level of significance.
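A concrete instance is an exact one-sided binomial test. The sketch below assumes hypothetical data (16 heads in 20 tosses), a null hypothesis that the coin is fair, and a significance level of 0.05 fixed in advance. The p-value is the probability, under the null hypothesis, of a result at least as extreme as the one observed.

```python
from math import comb

def binomial_test_greater(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact p-value: P(X >= k) when X ~ Binomial(n, p0) under H0."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical data: 16 heads in 20 tosses.
# H0: the coin is fair (p = 0.5); alternative: the coin favours heads (p > 0.5).
alpha = 0.05                            # pre-specified significance level
p_value = binomial_test_greater(16, 20)
print(round(p_value, 4))                # 0.0059
print("reject H0" if p_value < alpha else "fail to reject H0")  # reject H0
```

Here 16 or more heads would occur only about 0.6% of the time with a fair coin, which is below the 5% threshold, so the result counts as statistically significant.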

Confidence interval– A type of interval estimate of a population parameter. It is an observed interval (i.e., it is calculated from the observations) that in principle differs from sample to sample, and that frequently includes the value of an unobservable parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient. More specifically, the meaning of the term “confidence level” is that, if CIs are constructed across many separate data analyses of replicated (and possibly different) experiments, the proportion of such intervals that contain the true value of the parameter will match the given confidence level.
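The "proportion of intervals that contain the true value" interpretation can be checked by simulation. This minimal Python sketch makes simplifying assumptions that are not from the post: a normal population with true mean 10 and a known standard deviation of 2, so the textbook z-interval applies (with an unknown standard deviation a t-interval would normally be used). It builds 2,000 separate 95% intervals and counts how many capture the true mean.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population SD `sigma` is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sigma / len(sample) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width

# Hypothetical population: normal with true mean 10 and known SD 2
rng = random.Random(42)
true_mean, sigma, n, repeats = 10.0, 2.0, 30, 2000

hits = 0
for _ in range(repeats):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    low, high = z_interval(sample, sigma)
    hits += low <= true_mean <= high  # did this interval capture the true mean?

coverage = hits / repeats
print(coverage)  # expected to be close to 0.95
```

Any single interval either contains the true mean or it does not; the 95% describes the long-run proportion across repeated samples, which is what the simulated coverage approximates.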

Jul 24

7 Analytics Tools and the Job Market

Photo: Bigstockphoto

Here is a list of 7 analytics tools and the number of analyst jobs on Indeed that list each tool as a preferred/required skill. Many jobs list multiple tools, so people with expertise in one or more of these tools can apply. We’ve also included some text from each job description to give you an idea of the roles and how knowledge of the tool will help you meet those expectations. The information in this post is as of July 2016. You may also visit the references in this article to learn more about these jobs/tools.

SAS

Listed analyst jobs (Full-time and Contract): 5429
Snapshot of job description:

• Partners with marketing professionals to understand their information needs and the challenges of the business.
• Integrates quantitative data from many diverse areas and data sources.
• Develops detailed specifications or enhancements for database tools that address business information priorities.
• Develops complex queries to extract data.
• Performs statistical analysis using techniques such as regression analysis, multivariate analysis, choice-based conjoint, Analysis of Variance (ANOVA), and data mining.
• Organizes and presents findings regarding market trends, competitive intelligence, customer preferences, and market penetration strategies; makes recommendations to address issues.
• Takes the lead to manage moderately complex research or analytic projects that involve multiple team members.

Matlab

Listed analyst jobs (Full-time and Contract): 1113
Snapshot of job description:

Apply statistical and machine-driven techniques to uncover interesting digital trends that are relevant to the typical consumer, business and industry press, and digital marketers. This person will be responsible for mining data sets and appending available data where possible to find insights that deserve wide publication.

R

Listed analyst jobs (Full-time and Contract): 4996
Snapshot of job description:

• Demonstrated experience with statistical forecasting, pattern recognition, and time series analysis a must; proficiency with statistics languages. Experience applying analytic skills to a real-world project, from the early stages of solution ideation, algorithm design, and prototyping to tasks like input data cleanup, model building, and process rollout and implementation. Ability to explain complex problems, solutions, or the essence of a quantitative model to a layperson in a simple, understandable way. Experience with mathematical programming and optimization.

Python

Listed analyst jobs (Full-time and Contract): 5278
Snapshot of job description:

Responsible for growing revenue, optimizing our user experience, and gathering data that drives the direction of our business. You will collaborate with engineers to build new features and set up A/B tests. You will help data scientists apply complex models to our data.

Minitab

Listed analyst jobs (Full-time and Contract): 191
Snapshot of job description:

Knowledge of basic statistics required. Ability to identify and comprehend information and present it in a format that others, with little experience with the data, can understand required. Working knowledge of database management required.

Stata

Listed analyst jobs (Full-time and Contract): 610
Snapshot of job description:

Analyze complex data in connection with high stakes business litigation and general consulting. Consultants/Senior Consultants are expected to support senior staff by working with databases; composing written summaries of case and industry information; directing analysts in executing assignments; and preparing charts, tables, and graphs. This position provides an opportunity to enhance quantitative analytical skills through the use of numerous software tools.