By
ganpati

Analytics Tools
True Value– The actual population value that would be obtained by counting or measuring every member of the population (for example, the number of people in a town who are older than 60).
Experiment or Trial– Any procedure that can be repeated indefinitely and has a well-defined set of possible outcomes, known as the sample space.
Outcome– A possible result of an experiment. Each possible outcome of a particular experiment is unique, and different outcomes are mutually exclusive (only one outcome occurs on each trial of the experiment).
Sample space– Set of all possible outcomes or results of an experiment
Event– An event is a set of outcomes of an experiment (a subset of the sample space) to which a probability is assigned
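As a small illustration of sample space, outcome and event, here is a minimal Python sketch. The fair six-sided die, the "roll is even" event and the helper name `probability` are assumptions chosen for the example, not part of the definitions above.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed example).
sample_space = {1, 2, 3, 4, 5, 6}

# An event is a subset of the sample space: here, "the roll is even".
even_event = {outcome for outcome in sample_space if outcome % 2 == 0}


def probability(event, space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & space), len(space))


p_even = probability(even_event, sample_space)
print(p_even)  # 1/2
```

The equal-likelihood assumption is what allows the probability to be computed by counting outcomes; it would not hold for a loaded die.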
Random experiment– An experiment is said to be random if it has more than one possible outcome
Deterministic trial– An experiment/trial is said to be deterministic if it has only one possible outcome.
Bernoulli trial– A random experiment that has exactly two (mutually exclusive) possible outcomes is known as a Bernoulli trial
Random variable/ Random quantity/ Aleatory variable/ Stochastic variable– A variable whose value depends on the outcome of a random experiment. Unlike an ordinary mathematical variable, it can take on a set of possible different values, each with an associated probability.
Discrete random variable– A random variable which can only take a countable number of values.
Continuous random variable– A random variable that can take any value in an interval of real numbers, and so has uncountably many possible values (for example, a height or a waiting time).
Probability Distribution– Mathematical description of a random phenomenon in terms of the probabilities of events
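To make the idea of a discrete random variable and its probability distribution concrete, the following sketch tabulates the distribution of the sum of two fair dice. The dice example and the variable names are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice (assumed example).
outcomes = list(product(range(1, 7), repeat=2))

# The random variable X maps each outcome to the sum of the two faces.
counts = Counter(a + b for a, b in outcomes)

# Probability distribution of X: P(X = x) for every value x it can take.
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```

X takes only the eleven values 2 to 12, so it is a discrete random variable, and its probabilities add up to 1.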
Population– A statistical population can be a group of actually existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hypothetical and potentially infinite group of objects conceived as a generalization from experience (e.g. the set of all possible hands in a game of poker).
Sample– A set of data collected and/or selected from a statistical population by a defined procedure.
Observations/ Sample points– The elements of a sample are known as sample points, sampling units or observations.
Mathematical Model– A mathematical equation or graph that is used to describe a real-life situation.
Statistical Model– A statistical model is usually specified by mathematical equations that relate one or more random variables
Discrete Probability Distribution
Binomial distribution– The discrete probability distribution, with parameters n and p, of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
Bernoulli experiment or Bernoulli trial– A single success/failure experiment. The binomial distribution with n = 1 is the Bernoulli distribution.
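The binomial probabilities can be computed from the standard formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below is a minimal version using only the Python standard library; the function name `binomial_pmf` and the coin-toss numbers are assumptions for the example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 5 heads in 10 tosses of a fair coin.
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# With n = 1 the binomial reduces to a Bernoulli distribution:
# P(success) = p and P(failure) = 1 - p.
print(binomial_pmf(1, 1, 0.3), binomial_pmf(0, 1, 0.3))
```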
Cumulative Distribution Function (CDF)– The cumulative distribution function of a real-valued random variable K, evaluated at x, is the probability that K will take a value less than or equal to x.
Empirical distribution function– The distribution function built directly from a sample. It is a step function that jumps up by 1/n at each of the n data points. It estimates the cumulative distribution function underlying the points in the sample and converges to it with probability 1 as the sample size grows, as shown in the figure below.
Image Source: Wikipedia
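The empirical distribution function is simple enough to compute by hand. The sketch below is one possible implementation, assuming a small made-up sample; the names `ecdf` and `data` are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function F of a sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # Fraction of data points that are less than or equal to x.
        return bisect_right(ordered, x) / n

    return F


data = [2.1, 3.5, 3.5, 4.0, 5.2]  # made-up sample for illustration
F = ecdf(data)
print(F(1.0), F(3.5), F(6.0))  # 0.0 0.6 1.0
```

Each distinct data point raises F by 1/n, and a repeated value such as 3.5 raises it by 2/n in a single step.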
Poisson distribution– It expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
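For a known average rate λ per interval, the Poisson probability of observing k events is λ^k e^(-λ) / k!. A minimal sketch follows; the function name `poisson_pmf` and the rate of 2 events per interval are assumptions for the example.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """P(X = k) when events occur independently at a known average rate."""
    return rate**k * exp(-rate) / factorial(k)


# Assumed example: an average of 2 events per interval.
for k in range(5):
    print(k, round(poisson_pmf(k, 2.0), 4))
```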
Statistical inference– It is the process of deducing properties of an underlying distribution by analysis of data. Inferential statistical analysis infers properties about a population: this includes testing hypotheses and deriving estimates. The population is assumed to be larger than the observed data set; in other words, the observed data is assumed to be sampled from a larger population.
Frequentist inference– A type of statistical inference that draws conclusions from sample data by emphasizing the frequency or proportion of the data. It is the framework on which the well-established methodologies of statistical hypothesis testing and confidence intervals are based.
Bayesian inference– Bayesian inference is a method of statistical inference in which Bayes’ theorem is used to update the probability for a hypothesis as more evidence or information becomes available.
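The updating step in Bayesian inference can be sketched with two competing hypotheses about a coin. Everything specific here is an assumption made for illustration: the hypotheses "fair" and "biased", the 0.8 heads probability, the equal priors and the observed flips.

```python
def bayes_update(prior, likelihood):
    """Apply Bayes' theorem: posterior is proportional to prior times likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}


# Probability of heads under each (hypothetical) hypothesis.
p_heads = {"fair": 0.5, "biased": 0.8}

# Start with equal belief in both hypotheses.
belief = {"fair": 0.5, "biased": 0.5}

# Update the belief after each observed flip (made-up data).
for flip in ["H", "H", "T", "H"]:
    likelihood = {
        h: p if flip == "H" else 1 - p for h, p in p_heads.items()
    }
    belief = bayes_update(belief, likelihood)
    print(flip, {h: round(b, 3) for h, b in belief.items()})
```

Each posterior becomes the prior for the next observation, which is how the probability for a hypothesis is revised as more evidence arrives.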
Statistical hypothesis– A statistical hypothesis is a hypothesis that can be tested on the basis of observed data. In a hypothesis test, two statistical data sets are compared, or a data set obtained by sampling is compared against a synthetic data set from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis that proposes no relationship between the two data sets. The comparison is deemed statistically significant if the relationship between the data sets would be an unlikely realization of the null hypothesis according to a threshold probability, the significance level. Hypothesis tests are used to determine which outcomes of a study would lead to a rejection of the null hypothesis for a pre-specified level of significance.
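As a worked example of testing against a null hypothesis, the sketch below runs a one-sided exact binomial test. The scenario (9 heads in 10 tosses, null hypothesis that the coin is fair, significance level 0.05) and the function name are assumptions for illustration, and this is only one of many possible tests.

```python
from math import comb


def binomial_p_value(successes, n, p_null=0.5):
    """One-sided p-value: P(X >= successes) if the null hypothesis is true."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


alpha = 0.05  # pre-specified significance level (assumed)
p_value = binomial_p_value(9, 10)  # 9 heads observed in 10 tosses (made up)

print(p_value)  # 0.0107421875
print("reject null" if p_value < alpha else "fail to reject null")
```

Here the p-value is below the significance level, so the observed data would be an unlikely realization of the null hypothesis and the null is rejected.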
Confidence interval– A type of interval estimate of a population parameter. It is an observed interval (i.e., it is calculated from the observations), in principle different from sample to sample, that frequently includes the value of an unobservable parameter of interest when the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient. More specifically, the meaning of the term “confidence level” is that, if confidence intervals are constructed across many separate data analyses of replicated (and possibly different) experiments, the proportion of such intervals that contain the true value of the parameter will match the given confidence level.
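The repeated-experiment meaning of the confidence level can be checked with a small simulation. The sketch below assumes normally distributed data with a known standard deviation, so that a z-interval for the mean applies; the function names, the true mean of 10, the sample size and the number of trials are all hypothetical choices.

```python
import random
from statistics import NormalDist, mean


def mean_confidence_interval(sample, sigma, confidence=0.95):
    """z-interval for the mean, assuming the standard deviation sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / len(sample) ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width


def coverage(true_mean=10.0, sigma=2.0, n=30, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = mean_confidence_interval(sample, sigma)
        hits += low <= true_mean <= high
    return hits / trials


# Should land close to the 0.95 confidence level.
print(coverage())
```

Any single interval either contains the true mean or does not; the 95% figure describes the long-run proportion of intervals that do.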