Written by Oleksandr Gavenko (AKA gavenkoa), compiled on 2023-03-19 from rev c18d218b854e.

Statistics

Markov inequality

Markov inequality: if X is a nonnegative random variable then P(X ≥ a) ≤ E[X] ⁄ a for all a > 0.
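
A minimal simulation check of the inequality (a Python sketch, assuming NumPy is installed; the exponential distribution and the threshold a = 5 are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=100_000)  # nonnegative X with E[X] = 2
    a = 5.0
    print("P(X >= a) ~", np.mean(x >= a))         # empirical probability
    print("E[X] / a  ~", x.mean() / a)            # Markov bound, should be the larger number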

Chebyshev inequality

Chebyshev inequality: if X is a random variable with mean μ and variance σ² then

P(|X − μ| ≥ c) ≤ σ² ⁄ c²

for all c > 0.
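
The same kind of simulation check for Chebyshev (again a Python sketch assuming NumPy; the normal distribution and c = 2·σ are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 10.0, 3.0
    x = rng.normal(mu, sigma, size=100_000)
    c = 2 * sigma
    print("P(|X - mu| >= c) ~", np.mean(np.abs(x - mu) >= c))  # empirical probability
    print("sigma^2 / c^2    =", sigma**2 / c**2)               # Chebyshev bound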

Central limit theorem

Central limit theorem: let X1, ..., Xn, ... be a sequence of independent identically distributed random variables with common mean μ and variance σ² and let:

Zn = ((X1 + ... + Xn) − n·μ) ⁄ (σ·sqrt(n))

Then the CDF of Zn converges to the standard normal CDF:

Φ(z) = (1 ⁄ sqrt(2·π)) · ∫(−∞; z] exp(−x² ⁄ 2) dx
lim(n → ∞) P(Zn ≤ z) = Φ(z)
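
An illustration of the convergence (a Python sketch, assuming NumPy and SciPy; the uniform distribution and n = 30 are arbitrary): standardize the sum of n i.i.d. variables and compare the empirical CDF of Zn with Φ at a few points.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, trials = 30, 100_000
    x = rng.uniform(0.0, 1.0, size=(trials, n))   # i.i.d. samples with mu = 0.5, sigma^2 = 1/12
    mu, sigma = 0.5, np.sqrt(1.0 / 12.0)
    z = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))
    for t in (-1.0, 0.0, 1.0):
        print(t, np.mean(z <= t), norm.cdf(t))    # empirical CDF of Zn vs Phi(t)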

Null hypothesis

The null hypothesis is a statement that the phenomenon being studied produces no effect or makes no difference, i.e. an assumption that the apparent effect is actually due to chance.

p-value

p-value is the probability of the apparent effect under the null hypothesis.

https://en.wikipedia.org/wiki/P-value
Wikipedia page
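
A sketch of computing a p-value (assuming SciPy >= 1.7; the example of observing 60 heads in 100 flips of a supposedly fair coin is an invented illustration, not from the text above):

    from scipy.stats import binomtest

    # Null hypothesis: the coin is fair (p = 0.5).
    result = binomtest(k=60, n=100, p=0.5, alternative="two-sided")
    print(result.pvalue)  # probability of data at least this extreme under the null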

Significance level

If the p-value is less than or equal to the chosen significance level (α), the test suggests that the observed data are inconsistent with the null hypothesis, so the null hypothesis should be rejected.

Hypothesis testing

Hypothesis testing is the process of interpreting the statistical significance of a given null hypothesis based on the p-value observed from a sample, with a chosen significance level.

After finishing hypothesis testing we either reject the null hypothesis or fail to reject it due to lack of enough evidence or ...
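
A minimal decision sketch (assuming SciPy; the coin-flip numbers and α = 0.05 are illustrative choices, not prescribed by the text):

    from scipy.stats import binomtest

    alpha = 0.05                                   # chosen significance level
    p_value = binomtest(k=60, n=100, p=0.5).pvalue
    if p_value <= alpha:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis (not enough evidence)")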

Hypothesis testing only takes into account:

But it doesn't cover these cases:

Asymptotic approximation

The CLT says that the distribution of the sample mean is approximated by a normal distribution.

With a large enough number of samples the approximation is quite good.

So during hypothesis testing the researcher usually makes the assumption that it is safe to replace the unknown distribution of the mean of independent and identically distributed individual samples with this normal approximation.

For a really small number of samples the Student t-distribution is used instead of the normal distribution. But again this means that the researcher made an assumption and you may not agree with it, so it is your right to reject any subsequent decision based on a "wrong" assumption.
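
A sketch of the small-sample case (assuming SciPy; the data and the hypothesized mean of 5.0 are invented for illustration): a one-sample t-test uses the Student distribution instead of the normal one.

    from scipy.stats import ttest_1samp

    sample = [5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2]  # small i.i.d. sample
    result = ttest_1samp(sample, popmean=5.0)     # H0: the true mean is 5.0
    print(result.statistic, result.pvalue)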

Type I error

Type I error is the incorrect rejection of a true null hypothesis (a false positive).

Type I error rate is at most α (significance level).

The p-value of a test is the maximum false positive risk you would take by rejecting the null hypothesis.
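
A simulation sketch (assuming NumPy and SciPy) showing that when the null hypothesis is true, it is rejected in roughly a fraction α of experiments:

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(0)
    alpha, rejections, trials = 0.05, 0, 2_000
    for _ in range(trials):
        sample = rng.normal(0.0, 1.0, size=20)    # the null hypothesis (mean 0) is true
        if ttest_1samp(sample, popmean=0.0).pvalue <= alpha:
            rejections += 1                       # a type I error
    print("Observed type I error rate ~", rejections / trials)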

Type II error

Type II error is failing to reject a false null hypothesis (a false negative).

The probability of a type II error is usually called β.

Power

Power is the probability of rejecting the null hypothesis when it is false. So the power is 1 − β.
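
A power estimate by simulation (assuming NumPy and SciPy; the true mean of 0.5 and the sample size of 20 are arbitrary): generate data for which the null hypothesis is false and count how often it is rejected.

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(0)
    alpha, rejections, trials = 0.05, 0, 2_000
    for _ in range(trials):
        sample = rng.normal(0.5, 1.0, size=20)    # the null hypothesis (mean 0) is false
        if ttest_1samp(sample, popmean=0.0).pvalue <= alpha:
            rejections += 1                       # correct rejection
    print("Estimated power (1 - beta) ~", rejections / trials)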

Confidence interval

A confidence interval is a range of values, computed from the sample, that is expected to contain the true value of an unknown population parameter with a given confidence level (e.g. 95%).

https://en.wikipedia.org/wiki/Confidence_interval
Wikipedia page
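
A sketch of a 95% confidence interval for a mean (assuming NumPy and SciPy; the data is invented):

    import numpy as np
    from scipy.stats import t

    sample = np.array([5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2])
    n = len(sample)
    mean = sample.mean()
    sem = sample.std(ddof=1) / np.sqrt(n)                        # standard error of the mean
    low, high = t.interval(0.95, df=n - 1, loc=mean, scale=sem)  # Student-t based interval
    print(low, high)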

Question

What to do with null hypothesis in classical inference?

I successfully shirked stat classes 10 years ago (last-night reading actually helped me to pass the exam), and now that I am taking several Coursera stat classes I have difficulties understanding the null hypothesis. Somehow, with unclear intuition, I passed the quizzes, but I want to understand the subject.

Suppose we have a population and sample some data from it. A reasonable question: does some property of the sample provide evidence that it also holds for the population?

A statistic is a real number that can be derived from a population or a sample. A classical example is the mean value.

We ask whether it is statistically significant that the statistic of the population is near to the statistic of the sample.
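
A tiny sketch of that setup (assuming NumPy; the population and the sample are simulated purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.normal(100.0, 15.0, size=1_000_000)
    sample = rng.choice(population, size=50, replace=False)
    print("population mean:", population.mean())  # statistic of the population
    print("sample mean:    ", sample.mean())      # statistic of the sample

Hypothesis testing then asks whether the gap between the two numbers can plausibly be explained by sampling chance alone.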