Written by Oleksandr Gavenko (AKA gavenkoa), compiled on 2023-03-19 from rev c18d218b854e.

Statistics

Markov inequality

Markov inequality: if X is a nonnegative random variable then P(X ≥ a) ≤ E[X] ⁄ a for all a > 0.
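
A minimal simulation check of the inequality (a Python sketch, assuming NumPy is installed; the exponential distribution and the threshold a = 5 are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=100_000)  # nonnegative X with E[X] = 2
    a = 5.0
    print("P(X >= a) ~", np.mean(x >= a))         # empirical probability
    print("E[X] / a  ~", x.mean() / a)            # Markov bound, should be the larger number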

Chebyshev inequality

Chebyshev inequality: if X is a random variable with mean μ and variance σ² then

P(|X − μ| ≥ c) ≤ σ² ⁄ c²

for all c > 0.
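
The same kind of simulation check for Chebyshev (again a Python sketch assuming NumPy; the normal distribution and c = 2·σ are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 10.0, 3.0
    x = rng.normal(mu, sigma, size=100_000)
    c = 2 * sigma
    print("P(|X - mu| >= c) ~", np.mean(np.abs(x - mu) >= c))  # empirical probability
    print("sigma^2 / c^2    =", sigma**2 / c**2)               # Chebyshev bound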

Central limit theorem

Central limit theorem: let X1, ..., Xn, ... be a sequence of independent identically distributed random variables with common mean μ and variance σ² and let:

Zn = ((X1 + ... + Xn) − n·μ) ⁄ (σ·sqrt(n))

Then the CDF of Zn converges to the standard normal CDF:

Φ(z) = (1 ⁄ sqrt(2·π)) · ∫(−∞; z] exp(−x² ⁄ 2) dx
lim(n → ∞) P(Zn ≤ z) = Φ(z)
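
An illustration of the convergence (a Python sketch, assuming NumPy and SciPy; the uniform distribution and n = 30 are arbitrary): standardize the sum of n i.i.d. variables and compare the empirical CDF of Zn with Φ at a few points.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, trials = 30, 100_000
    x = rng.uniform(0.0, 1.0, size=(trials, n))   # i.i.d. samples with mu = 0.5, sigma^2 = 1/12
    mu, sigma = 0.5, np.sqrt(1.0 / 12.0)
    z = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))
    for t in (-1.0, 0.0, 1.0):
        print(t, np.mean(z <= t), norm.cdf(t))    # empirical CDF of Zn vs Phi(t)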

Null hypothesis

The null hypothesis is a statement that the phenomenon being studied produces no effect or makes no difference, i.e. an assumption that the apparent effect is actually due to chance.

p-value

p-value is the probability of the apparent effect under the null hypothesis.

https://en.wikipedia.org/wiki/P-value
Wikipedia page
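
A sketch of computing a p-value (assuming SciPy >= 1.7; the example of observing 60 heads in 100 flips of a supposedly fair coin is an invented illustration, not from the text above):

    from scipy.stats import binomtest

    # Null hypothesis: the coin is fair (p = 0.5).
    result = binomtest(k=60, n=100, p=0.5, alternative="two-sided")
    print(result.pvalue)  # probability of data at least this extreme under the null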

Significance level

If the p-value is less than or equal to the chosen significance level (α), the test suggests that the observed data are inconsistent with the null hypothesis, so the null hypothesis should be rejected.

Hypothesis testing

Hypothesis testing is the process of interpreting the statistical significance of a given null hypothesis based on the p-value observed from a sample, with a chosen significance level.

After finishing hypothesis testing we either reject the null hypothesis or fail to reject it due to lack of enough evidence or ...
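
A minimal decision sketch (assuming SciPy; the coin-flip numbers and α = 0.05 are illustrative choices, not prescribed by the text):

    from scipy.stats import binomtest

    alpha = 0.05                                   # chosen significance level
    p_value = binomtest(k=60, n=100, p=0.5).pvalue
    if p_value <= alpha:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis (not enough evidence)")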

Hypothesis testing only takes into account:

But it doesn't cover these cases:

Asymptotic approximation

The CLT says that the distribution of the sample mean is approximated by a normal distribution.

With a large enough number of samples the approximation is quite good.

So during hypothesis testing the researcher usually makes the assumption that it is safe to replace the unknown distribution of the mean of independent and identically distributed individual samples with this normal approximation.

For a really small number of samples the Student t-distribution is used instead of the normal distribution. But again this means that the researcher made an assumption and you may not agree with it, so it is your right to reject any subsequent decision based on a "wrong" assumption.
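
A sketch of the small-sample case (assuming SciPy; the data and the hypothesized mean of 5.0 are invented for illustration): a one-sample t-test uses the Student distribution instead of the normal one.

    from scipy.stats import ttest_1samp

    sample = [5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2]  # small i.i.d. sample
    result = ttest_1samp(sample, popmean=5.0)     # H0: the true mean is 5.0
    print(result.statistic, result.pvalue)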

Type I error

Type I error is the incorrect rejection of a true null hypothesis (a false positive).

Type I error rate is at most α (significance level).

The p-value of a test is the maximum false positive risk you would take by rejecting the null hypothesis.
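
A simulation sketch (assuming NumPy and SciPy) showing that when the null hypothesis is true, it is rejected in roughly a fraction α of experiments:

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(0)
    alpha, rejections, trials = 0.05, 0, 2_000
    for _ in range(trials):
        sample = rng.normal(0.0, 1.0, size=20)    # the null hypothesis (mean 0) is true
        if ttest_1samp(sample, popmean=0.0).pvalue <= alpha:
            rejections += 1                       # a type I error
    print("Observed type I error rate ~", rejections / trials)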

Type II error

Type II error is failing to reject a false null hypothesis (a false negative).

The probability of a type II error is usually called β.

Power

Power is the probability of rejecting the null hypothesis when it is false. So the power is 1 − β.
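
A power estimate by simulation (assuming NumPy and SciPy; the true mean of 0.5 and the sample size of 20 are arbitrary): generate data for which the null hypothesis is false and count how often it is rejected.

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(0)
    alpha, rejections, trials = 0.05, 0, 2_000
    for _ in range(trials):
        sample = rng.normal(0.5, 1.0, size=20)    # the null hypothesis (mean 0) is false
        if ttest_1samp(sample, popmean=0.0).pvalue <= alpha:
            rejections += 1                       # correct rejection
    print("Estimated power (1 - beta) ~", rejections / trials)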

Confidence interval

A confidence interval is a range of values, computed from the sample, that is expected to contain the true value of an unknown population parameter with a given confidence level (e.g. 95%).

https://en.wikipedia.org/wiki/Confidence_interval
Wikipedia page
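
A sketch of a 95% confidence interval for a mean (assuming NumPy and SciPy; the data is invented):

    import numpy as np
    from scipy.stats import t

    sample = np.array([5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2])
    n = len(sample)
    mean = sample.mean()
    sem = sample.std(ddof=1) / np.sqrt(n)                        # standard error of the mean
    low, high = t.interval(0.95, df=n - 1, loc=mean, scale=sem)  # Student-t based interval
    print(low, high)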

Question

What to do with null hypothesis in classical inference?

I successfully shirked stat classes 10 years ago (last-night reading actually helped me to pass the exam), and now that I am taking several Coursera stat classes I have difficulties understanding the null hypothesis. Somehow, with unclear intuition, I passed the quizzes, but I want to understand the subject.

Suppose we have a population and sample some data from it. A reasonable question: does some property of the sample provide evidence that it also holds for the population?

A statistic is a real number that can be derived from a population or a sample. A classical example is the mean value.

We ask whether it is statistically significant that the statistic of the population is near to the statistic of the sample.
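
A tiny sketch of that setup (assuming NumPy; the population and the sample are simulated purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.normal(100.0, 15.0, size=1_000_000)
    sample = rng.choice(population, size=50, replace=False)
    print("population mean:", population.mean())  # statistic of the population
    print("sample mean:    ", sample.mean())      # statistic of the sample

Hypothesis testing then asks whether the gap between the two numbers can plausibly be explained by sampling chance alone.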