A hypothesis is a statement that we are trying to prove or disprove. It is used to express the relationship between variables and whether this relationship is significant. It is specific and offers a prediction on the results of your research question.
Your research question will lead you to develop a hypothesis, which is why your research question needs to be specific and clear.
The hypothesis will then guide you to the most appropriate techniques for answering the question. Hypotheses reflect the literature and theories on which you are basing them, and they need to be testable (i.e. measurable and practical).
The null hypothesis (H0) is the proposition that there will be no relationship between the variables you are looking at (i.e. any differences are due to chance). Null hypotheses always refer to the population. (Usually we do not believe this to be true.)
e.g. There is no difference in instances of illegal drug use between teenagers who are members of a gang and those who are not.
The alternative hypothesis (HA or H1), sometimes called the research hypothesis or experimental hypothesis, is the proposition that there will be a relationship. It is a statement of inequality between the variables you are interested in. Alternative hypotheses always refer to the sample. It is usually a declaration rather than a question, and is clear, to the point and specific.
e.g. The instances of illegal drug use by teenagers who are members of a gang are different from the instances of illegal drug use by teenagers who are not gang members.
A non-directional research hypothesis reflects an expected difference between groups but does not specify the direction of this difference (see two-tailed tests).
A directional research hypothesis reflects an expected difference between groups and does specify the direction of this difference (see one-tailed tests).
e.g. The instances of illegal drug use by teenagers who are members of a gang will be higher than the instances of illegal drug use of teenagers who are not gang members.
The process of testing then ascertains which hypothesis is better supported by the evidence.
It is usually easier to prove something as untrue rather than true, so looking at the null hypothesis is the usual starting point.
The process of examining the null hypothesis in light of evidence from the sample is called significance testing. It is a way of establishing a range of values within which we can decide whether or not to reject the null hypothesis.
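The idea of significance testing can be sketched with a simple permutation test on the gang/drug-use example. The counts below are invented purely for illustration: under the null hypothesis the group labels are interchangeable, so we shuffle them many times and see how often a difference at least as large as the observed one arises by chance.

```python
import random

# Hypothetical counts of instances of illegal drug use for two groups of
# teenagers (gang members vs. non-members); the data are invented for
# illustration only.
gang = [4, 7, 5, 6, 8, 5, 7, 6]
non_gang = [3, 2, 4, 3, 5, 2, 4, 3]

observed_diff = abs(sum(gang) / len(gang) - sum(non_gang) / len(non_gang))

# Under the null hypothesis the labels "gang" / "non-gang" carry no
# information, so we repeatedly shuffle the pooled data and re-split it,
# counting how often a difference this large appears by chance alone.
random.seed(0)
pooled = gang + non_gang
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = abs(sum(pooled[:8]) / 8 - sum(pooled[8:]) / 8)
    if diff >= observed_diff:
        extreme += 1

p_value = extreme / trials
print(p_value)  # a small p-value casts doubt on the null hypothesis
```

A small p-value here means a difference as large as the observed one would rarely arise if the null hypothesis were true, which is the reasoning behind significance testing generally.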
There has been discussion over whether the scientific method employed in traditional hypothesis testing is appropriate.
Taken from: Salkind, N.J. (2017) Statistics for people who (think they) hate statistics. 6th edn. London: SAGE pp. 144-145.
e.g. a significance level of 0.05 indicates that there is a 5% (or 1 in 20) risk of deciding that there is an effect when in fact there is none.
The lower the significance level you set, the stronger the evidence from the sample has to be before you can reject the null hypothesis.
N.B. - it is important that you set the significance level before you carry out your study and analysis.
It is possible to test the significance of your null hypothesis using a confidence interval (see under the samples and populations tab).
- if the value predicted by the null hypothesis lies outside the confidence interval, we can reject the null hypothesis and accept the alternative hypothesis
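A minimal sketch of this confidence-interval approach, using only the Python standard library and an invented sample of differences (a normal approximation is used for the 95% interval; a t-distribution would be more exact for small samples):

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample of observed differences; invented for illustration.
sample = [2.1, 1.4, 3.0, 2.6, 1.9, 2.8, 2.2, 1.7, 2.5, 2.4]

n = len(sample)
m = mean(sample)
se = stdev(sample) / n ** 0.5  # standard error of the mean

# 95% confidence interval via the normal approximation.
z = NormalDist().inv_cdf(0.975)  # critical value, about 1.96
lower, upper = m - z * se, m + z * se

null_value = 0  # the null hypothesis predicts no difference
reject_null = not (lower <= null_value <= upper)
print((round(lower, 2), round(upper, 2)), reject_null)
```

Because the null value (0) falls outside the interval here, we would reject the null hypothesis for this invented sample.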
Type I error - this is the chance of wrongly rejecting the null hypothesis even though it is actually true, e.g. by using a 5% p level you would expect the null hypothesis to be rejected about 5% of the time when the null hypothesis is true. You could set a more stringent p level such as 1% (or 1 in 100) to be more certain of not making a Type I error. This, however, makes another type of error (Type II) more likely.
Type II error - this is where there is an effect, but the p value you obtain is non-significant hence you don’t detect this effect.
One-tailed tests - where we know in which direction (e.g. larger or smaller) the difference between sample and population will be. It is a directional hypothesis.
Two-tailed tests - where we are looking at whether there is a difference between sample and population. This difference could be larger or smaller. This is a non-directional hypothesis.
If the difference is in the direction you have predicted (i.e. a one-tailed test), it is easier to get a significant result, though there are arguments against using a one-tailed test (Wright and London, 2009, pp. 98-99)*
*Wright, D. B. & London, K. (2009) First (and second) steps in statistics. 2nd edn. London: SAGE.
N.B. - think of the ‘tails’ as the regions at the far ends of a normal distribution. For a two-tailed test with a significance level of 0.05, a proportion of 0.025 of the values would be at one end of the distribution and the other 0.025 at the other end. It is the values in these ‘critical’ extreme regions that let us reject the null hypothesis and claim that there has been an effect.
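The difference between the two kinds of test shows up in the critical cut-off values. A short sketch using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05
nd = NormalDist()  # standard normal distribution

# Two-tailed test: alpha is split between both tails, 0.025 in each.
two_tailed_cutoff = nd.inv_cdf(1 - alpha / 2)

# One-tailed test: all of alpha sits in a single tail.
one_tailed_cutoff = nd.inv_cdf(1 - alpha)

print(round(two_tailed_cutoff, 3))  # 1.96
print(round(one_tailed_cutoff, 3))  # 1.645
```

The one-tailed cut-off (about 1.645) is smaller than the two-tailed one (about 1.96), which is why a result in the predicted direction reaches significance more easily with a one-tailed test.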
Degrees of freedom (df) is a rather difficult mathematical concept, but it is needed to calculate the significance of certain statistical tests, such as the t-test, ANOVA and Chi-squared test.
It is broadly defined as the number of "observations" (pieces of information) in the data that are free to vary when estimating statistical parameters. (Taken from Minitab Blog).
The higher the degrees of freedom, the more powerful and precise your estimates of the population parameter will be.
Typically, for a 1-sample t-test it is the number of values in your sample minus 1.
For chi-squared tests with a table of rows and columns, the rule is:
(Number of rows minus 1) times (number of columns minus 1)
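The two rules above can be written as tiny functions:

```python
def df_one_sample_t(n):
    """One-sample t-test: number of values in the sample minus 1."""
    return n - 1

def df_chi_squared(rows, cols):
    """Chi-squared test on a contingency table:
    (number of rows - 1) * (number of columns - 1)."""
    return (rows - 1) * (cols - 1)

print(df_one_sample_t(20))   # a sample of 20 values has 19 df
print(df_chi_squared(3, 4))  # a 3x4 table has 2 * 3 = 6 df
```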
An accessible example to illustrate the principle of degrees of freedom uses chocolates: once all but one chocolate in a box have been chosen, there is no freedom left in choosing the last one.