This post is based on Yatani’s wiki on HCI Stats.
1. What is NHST?
The basic idea of NHST (null hypothesis significance testing) is to support a claim by showing that its opposite is implausible given the data. For example, when comparing two text entry methods, the null hypothesis is that there is no relationship between response time and the choice of method. The alternative hypothesis is that there is a relationship between these two variables.
2. What are some NHST methods?
The t-test and ANOVA, for example.
3. What is the meaning of p in NHST?
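The p value is the probability, assuming the null hypothesis is true, that your experiment would yield results at least as extreme as the ones you obtained. Simply put, if p = 0.01 and there were really no difference between the methods, there would be a 1% chance of seeing a difference in participants’ response times at least as large as the one you observed. Note that p is not the probability that the null hypothesis is true.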
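To make this concrete, here is a minimal sketch of a permutation test, which estimates p directly from its definition: shuffle the group labels many times (which is what "no relationship" would look like) and count how often the shuffled difference in means is at least as large as the observed one. This is a different NHST method from the t-test and ANOVA mentioned above, chosen only because it needs nothing beyond the Python standard library. The function name `permutation_p_value` and the response times are made up for illustration.

```python
import random
from statistics import fmean

def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Estimate a two-sided p value for the difference in means.

    Under the null hypothesis the group labels are interchangeable,
    so we reshuffle them and see how often the shuffled difference
    is at least as extreme as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical response times (seconds) for two text entry methods.
method_a = [2.1, 2.4, 1.9, 2.6, 2.2, 2.5, 2.0, 2.3]
method_b = [2.8, 3.1, 2.7, 3.0, 2.6, 3.2, 2.9, 2.5]
print(permutation_p_value(method_a, method_b))
```

A small value means that, if the method made no difference, a gap this large would rarely show up by chance.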
4. How do we decide the ‘threshold’ of p?
There are no strict thresholds (the commonly used 0.05 and 0.10 are rather arbitrary). Nor is there a fixed mapping between a difference in p values and a difference in the underlying significance they represent.
5. What is the relationship between sample size and the NHST result?
Yatani’s example shows that NHST results depend on sample size: the same small difference is more likely to come out as significant as the sample size increases.
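The following is a rough sketch of that dependence, not Yatani’s own example. It assumes two groups of equal size, a known common standard deviation, and a normal (z-test) approximation, so the test statistic for a standardized mean difference d is z = d * sqrt(n / 2). The function name `approx_two_sided_p` and the value d = 0.2 are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def approx_two_sided_p(d, n_per_group):
    """Approximate two-sided p value for a standardized mean difference d.

    Assumes equal group sizes and a known common standard deviation,
    so this is a z-test approximation rather than an exact t-test.
    """
    z = abs(d) * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(z))

# The difference between the groups stays the same; only n changes.
for n in (10, 100, 1000):
    print(n, approx_two_sided_p(0.2, n))
```

With the difference held fixed, the printed p value shrinks as n grows, which is why a significant result alone says little about how large the difference is.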
6. How is ‘significance’ measured in NHST?
Not by the value of p. By definition, NHST gives only an either-or result: the null hypothesis is either rejected or not. We cannot, for example, read a smaller p as a ‘more significant’ finding.
7. Is ‘significance’ the only important metric?
No. Yatani gives an example: a small difference might be statistically significant, but it is still a small difference; meanwhile, a large difference might fail to reach significance. Should we still accept it as a contribution?
8. Overall, how do we use NHST?
Yatani lists four ‘should’s for making good use of NHST:
- One’s research question should be answerable with ‘yes’ or ‘no’;
- One should have an appropriate null hypothesis;
- One should interpret p correctly;
- One should use the term ‘significant’ properly.
9. What is effect size?
Effect size reflects the magnitude of the effect caused by a factor. Unlike p, it does not systematically shrink or grow as the sample size changes. Yatani recommends reporting effect size together with the other results.
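As one illustration, Cohen’s d is a widely used effect size for the difference between two group means: the difference divided by the pooled standard deviation. The sketch below is a minimal standard-library version; the function name `cohens_d` and the data are made up for illustration, and other effect size measures may suit other designs better.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Hypothetical response times (seconds) for two text entry methods.
method_a = [2.1, 2.4, 1.9, 2.6, 2.2, 2.5, 2.0, 2.3]
method_b = [2.8, 3.1, 2.7, 3.0, 2.6, 3.2, 2.9, 2.5]
print(cohens_d(method_a, method_b))
```

The sign only indicates which group has the larger mean; the absolute value is what expresses the size of the difference in units of standard deviation.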