
{"id":222,"date":"2021-12-15T21:16:05","date_gmt":"2021-12-15T21:16:05","guid":{"rendered":"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-7\/"},"modified":"2025-08-28T23:48:32","modified_gmt":"2025-08-28T23:48:32","slug":"chapter-7","status":"publish","type":"chapter","link":"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-7\/","title":{"raw":"Chapter 7: Introduction to Hypothesis Testing","rendered":"Chapter 7: Introduction to Hypothesis Testing"},"content":{"raw":"<div class=\"textbox textbox--sidebar textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<h3 class=\"Chapter-element-head\">Key Terms<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor014\"><span class=\"Hyperlink-underscore\">alternative hypothesis<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor171\"><span class=\"Hyperlink-underscore\">critical value<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor179\"><span class=\"Hyperlink-underscore\">effect size<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor164\"><span class=\"Hyperlink-underscore\">hypothesis<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor167\"><span class=\"Hyperlink-underscore\">null hypothesis<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor166\"><span class=\"Hyperlink-underscore\">probability value<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor174\"><span class=\"Hyperlink-underscore CharOverride-12\">p<\/span><span class=\"Hyperlink-underscore\"> value<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor169\"><span class=\"Hyperlink-underscore\">rejection region<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor168\"><span class=\"Hyperlink-underscore\">significance level<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor184\"><span 
class=\"Hyperlink-underscore\">statistical power<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor176\"><span class=\"Hyperlink-underscore\">statistical significance<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor173\"><span class=\"Hyperlink-underscore\">test statistic<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor182\"><span class=\"Hyperlink-underscore\">Type I error<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor183\"><span class=\"Hyperlink-underscore\">Type II error<\/span><\/a><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p class=\"Text-1st\">This chapter lays out the basic logic and process of hypothesis testing. We will perform <span class=\"italic\">z<\/span>\u00a0tests, which use the <span class=\"italic\">z<\/span>\u00a0score formula from <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-6\/\"><span class=\"Hyperlink-underscore\">Chapter 6<\/span><\/a> and data from a sample mean to make an inference about a population.<\/p>\r\n<strong data-start=\"215\" data-end=\"256\">Social Justice and Hypothesis Testing<\/strong><br data-start=\"256\" data-end=\"259\" \/>Hypothesis testing allows us to move from description to action \u2014 it is the process that helps us decide whether patterns we see in data are likely due to chance or reflect real differences in the world. In social justice research, this means testing questions like: <em data-start=\"526\" data-end=\"653\">Are students of color disciplined more often than white students? 
Do women earn less than men, even when doing the same work?<\/em> By setting up null and alternative hypotheses and using probability, we can examine whether inequalities are simply random variation or evidence of systemic bias.\r\n<p class=\"Text\">We have learned to calculate means, medians, and modes, as well as variances and standard deviations, to describe data.\u00a0 Now we are moving on to make predictions and inferences about data.\u00a0 This involves developing a null hypothesis and a research hypothesis and using probabilities to determine if we can predict an outcome.\u00a0 For example, if we want to know the likelihood that a person of color will be pulled over by the police compared to a White person, we can develop a research hypothesis and a null hypothesis to test that prediction.\u00a0 To do this we need to collect data on the number of times a person of color is stopped by police as well as the number of times a White person is stopped.\u00a0 We might predict that a person who is not White is much more likely to be pulled over.\u00a0 We can set up hypotheses to test that prediction.<\/p>\r\n\r\n<h3 class=\"H1\">The Null Hypothesis<\/h3>\r\n<p class=\"Text-1st\">The hypothesis that an apparent effect is due to chance is called the [pb_glossary id=\"666\"]<a id=\"_idTextAnchor167\"><\/a>[\/pb_glossary]<span class=\"key-term\">null hypothesis<\/span>, written <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span> (\u201c<span class=\"italic\">H<\/span>-naught\u201d). In the <a href=\"#_idTextAnchor165\"><span class=\"Hyperlink-underscore\">Physicians\u2019 Reactions<\/span><\/a> example, the null hypothesis is that in the population of physicians, the mean time expected to be spent with obese patients is equal to the mean time expected to be spent with average-weight patients. 
This null hypothesis can be written as:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-84\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn7.1-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">\u00a0Another simpler way to state this null hypothesis would be:<\/p>\r\n<em><strong>Ho: obese time = average time\u00a0<\/strong><\/em>\r\n\r\nEssentially, both Ho's are saying there is no difference between the time spent with obese patients and average weight patients.\r\n<p class=\"Text\">Keep in mind that the null hypothesis is typically the opposite of the researcher\u2019s hypothesis. In the <a href=\"#_idTextAnchor165\"><span class=\"Hyperlink-underscore\">Physicians\u2019 Reactions<\/span><\/a> study, the researchers hypothesized that physicians would expect to spend less time with obese patients. The null hypothesis that the two types of patients are treated identically is put forward with the hope that it can be discredited and therefore rejected. If the null hypothesis were true, a difference as large as or larger than the sample difference of 6.7 minutes would be very unlikely to occur. Therefore, the researchers rejected the null hypothesis of no difference and concluded that in the population, physicians intend to spend less time with obese patients.<\/p>\r\n<strong data-start=\"896\" data-end=\"908\">Example:<\/strong><br data-start=\"908\" data-end=\"911\" \/>Suppose we want to test whether women faculty are promoted at the same rate as men. The null hypothesis (H\u2080) would state there is no difference in promotion rates between men and women. This becomes our baseline assumption until we see strong evidence otherwise. 
The research hypothesis (H\u2090) would predict a difference, for example, that women are promoted less often.\r\n<p class=\"Text\">In general, the null hypothesis is the idea that nothing is going on: there is no effect of our treatment, no relationship between our variables, and no difference in our sample mean from what we expected about the population mean. This is always our baseline starting assumption, and it is what we seek to reject. If we are trying to treat depression, we want to find a difference in average symptoms between our treatment and control groups.\u00a0 However, until we have evidence against it, we must use the null hypothesis as our starting point.<\/p>\r\n\r\n<h3 class=\"H1\">The Alternative (also called the Research) Hypothesis<\/h3>\r\n<p class=\"Text-1st\">If the null hypothesis is rejected, then we will need some other explanation, which we call the alternative hypothesis, <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>, <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">1<\/span>, or <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">r<\/span>. The <span class=\"key-term\">alternative hypothesis<\/span> is simply the reverse of the null hypothesis, and there are three options (greater than, less than, or not equal to), depending on where we expect the difference to lie. Thus, our alternative hypothesis is the mathematical way of stating our research question. If we expect our obtained sample mean to be above or below the null hypothesis value, which we call a directional hypothesis, then our alternative hypothesis takes the form<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-87\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.4-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">based on the research question itself. We should only use a directional hypothesis if we have good reason, based on prior observations or research, to suspect a particular direction. 
When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-88\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.5-2.png\" alt=\"\" \/><\/p>\r\nSocial Justice Example\r\n\r\nIf we are investigating police stops, the null might be that Black and white drivers are stopped at the same rate. The research hypothesis would be that Black drivers are stopped more frequently. Whether we make this directional (greater than) or non-directional (simply \u201cdifferent\u201d) depends on prior evidence and theory.\r\n<p class=\"Text\">We will set different criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative. To understand why, we need to see where our criteria come from and how they relate to <span class=\"italic\">z<\/span>\u00a0scores and distributions.<\/p>\r\n\r\n<h3 class=\"H1\">Critical Values, <span class=\"bold-italic CharOverride-4\">p<\/span> Values, and Significance Level<\/h3>\r\n<p class=\"Text-1st\">A low probability value casts doubt on the null hypothesis. How low must the probability value be in order to conclude that the null hypothesis is false? Although there is clearly no right or wrong answer to this question, it is conventional to conclude the null hypothesis is false if the probability value is less than .05. More conservative researchers conclude the null hypothesis is false only if the probability value is less than .01. When a researcher concludes that the null hypothesis is false, the researcher is said to have rejected the null hypothesis. 
The probability value below which the null hypothesis is rejected is called the <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> level or simply <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> (\u201calpha\u201d). It is also called the [pb_glossary id=\"670\"]<a id=\"_idTextAnchor168\"><\/a>[\/pb_glossary]<span class=\"key-term\">significance level<\/span>. If <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> is not explicitly specified, assume that <span class=\"Symbol\">a<\/span> = .05.<\/p>\r\n<p class=\"Text\">The significance level is a threshold we set before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use. If our data produce a probability value below the significance level (for example, <span class=\"Symbol\">a<\/span> = .05), then we have sufficient evidence to reject the null hypothesis; if the probability value is above the significance level, we fail to reject the null (we never \u201caccept\u201d the null).<\/p>\r\n<strong data-start=\"1919\" data-end=\"1931\">Example:<\/strong><br data-start=\"1931\" data-end=\"1934\" \/>Consider wage gaps. If our sample shows women earning less than men, we need to know: is this gap large enough that it\u2019s unlikely to have occurred by chance? If the probability of observing a gap this large or larger under the null is very small (say, p &lt; .01), we reject the null and conclude the wage gap is statistically significant. But we must also remember: \u201csignificant\u201d in statistics means the effect is unlikely to be due to chance alone, not necessarily that it is large or practically meaningful. 
A very small pay gap could still reach significance in a huge dataset \u2014 but the justice implications may be limited.\r\n<p class=\"Text\">There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set\u00a0of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effect. The value laid out in <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span> is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true. Now recall that values of <span class=\"italic\">z <\/span>which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as extreme as <span class=\"italic\">z<\/span>\u2014or more extreme than <span class=\"italic\">z<\/span>\u2014is very small as we get into the tails of the distribution. Our significance level corresponds to the area in the tail that is exactly equal to <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>. If\u00a0we use our normal criterion of <span class=\"Symbol\">a<\/span> = .05, then 5% of the area under the curve becomes what we call\u00a0the [pb_glossary id=\"669\"]<a id=\"_idTextAnchor169\"><\/a>[\/pb_glossary]<span class=\"key-term\">rejection region<\/span> (also called the critical region) of the distribution. This is illustrated in <a href=\"#_idTextAnchor170\"><span class=\"Fig-table-number-underscore\">Figure 7.1<\/span><\/a>. 
The shaded rejection region takes up 5% of the area under the curve. Any result that falls in that region is sufficient evidence to reject the null hypothesis.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer272\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor170\"><\/a>Figure 7.1.<\/span> The rejection region for a one-tailed test. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/65\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Rejection Region for One-Tailed Test<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer273\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Rejection_Region_for_One-Tailed_Test-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<p class=\"Text\">The rejection region is bounded by a specific <span class=\"italic\">z<\/span>\u00a0value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the [pb_glossary id=\"664\"]<a id=\"_idTextAnchor171\"><\/a>[\/pb_glossary]<span class=\"key-term\">critical value<\/span>, <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">crit<\/span>\u00a0(\u201c<span class=\"italic\">z<\/span>\u00a0crit\u201d), or <span class=\"italic\">z<\/span>* (hence the other name \u201ccritical region\u201d). 
Finding the critical value works exactly the same as finding the <span class=\"italic\">z<\/span>\u00a0score corresponding to any area under the curve as we did in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-1-fundamentals-of-statistics\/\"><span class=\"Hyperlink-underscore\">Unit 1<\/span><\/a>. If we go to the normal table, we will find that the <span class=\"italic\">z<\/span>\u00a0score corresponding to 5% of the area under the curve is equal to 1.645 (<span class=\"italic\">z <\/span>= 1.64 corresponds to .0505 and <span class=\"italic\">z <\/span>= 1.65 corresponds to .0495, so .05 is exactly in between them) if we go to the right and \u22121.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing and shading the distribution is helpful for keeping directionality straight.<\/p>\r\n<p class=\"Text\">Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don\u2019t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail\u2019s rejection region. For <span class=\"Symbol\">a<\/span> = .05, this means 2.5% of the area is in each tail, which, based on the <span class=\"italic\">z<\/span>\u00a0table, corresponds to critical values of <span class=\"italic\">z<\/span>* = \u00b11.96. This is shown in <a href=\"#_idTextAnchor172\"><span class=\"Fig-table-number-underscore\">Figure 7.2<\/span><\/a>.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer274\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor172\"><\/a>Figure 7.2.<\/span> Two-tailed rejection region. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/66\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Rejection Region for Two-Tailed Test<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer275\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Rejection_Region_for_Two-Tailed_Test-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<p class=\"Text\">Thus, any <span class=\"italic\">z<\/span>\u00a0score falling outside \u00b11.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use <span class=\"italic\">z<\/span>\u00a0scores in this way, the obtained value of <span class=\"italic\">z <\/span>(sometimes called <span class=\"italic\">z<\/span>\u00a0obtained and abbreviated <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span>) is something known as a [pb_glossary id=\"673\"]<a id=\"_idTextAnchor173\"><\/a>[\/pb_glossary]<span class=\"key-term\">test statistic<\/span>, which is simply an inferential statistic used to test a null hypothesis. The formula for our <span class=\"italic\">z<\/span>\u00a0statistic has not changed:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-90\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.6-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">To formally test our hypothesis, we compare our obtained <span class=\"italic\">z<\/span>\u00a0statistic to our critical <span class=\"italic\">z<\/span>\u00a0value. 
If <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span> is farther from zero than <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">crit<\/span> in the direction of the rejection region (for a two-tailed test, |<span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span>|\u00a0&gt;\u00a0|<span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">crit<\/span>|), that means it falls in the rejection region (to see why, draw a line for <span class=\"italic\">z <\/span>= 2.5 on <a href=\"#_idTextAnchor170\"><span class=\"Fig-table-number-underscore\">Figure 7.1<\/span><\/a> or <a href=\"#_idTextAnchor172\"><span class=\"Fig-table-number-underscore\">Figure 7.<\/span><span class=\"Fig-table-number-underscore\">2<\/span><\/a>) and so we reject <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span>. If <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span> is closer to zero than <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">crit<\/span>, we fail to reject. Remember that as <span class=\"italic\">z <\/span>gets farther from zero, the corresponding area under the curve beyond <span class=\"italic\">z <\/span>gets smaller. Thus, when the obtained <span class=\"italic\">z<\/span> falls in the rejection region, the proportion, or <span class=\"italic\">p<\/span> value, will be smaller than the area for <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>, and if the area is smaller, the probability gets smaller. Specifically, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true gets smaller.<\/p>\r\n<p class=\"Text\">The <span class=\"italic\">z<\/span>\u00a0statistic is very useful when we are doing our calculations by hand. 
However, when we use computer software, it will report to us a [pb_glossary id=\"668\"]<a id=\"_idTextAnchor174\"><\/a>[\/pb_glossary]<span class=\"key-term CharOverride-2\">p<\/span><span class=\"key-term\"> value<\/span>, which is simply the proportion of the area under the curve in the tails beyond our obtained <span class=\"italic\">z<\/span>\u00a0statistic. We can directly compare this <span class=\"italic\">p<\/span> value to <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> to test our null hypothesis: if <span class=\"italic\">p<\/span> &lt; <span class=\"Symbol\">a<\/span>, we reject <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span>, but if <span class=\"italic\">p<\/span> &gt; <span class=\"Symbol\">a<\/span>, we fail to reject. Note also that the reverse is always true. If we use critical values to test our hypothesis, we will always know if <span class=\"italic\">p<\/span> is greater than or less than <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>. If we reject, we know that <span class=\"italic\">p<\/span> &lt; <span class=\"Symbol\">a<\/span> because the obtained <span class=\"italic\">z<\/span>\u00a0statistic falls farther out into the tail than the critical <span class=\"italic\">z<\/span>\u00a0value that corresponds to <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>, so the proportion (<span class=\"italic\">p<\/span> value) for that <span class=\"italic\">z<\/span>\u00a0statistic will be smaller. 
Conversely, if we fail to reject, we know that the proportion will be larger than <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> because the <span class=\"italic\">z<\/span>\u00a0statistic will not be as far into the tail. This is illustrated for a one-tailed test in <a href=\"#_idTextAnchor175\"><span class=\"Fig-table-number-underscore\">Figure 7.3<\/span><\/a>.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer282\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor175\"><\/a>Figure 7.3.<\/span> Relationship between <span class=\"Symbol\">a<\/span>, <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span>, and <span class=\"italic\">p<\/span>. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/67\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Relationship between alpha, z-obt, and p<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer283\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Relationship_between_alpha_z-obt_and_p-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<p class=\"Text\">When the null hypothesis is rejected, the effect is said to have [pb_glossary id=\"672\"]<a id=\"_idTextAnchor176\"><\/a>[\/pb_glossary]<span class=\"key-term\">statistical significance<\/span>, or be statistically significant. 
For example, in the <a href=\"#_idTextAnchor165\"><span class=\"Hyperlink-underscore\">Physicians\u2019 Reactions<\/span><\/a> case study, the probability value is .0057. Therefore, the effect of obesity is statistically significant and the null hypothesis that obesity makes no difference is rejected. It is important to keep in mind that statistical significance means only that the null hypothesis of exactly no effect is rejected; it does not mean that the effect is important, which is what \u201csignificant\u201d usually means. When an effect is significant, you can have confidence the effect is not exactly zero. Finding that an effect is significant does not tell you about how large or important the effect is.<\/p>\r\n<p class=\"Text\"><span class=\"italic\">Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.<\/span><\/p>\r\n<p class=\"Text\">Why does the word \u201csignificant\u201d in the phrase \u201cstatistically significant\u201d mean something so different from other uses of the word? Interestingly, this is because the meaning of \u201csignificant\u201d in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was \u201csignificant\u201d if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of \u201csignificant\u201d changed, leading to the potential misinterpretation.<\/p>\r\n\r\n<h3 class=\"H1 ParaOverride-19\">The Hypothesis Testing Process<\/h3>\r\n<h4 class=\"H2\">A Four-Step Procedure<\/h4>\r\n<p class=\"Text-1st\">The process of testing hypotheses follows a simple four-step procedure. 
This process will be what we use for the remainder of the textbook and course, and although the hypotheses and statistics we use will change, this process will not.<\/p>\r\n\r\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 1:<\/span> State the Hypotheses<\/h5>\r\n<p class=\"Text-1st\">Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above <span class=\"italic\">and<\/span> in words, explaining in normal English what each one means in terms of the research question.<\/p>\r\nThe obesity example null hypothesis in English would be:\r\n\r\n<strong>Ho: There is no difference between the time spent with obese patients and the time spent with average-weight patients.<\/strong>\r\n\r\nThe alternative hypothesis in English would be:\r\n\r\n<strong>Ha: There is a difference between the time spent with obese patients and the time spent with average-weight patients.<\/strong>\r\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 2:<\/span> Find the Critical Values<\/h5>\r\n<p class=\"Text-1st\">Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>, which determines how much of the area under the curve composes our rejection region, and the directionality of the test, which determines where the region will be.<\/p>\r\n\r\n<h5 class=\"H3-step\"><span class=\"Step-- CharOverride-16\">Step 3:<\/span> Calculate the Test Statistic<\/h5>\r\n<p class=\"Text-1st\">Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case <span class=\"italic\">z<\/span>. 
This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.<\/p>\r\n\r\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 4:<\/span> Make the Decision<\/h5>\r\n<p class=\"Text-1st\">Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.<\/p>\r\n<p class=\"Example-New\"><span class=\"Example--\">Example Interview Callbacks<\/span><\/p>\r\n<p class=\"Text-1st\">Let\u2019s see how hypothesis testing works in action by working through an example. Here is an example comparing a sample mean to a population mean. Say you want to look at the number of interview callbacks for recent college graduates based on ethnicity.\u00a0 The known population mean for interview callbacks is <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 8.00 and the known population standard deviation is <span class=\"Symbol\">s<\/span> = 0.50.\u00a0 You survey 25 college graduates who are Black and find that, on average, they receive 7.75 callbacks.\u00a0 This scenario has all of the information we need to begin our hypothesis testing procedure.<\/p>\r\n\r\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 1:<\/span> State the Hypotheses<\/h5>\r\n<p class=\"Text-1st\">We will need both a null and an alternative hypothesis written both mathematically and in words. 
We\u2019ll always start with the null hypothesis:<\/p>\r\nEXAMPLE:\r\n\r\nH<sub>0<\/sub>:\u00a0 There is no difference in the mean number of callbacks between Black graduates and the general population of graduates.\r\n<p class=\"Text\">Our assumption of no difference, the null hypothesis, is that the mean for Black graduates is exactly the same as the known population mean value, 8.00. Now let\u2019s do the alternative:<\/p>\r\n<p class=\"Equation\">H<sub>A<\/sub>: There is a difference in the mean number of callbacks between Black graduates and the general population of graduates.<\/p>\r\n<p class=\"Text\">In this case, we are not predicting whether the number of callbacks is higher or lower, so we use a two-tailed (non-directional) alternative hypothesis that there is a difference.<\/p>\r\n\r\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 2:<\/span> Find the Critical Values<\/h5>\r\n<p class=\"Text-1st\">Our critical values are based on two things: the directionality of the test and the level of significance. We decided in Step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that <span class=\"Symbol\">a<\/span> = .05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed <span class=\"italic\">z<\/span>\u00a0test at <span class=\"Symbol\">a<\/span> = .05 are <span class=\"italic\">z<\/span>* = \u00b11.96. These will be the criteria we use to test our hypothesis. 
We can now draw out our distribution, as shown in <a href=\"#_idTextAnchor177\"><span class=\"Fig-table-number-underscore\">Figure 7.4<\/span><\/a>, so we can visualize the rejection region and make sure it makes sense.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer289\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor177\"><\/a>Figure 7.4.<\/span> Rejection region for <span class=\"italic\">z<\/span>*\u00a0= \u00b11.96. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/68\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Rejection Region z+-1.96<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer290\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Rejection_Region_z-1.96-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 3:<\/span> Calculate the Test Statistic and Effect Size<\/h5>\r\n<p class=\"Text-1st\">Now we come to our formal calculations. Let\u2019s say that we collect data and find that the average number of callbacks for Black graduates is <span class=\"italic\">M<\/span> = 7.75. 
We can now plug this value, along with the values presented in the original problem, into our equation for <span class=\"italic\">z<\/span>:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-93\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.9-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">So our test statistic is <span class=\"italic\">z <\/span>= \u22122.50, which we can draw onto our rejection region distribution as shown in <a href=\"#_idTextAnchor178\"><span class=\"Fig-table-number-underscore\">Figure 7.5<\/span><\/a>.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer292\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor178\"><\/a>Figure 7.5.<\/span> Test statistic location. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/69\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Test Statistic Location z-2.50<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer293\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Test_Statistic_Location_z-2.50-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 4:<\/span> Make the Decision<\/h5>\r\n<p class=\"Text-1st\">Looking at <a href=\"#_idTextAnchor178\"><span class=\"Fig-table-number-underscore\">Figure 7.5<\/span><\/a>, we can see that our obtained <span class=\"italic\">z<\/span>\u00a0statistic falls in the rejection region. 
We can also directly compare it to our critical value: in terms of absolute value, 2.50 &gt; 1.96, so we reject the null hypothesis. We can now write our conclusion:<\/p>\r\n<p class=\"Text-indented-2p\">Reject <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span>. Based on the sample of 25 Black graduates, we can conclude they receive fewer callbacks (<span class=\"italic\">M<\/span> = 7.75 callbacks) than graduates in the general population, <span class=\"italic\">z <\/span>= \u22122.50, <span class=\"italic\">p<\/span> &lt; .05.<\/p>\r\n<p class=\"Text\">When we write our conclusion, we write out the words to communicate what it actually means, but we also include the average number of callbacks for the sample, the <span class=\"italic\">z<\/span>\u00a0statistic, and the <span class=\"italic\">p<\/span> value. We don\u2019t know the exact <span class=\"italic\">p<\/span> value, but we do know that because we rejected the null, it must be less than\u00a0<img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> = .05.<\/p>\r\n<strong data-start=\"2672\" data-end=\"2684\">Social Justice Example:<\/strong><br data-start=\"2684\" data-end=\"2687\" \/>In an employment discrimination case, if our test statistic falls in the rejection region, we conclude there is strong evidence that hiring practices are not equal. This doesn\u2019t prove discrimination in an absolute sense, but it shows that the observed data would be unlikely if the null (no difference) were true. 
Courts and policymakers often rely on this type of statistical reasoning.\r\n<h3 class=\"H1\">Other Considerations in Hypothesis Testing<\/h3>\r\n<p class=\"Text-1st\">There are several other considerations we need to keep in mind when performing hypothesis testing.<\/p>\r\n\r\n<h4 class=\"H2\">Errors in Hypothesis Testing<\/h4>\r\n<p class=\"Text-1st\">In the <a href=\"#_idTextAnchor165\"><span class=\"Hyperlink-underscore\">Physicians\u2019 Reactions<\/span><\/a> case study, the probability value associated with the significance test is .0057. Therefore, the null hypothesis was rejected, and it was concluded that physicians intend to spend less time with obese patients. Despite the low probability value, it is possible that the null hypothesis of no true difference between obese and average-weight patients is true and that the large difference between sample means occurred by chance. If this is the case, then the conclusion that physicians intend to spend less time with obese patients is in error. This type of error is called a Type\u00a0I error. More generally, a [pb_glossary id=\"674\"]<a id=\"_idTextAnchor182\"><\/a>[\/pb_glossary]<span class=\"key-term\">Type I error<\/span> occurs when a significance test results in the rejection of a true null hypothesis.<\/p>\r\n<p class=\"Text\">By one common convention, if the probability value is below .05, then the null hypothesis is rejected. Another convention, although slightly less common, is to reject the null hypothesis if the probability value is below .01. The threshold for rejecting the null hypothesis is called the <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> level or simply <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>. It is also called the significance level. 
As discussed in the introduction to hypothesis testing, it is better to interpret the probability value as an indication of the weight of evidence against the null hypothesis than as part of a decision rule for making a reject or do-not-reject decision. Therefore, keep in mind that rejecting the null hypothesis is not an all-or-nothing decision.<\/p>\r\n<p class=\"Text\">The Type I error rate is affected by the <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> level: the lower the <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> level the lower the Type I error rate. It might seem that <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> is the probability of a Type I error. However, this is not correct. Instead, <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error.<\/p>\r\n<p class=\"Text\">The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a [pb_glossary id=\"675\"]<a id=\"_idTextAnchor183\"><\/a>[\/pb_glossary]<span class=\"key-term\">Type II error<\/span>. Unlike a Type I error, a Type II error is not really an error. When a statistical test is not significant, it means that the data do not provide strong evidence that the null hypothesis is false. Lack of significance does not support the conclusion that the null hypothesis is true. 
Therefore, a researcher should not make the mistake of incorrectly concluding that the null hypothesis is true when a statistical test was not significant. Instead, the researcher should consider the test inconclusive. Contrast this with a Type I error in which the researcher erroneously concludes that the null hypothesis is false when, in fact, it is true.<\/p>\r\n<p class=\"Text\">A Type II error can only occur if the null hypothesis is false. If the null hypothesis is false, then the probability of a Type II error is called <span class=\"Symbol\">b<\/span> (\u201cbeta\u201d). The probability of correctly rejecting a false null hypothesis equals 1 \u2212 <span class=\"Symbol\">b<\/span> and is called [pb_glossary id=\"671\"]<a id=\"_idTextAnchor184\"><\/a>[\/pb_glossary]<span class=\"key-term\">statistical power<\/span>. Power is simply our ability to correctly detect an effect that exists. It is influenced by the size of the effect (larger effects are easier to detect), the significance level we set (making it easier to reject the null makes it easier to detect an effect, but increases the likelihood of a Type I error), and the sample size used (larger samples make it easier to reject the\u00a0null).<\/p>\r\n<em><strong data-start=\"3227\" data-end=\"3239\">Example:<\/strong><\/em><br data-start=\"3239\" data-end=\"3242\" \/>In social justice work, Type I and Type II errors both carry important consequences. A Type I error might mean wrongly concluding there is discrimination where none exists, potentially damaging credibility. A Type II error might mean failing to detect real discrimination, allowing injustice to persist. Researchers must balance these risks \u2014 lowering alpha reduces false positives but raises the risk of missing real inequities. 
This is why sample size and study design are so important when researching marginalized groups, where smaller numbers can make effects harder to detect.\r\n\r\n<strong data-start=\"3966\" data-end=\"3999\">Hypothesis Testing and Equity<\/strong><br data-start=\"3999\" data-end=\"4002\" \/>Hypothesis testing provides a framework for asking whether inequalities we observe are due to chance or reflect deeper systemic patterns. By carefully defining null and alternative hypotheses, setting thresholds, and interpreting results, we can turn observations into evidence. For social justice, this is critical: it equips us to say with confidence when disparities are not random but patterned and persistent. Hypothesis testing therefore gives us a scientific foundation for challenging inequities and advocating for meaningful change.\r\n\r\n&nbsp;\r\n<h3 class=\"H1\">Exercises<\/h3>\r\n<ol>\r\n \t<li class=\"Numbered-list-Exercises-1st\">In your own words, explain what the null hypothesis is.<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">What are Type I and Type II errors?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">What is <img class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">Why do we phrase null and alternative hypotheses with population parameters and not sample means?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">If our null hypothesis is \u201c<span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0 <\/span>: <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 40,\u201d what are the three possible alternative hypotheses?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">Determine whether you would reject or fail to reject the null hypothesis in the following 
situations:\r\n<ol>\r\n \t<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">z <\/span>= 1.99, two-tailed test at <span class=\"Symbol\">a<\/span> = .05<\/li>\r\n \t<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">z <\/span>= 0.34, <span class=\"italic\">z<\/span>* = 1.645<\/li>\r\n \t<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">p<\/span> = .03, <span class=\"Symbol\">a<\/span> = .05<\/li>\r\n \t<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">p<\/span> = .015, <span class=\"Symbol\">a<\/span> = .01<\/li>\r\n<\/ol>\r\n<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">You are part of a trivia team and have tracked your team\u2019s performance since you started playing, so you know that your scores are normally distributed with <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 78 and <span class=\"Symbol\">s<\/span> = 12. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on 9 weeks\u2019 worth of score data where <img class=\"_idGenObjectAttribute-32\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn3.2-upperM-4.png\" alt=\"Upper M\" \/> is 88.75.<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">You get hired as a server at a local restaurant, and the manager tells you that servers\u2019 tips are $42 on average but vary about $12 (<img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 42, <span class=\"Symbol\">s<\/span> = 12). 
You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don\u2019t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the <span class=\"Symbol\">a<\/span> = .05 level of significance.<\/li>\r\n<\/ol>\r\n<div class=\"textbox textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<h3 class=\"H1\">Answers to Odd-Numbered Exercises<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n1) Your answer should include mention of the baseline assumption of no difference between the sample and the population.\r\n\r\n3) Alpha is the significance level. It is the criterion we use when deciding to reject or fail to reject the null hypothesis, corresponding to a given proportion of the area under the normal distribution and a probability of finding extreme scores assuming the null hypothesis is true.\r\n\r\n5) <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>: <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> \u2260 40, <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>: <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> &gt; 40, <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>: <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> &lt; 40\r\n\r\n7) <span class=\"italic\">Step 1:<\/span> <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0 <\/span>: <img class=\"_idGenObjectAttribute-31\" 
src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 78 \u201cThe average score is not different after the new person joined,\u201d <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>: <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> &gt; 78 \u201cThe average score has gone up since the new person joined.\u201d\r\n<span class=\"italic\">Step 2:<\/span> One-tailed test to the right, assuming <span class=\"Symbol\">a<\/span> = .05, <span class=\"italic\">z<\/span>* = 1.645\r\n<span class=\"italic\">Step 3:<\/span> <span class=\"italic\">M<\/span> = 88.75, <img class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.2-3.png\" alt=\"\" \/> = 4.00, <span class=\"italic\">z <\/span>= 2.69\r\n<span class=\"italic\">Step 4:<\/span> <span class=\"italic\">z <\/span>&gt; <span class=\"italic\">z<\/span>*, reject <span class=\"italic\">H<\/span><span class=\"subscript CharOverride-17\">0<\/span>. 
Based on 9 weeks of games, we can conclude that our average score (<span class=\"italic\">M<\/span> = 88.75) is higher now that the new person is on the team, <span class=\"italic\">z <\/span>= 2.69, <span class=\"italic\">p<\/span> &lt; .05, <span class=\"italic\">d<\/span> = 0.90.\r\n\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n<p class=\"Text ParaOverride-21\"><img class=\"_idGenObjectAttribute-30\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/1-4.png\" alt=\"\" \/><\/p>","rendered":"<div class=\"textbox textbox--sidebar textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<h3 class=\"Chapter-element-head\">Key Terms<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor014\"><span class=\"Hyperlink-underscore\">alternative hypothesis<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor171\"><span class=\"Hyperlink-underscore\">critical value<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor179\"><span class=\"Hyperlink-underscore\">effect size<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor164\"><span class=\"Hyperlink-underscore\">hypothesis<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor167\"><span class=\"Hyperlink-underscore\">null hypothesis<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor166\"><span class=\"Hyperlink-underscore\">probability value<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor174\"><span class=\"Hyperlink-underscore CharOverride-12\">p<\/span><span class=\"Hyperlink-underscore\"> value<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor169\"><span class=\"Hyperlink-underscore\">rejection region<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor168\"><span class=\"Hyperlink-underscore\">significance level<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor184\"><span 
class=\"Hyperlink-underscore\">statistical power<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor176\"><span class=\"Hyperlink-underscore\">statistical significance<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor173\"><span class=\"Hyperlink-underscore\">test statistic<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor182\"><span class=\"Hyperlink-underscore\">Type I error<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor183\"><span class=\"Hyperlink-underscore\">Type II error<\/span><\/a><\/p>\n<\/div>\n<\/div>\n<p class=\"Text-1st\">This chapter lays out the basic logic and process of hypothesis testing. We will perform <span class=\"italic\">z<\/span>\u00a0tests, which use the <span class=\"italic\">z<\/span>\u00a0score formula from <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-6\/\"><span class=\"Hyperlink-underscore\">Chapter 6<\/span><\/a> and data from a sample mean to make an inference about a population.<\/p>\n<p><strong data-start=\"215\" data-end=\"256\">Social Justice and Hypothesis Testing<\/strong><br data-start=\"256\" data-end=\"259\" \/>Hypothesis testing allows us to move from description to action \u2014 it is the process that helps us decide whether patterns we see in data are likely due to chance or reflect real differences in the world. In social justice research, this means testing questions like: <em data-start=\"526\" data-end=\"653\">Are students of color disciplined more often than white students? 
Do women earn less than men, even when doing the same work?<\/em> By setting up null and alternative hypotheses and using probability, we can examine whether inequalities are simply random variation or evidence of systemic bias.<\/p>\n<p class=\"Text\">We have learned to calculate means, medians, and modes as well as variances and standard deviations to describe data.\u00a0 Now we are moving on to make predictions and inferences about data.\u00a0 This involves developing a null and a research hypothesis and using probabilities to determine if we can predict an outcome.\u00a0 For example, if we want to know the likelihood that a person of color will be pulled over by the police compared to a White person, we can develop a research and a null hypothesis to test that prediction.\u00a0 To do this we need to collect data on the number of times a person of color is stopped by police as well as the number of times a White person is stopped.\u00a0 We might predict that a person who is not White is much more likely to be pulled over.\u00a0 We can set up hypotheses to test that prediction.<\/p>\n<h3 class=\"H1\">The Null Hypothesis<\/h3>\n<p class=\"Text-1st\">The hypothesis that an apparent effect is due to chance is called the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_666\"><a id=\"_idTextAnchor167\"><\/a><\/a><span class=\"key-term\">null hypothesis<\/span>, written <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span> (\u201c<span class=\"italic\">H<\/span>-naught\u201d). In the <a href=\"#_idTextAnchor165\"><span class=\"Hyperlink-underscore\">Physicians\u2019 Reactions<\/span><\/a> example, the null hypothesis is that in the population of physicians, the mean time expected to be spent with obese patients is equal to the mean time expected to be spent with average-weight patients. 
This null hypothesis can be written as:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-84\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn7.1-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">\u00a0Another simpler way to state this null hypothesis would be:<\/p>\n<p><em><strong>Ho: obese time = average time\u00a0<\/strong><\/em><\/p>\n<p>Essentially, both Ho&#8217;s are saying there is no difference between the time spent with obese patients and average weight patients.<\/p>\n<p class=\"Text\">Keep in mind that the null hypothesis is typically the opposite of the researcher\u2019s hypothesis. In the <a href=\"#_idTextAnchor165\"><span class=\"Hyperlink-underscore\">Physicians\u2019 Reactions<\/span><\/a> study, the researchers hypothesized that physicians would expect to spend less time with obese patients. The null hypothesis that the two types of patients are treated identically is put forward with the hope that it can be discredited and therefore rejected. If the null hypothesis were true, a difference as large as or larger than the sample difference of 6.7 minutes would be very unlikely to occur. Therefore, the researchers rejected the null hypothesis of no difference and concluded that in the population, physicians intend to spend less time with obese patients.<\/p>\n<p><strong data-start=\"896\" data-end=\"908\">Example:<\/strong><br data-start=\"908\" data-end=\"911\" \/>Suppose we want to test whether women faculty are promoted at the same rate as men. The null hypothesis (H\u2080) would state there is no difference in promotion rates between men and women. This becomes our baseline assumption until we see strong evidence otherwise. 
The research hypothesis (H\u2090) would predict a difference; for example, that women are promoted less often.<\/p>\n<p class=\"Text\">In general, the null hypothesis is the idea that nothing is going on: there is no effect of our treatment, no relationship between our variables, and no difference in our sample mean from what we expected about the population mean. This is always our baseline starting assumption, and it is what we seek to reject. If we are trying to treat depression, we want to find a difference in average symptoms between our treatment and control groups.\u00a0 However, until we have evidence against it, we must use the null hypothesis as our starting point.<\/p>\n<h3 class=\"H1\">The Alternative (also called the Research) Hypothesis<\/h3>\n<p class=\"Text-1st\">If the null hypothesis is rejected, then we will need some other explanation, which we call the alternative hypothesis, <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>, <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">1<\/span>, or <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">r<\/span>. The <span class=\"key-term\">alternative hypothesis<\/span> is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. Thus, our alternative hypothesis is the mathematical way of stating our research question. If we expect our obtained sample mean to be above or below the null hypothesis value, which we call a directional hypothesis, then our alternative hypothesis takes the form<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-87\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.4-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">based on the research question itself. We should only use a directional hypothesis if we have good reason, based on prior observations or research, to suspect a particular direction. 
When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-88\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.5-2.png\" alt=\"\" \/><\/p>\n<p>Social Justice Example<\/p>\n<p>If we are investigating police stops, the null might be that Black and white drivers are stopped at the same rate. The research hypothesis would be that Black drivers are stopped more frequently. Whether we make this directional (greater than) or non-directional (simply \u201cdifferent\u201d) depends on prior evidence and theory.<\/p>\n<p class=\"Text\">We will set different criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative. To understand why, we need to see where our criteria come from and how they relate to <span class=\"italic\">z<\/span>\u00a0scores and distributions.<\/p>\n<h3 class=\"H1\">Critical Values, <span class=\"bold-italic CharOverride-4\">p<\/span> Values, and Significance Level<\/h3>\n<p class=\"Text-1st\">A low probability value casts doubt on the null hypothesis. How low must the probability value be in order to conclude that the null hypothesis is false? Although there is clearly no right or wrong answer to this question, it is conventional to conclude the null hypothesis is false if the probability value is less than .05. More conservative researchers conclude the null hypothesis is false only if the probability value is less than .01. When a researcher concludes that the null hypothesis is false, the researcher is said to have rejected the null hypothesis. 
The probability value below which the null hypothesis is rejected is called the <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> level or simply <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> (\u201calpha\u201d). It is also called the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_670\"><a id=\"_idTextAnchor168\"><\/a><\/a><span class=\"key-term\">significance level<\/span>. If <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> is not explicitly specified, assume that <span class=\"Symbol\">a<\/span> = .05.<\/p>\n<p class=\"Text\">The significance level is a threshold we set before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use. If our data produce a probability value at or below the significance level (for example, <span class=\"Symbol\">a<\/span> = .05), then we have sufficient evidence to reject the null hypothesis; if the probability value is above the significance level, we fail to reject the null (we never \u201caccept\u201d the null).<\/p>\n<p><strong data-start=\"1919\" data-end=\"1931\">Example:<\/strong><br data-start=\"1931\" data-end=\"1934\" \/>Consider wage gaps. If our sample shows women earning less than men, we need to know: is this gap large enough that it\u2019s unlikely to have occurred by chance? If the probability of observing such a gap under the null is very small (say, p &lt; .01), we reject the null and conclude the wage gap is statistically significant. 
But we must also remember: \u201csignificant\u201d in statistics means the effect is unlikely to be due to chance alone, not that it is large or practically meaningful. A very small pay gap could still reach significance in a huge dataset, but the justice implications may be limited.<\/p>\n<p class=\"Text\">There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set\u00a0of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effect. The value laid out in <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span> is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null were true. Now recall that values of <span class=\"italic\">z <\/span>that fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as extreme as <span class=\"italic\">z<\/span>, or more extreme than <span class=\"italic\">z<\/span>, is very small as we get into the tails of the distribution. Our significance level corresponds to the area in the tail that is exactly equal to <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>. 
If\u00a0we use our usual criterion of <span class=\"Symbol\">a<\/span> = .05, then 5% of the area under the curve becomes what we call\u00a0the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_669\"><a id=\"_idTextAnchor169\"><\/a><\/a><span class=\"key-term\">rejection region<\/span> (also called the critical region) of the distribution. This is illustrated in <a href=\"#_idTextAnchor170\"><span class=\"Fig-table-number-underscore\">Figure 7.1<\/span><\/a>. The shaded rejection region takes up 5% of the area under the curve. Any result that falls in that region is sufficient evidence to reject the null hypothesis.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer272\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor170\"><\/a>Figure 7.1.<\/span> The rejection region for a one-tailed test. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/65\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Rejection Region for One-Tailed Test<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer273\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Rejection_Region_for_One-Tailed_Test-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<p class=\"Text\">The rejection region is bounded by a specific <span class=\"italic\">z<\/span>\u00a0value, as is any area under the curve. 
In hypothesis testing, the value corresponding to a specific rejection region is called the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_664\"><a id=\"_idTextAnchor171\"><\/a><\/a><span class=\"key-term\">critical value<\/span>, <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">crit<\/span>\u00a0(\u201c<span class=\"italic\">z<\/span>\u00a0crit\u201d), or <span class=\"italic\">z<\/span>* (hence the other name \u201ccritical region\u201d). Finding the critical value works exactly the same as finding the <span class=\"italic\">z<\/span>\u00a0score corresponding to any area under the curve as we did in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-1-fundamentals-of-statistics\/\"><span class=\"Hyperlink-underscore\">Unit 1<\/span><\/a>. If we go to the normal table, we will find that the <span class=\"italic\">z<\/span>\u00a0score corresponding to 5% of the area under the curve is equal to 1.645 (<span class=\"italic\">z <\/span>= 1.64 corresponds to .0505 and <span class=\"italic\">z <\/span>= 1.65 corresponds to .0495, so .05 is exactly in between them) if we go to the right and \u22121.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing and shading the distribution is helpful for keeping directionality straight.<\/p>\n<p class=\"Text\">Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don\u2019t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail\u2019s rejection region. 
For <span class=\"Symbol\">a<\/span> = .05, this means 2.5% of the area is in each tail, which, based on the <span class=\"italic\">z<\/span>\u00a0table, corresponds to critical values of <span class=\"italic\">z<\/span>* = \u00b11.96. This is shown in <a href=\"#_idTextAnchor172\"><span class=\"Fig-table-number-underscore\">Figure 7.2<\/span><\/a>.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer274\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor172\"><\/a>Figure 7.2.<\/span> Two-tailed rejection region. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/66\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Rejection Region for Two-Tailed Test<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer275\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Rejection_Region_for_Two-Tailed_Test-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<p class=\"Text\">Thus, any <span class=\"italic\">z<\/span>\u00a0score falling outside \u00b11.96 (greater than 1.96 in absolute value) falls in the rejection region. 
When we use <span class=\"italic\">z<\/span>\u00a0scores in this way, the obtained value of <span class=\"italic\">z <\/span>(sometimes called <span class=\"italic\">z<\/span>\u00a0obtained and abbreviated <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span>) is something known as a <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_673\"><a id=\"_idTextAnchor173\"><\/a><\/a><span class=\"key-term\">test statistic<\/span>, which is simply an inferential statistic used to test a null hypothesis. The formula for our <span class=\"italic\">z<\/span>\u00a0statistic has not changed:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-90\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.6-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">To formally test our hypothesis, we compare our obtained <span class=\"italic\">z<\/span>\u00a0statistic to our critical <span class=\"italic\">z<\/span>\u00a0value. If <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span> lies farther out in the tail than <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">crit<\/span> (for a right-tail critical value, <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span>\u00a0&gt;\u00a0<span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">crit<\/span>), that means it falls in the rejection region (to see why, draw a line for <span class=\"italic\">z <\/span>= 2.5 on <a href=\"#_idTextAnchor170\"><span class=\"Fig-table-number-underscore\">Figure 7.1<\/span><\/a> or <a href=\"#_idTextAnchor172\"><span class=\"Fig-table-number-underscore\">Figure 7.<\/span><span class=\"Fig-table-number-underscore\">2<\/span><\/a>) and so we reject <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span>. If <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span> does not lie beyond <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">crit<\/span>, we fail to reject. 
Remember that as <span class=\"italic\">z <\/span>gets larger, the corresponding area under the curve beyond <span class=\"italic\">z <\/span>gets smaller. Thus, the proportion, or <span class=\"italic\">p<\/span> value, will be smaller than the area for <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>, and if the area is smaller, the probability gets smaller. Specifically, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true gets smaller.<\/p>\n<p class=\"Text\">The <span class=\"italic\">z<\/span>\u00a0statistic is very useful when we are doing our calculations by hand. However, when we use computer software, it will report to us a <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_668\"><a id=\"_idTextAnchor174\"><\/a><\/a><span class=\"key-term CharOverride-2\">p<\/span><span class=\"key-term\"> value<\/span>, which is simply the proportion of the area under the curve in the tails beyond our obtained <span class=\"italic\">z<\/span>\u00a0statistic. We can directly compare this <span class=\"italic\">p<\/span> value to <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> to test our null hypothesis: if <span class=\"italic\">p<\/span> &lt; <span class=\"Symbol\">a<\/span>, we reject <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span>, but if <span class=\"italic\">p<\/span> &gt; <span class=\"Symbol\">a<\/span>, we fail to reject. Note also that the reverse is always true. 
If we use critical values to test our hypothesis, we will always know if <span class=\"italic\">p<\/span> is greater than or less than <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>. If we reject, we know that <span class=\"italic\">p<\/span> &lt; <span class=\"Symbol\">a<\/span> because the obtained <span class=\"italic\">z<\/span>\u00a0statistic falls farther out into the tail than the critical <span class=\"italic\">z<\/span>\u00a0value that corresponds to <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>, so the proportion (<span class=\"italic\">p<\/span> value) for that <span class=\"italic\">z<\/span>\u00a0statistic will be smaller. Conversely, if we fail to reject, we know that the proportion will be larger than <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> because the <span class=\"italic\">z<\/span>\u00a0statistic will not be as far into the tail. This is illustrated for a one-tailed test in <a href=\"#_idTextAnchor175\"><span class=\"Fig-table-number-underscore\">Figure 7.3<\/span><\/a>.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer282\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor175\"><\/a>Figure 7.3.<\/span> Relationship between <span class=\"Symbol\">a<\/span>, <span class=\"italic\">z<\/span><span class=\"subscript _idGenCharOverride-1\">obt<\/span>, and <span class=\"italic\">p<\/span>. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/67\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Relationship between alpha, z-obt, and p<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer283\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Relationship_between_alpha_z-obt_and_p-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<p class=\"Text\">When the null hypothesis is rejected, the effect is said to have <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_672\"><a id=\"_idTextAnchor176\"><\/a><\/a><span class=\"key-term\">statistical significance<\/span>, or be statistically significant. For example, in the <a href=\"#_idTextAnchor165\"><span class=\"Hyperlink-underscore\">Physicians\u2019 Reactions<\/span><\/a> case study, the probability value is .0057. Therefore, the effect of obesity is statistically significant and the null hypothesis that obesity makes no difference is rejected. It is important to keep in mind that statistical significance means only that the null hypothesis of exactly no effect is rejected; it does not mean that the effect is important, which is what \u201csignificant\u201d usually means. When an effect is significant, you can have confidence the effect is not exactly zero. Finding that an effect is significant does not tell you about how large or important the effect is.<\/p>\n<p class=\"Text\"><span class=\"italic\">Do not confuse statistical significance with practical significance. 
A small effect can be highly significant if the sample size is large enough.<\/span><\/p>\n<p class=\"Text\">Why does the word \u201csignificant\u201d in the phrase \u201cstatistically significant\u201d mean something so different from other uses of the word? Interestingly, this is because the meaning of \u201csignificant\u201d in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was \u201csignificant\u201d if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of \u201csignificant\u201d changed, leading to potential misinterpretation.<\/p>\n<h3 class=\"H1 ParaOverride-19\">The Hypothesis Testing Process<\/h3>\n<h4 class=\"H2\">A Four-Step Procedure<\/h4>\n<p class=\"Text-1st\">The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook and course, and although the hypotheses and statistics we use will change, this process will not.<\/p>\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 1:<\/span> State the Hypotheses<\/h5>\n<p class=\"Text-1st\">Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). 
These should be stated mathematically as they were presented above <span class=\"italic\">and<\/span> in words, explaining in plain English what each one means in terms of the research question.<\/p>\n<p>The obesity example null hypothesis in English would be:<\/p>\n<p><strong>H<sub>0<\/sub>: There is no difference between the time spent with obese patients and the time spent with average-weight patients.<\/strong><\/p>\n<p>The alternative hypothesis in English would be:<\/p>\n<p><strong>H<sub>A<\/sub>: There is a difference between the time spent with obese patients and the time spent with average-weight patients.<\/strong><\/p>\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 2:<\/span> Find the Critical Values<\/h5>\n<p class=\"Text-1st\">Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>, which determines how much of the area under the curve makes up our rejection region, and the directionality of the test, which determines where the region will be.<\/p>\n<h5 class=\"H3-step\"><span class=\"Step-- CharOverride-16\">Step 3:<\/span> Calculate the Test Statistic<\/h5>\n<p class=\"Text-1st\">Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case <span class=\"italic\">z<\/span>. This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.<\/p>\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 4:<\/span> Make the Decision<\/h5>\n<p class=\"Text-1st\">Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. 
When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.<\/p>\n<p class=\"Example-New\"><span class=\"Example--\">Example Interview Callbacks<\/span><\/p>\n<p class=\"Text-1st\">Let\u2019s see how hypothesis testing works in action by working through an example that compares a sample mean to a population mean. Say you want to look at the number of interview callbacks for recent college graduates based on ethnicity. The known population mean for interview callbacks is <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 8.00 and the known population standard deviation is <span class=\"Symbol\">s<\/span> = 0.50. You survey 25 college graduates who are Black and find that, on average, they receive 7.75 callbacks. This scenario has all of the information we need to begin our hypothesis testing procedure.<\/p>\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 1:<\/span> State the Hypotheses<\/h5>\n<p class=\"Text-1st\">We will need both a null and an alternative hypothesis written both mathematically and in words. We\u2019ll always start with the null hypothesis:<\/p>\n<p>EXAMPLE:<\/p>\n<p>H<sub>0<\/sub>: There is no difference in the number of callbacks between the sample of Black graduates and the general population of graduates.<\/p>\n<p class=\"Text\">Our assumption of no difference, the null hypothesis, is that the mean number of callbacks for Black graduates is exactly the same as the known population mean, 8.00. 
Now let\u2019s do the alternative:<\/p>\n<p class=\"Equation\">H<sub>A<\/sub>: There is a difference in the number of callbacks between the sample of Black graduates and the general population of graduates.<\/p>\n<p class=\"Text\">In this case, we don\u2019t know whether the number of callbacks will be higher or lower, so we use a two-tailed alternative hypothesis that there is a difference.<\/p>\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 2:<\/span> Find the Critical Values<\/h5>\n<p class=\"Text-1st\">Our critical values are based on two things: the directionality of the test and the level of significance. We decided in Step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that <span class=\"Symbol\">a<\/span> = .05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed <span class=\"italic\">z<\/span>\u00a0test at <span class=\"Symbol\">a<\/span> = .05 are <span class=\"italic\">z<\/span>* = \u00b11.96. These will be the criteria we use to test our hypothesis. We can now draw out our distribution, as shown in <a href=\"#_idTextAnchor177\"><span class=\"Fig-table-number-underscore\">Figure 7.4<\/span><\/a>, so we can visualize the rejection region and make sure it makes sense.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer289\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor177\"><\/a>Figure 7.4.<\/span> Rejection region for <span class=\"italic\">z<\/span>*\u00a0= \u00b11.96. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/68\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Rejection Region z+-1.96<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer290\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Rejection_Region_z-1.96-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 3:<\/span> Calculate the Test Statistic and Effect Size<\/h5>\n<p class=\"Text-1st\">Now we come to our formal calculations. Let\u2019s say that we collect data and find that the average number of callbacks for Black graduates is <span class=\"italic\">M<\/span> = 7.75. We can now plug this value, along with the values presented in the original problem, into our equation for <span class=\"italic\">z<\/span>:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-93\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.9-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">So our test statistic is <span class=\"italic\">z <\/span>= \u22122.50 (the mean difference, 7.75 \u2212 8.00 = \u22120.25, divided by the standard error, 0.50\/\u221a25 = 0.10), which we can draw onto our rejection region distribution as shown in <a href=\"#_idTextAnchor178\"><span class=\"Fig-table-number-underscore\">Figure 7.5<\/span><\/a>.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer292\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor178\"><\/a>Figure 7.5.<\/span> Test statistic location. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/69\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Test Statistic Location z-2.50<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer293\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Test_Statistic_Location_z-2.50-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<h5 class=\"H3-step\"><span class=\"Step--\">Step 4:<\/span> Make the Decision<\/h5>\n<p class=\"Text-1st\">Looking at <a href=\"#_idTextAnchor178\"><span class=\"Fig-table-number-underscore\">Figure 7.5<\/span><\/a>, we can see that our obtained <span class=\"italic\">z<\/span>\u00a0statistic falls in the rejection region. We can also directly compare it to our critical value: in terms of absolute value, 2.50 &gt; 1.96, so we reject the null hypothesis. We can now write our conclusion:<\/p>\n<p class=\"Text-indented-2p\">Reject <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0<\/span>. 
Based on the sample of 25 Black graduates, we can conclude they receive fewer callbacks (<span class=\"italic\">M<\/span> = 7.75 callbacks) than graduates in the general population, <span class=\"italic\">z <\/span>= \u22122.50, <span class=\"italic\">p<\/span> &lt; .05.<\/p>\n<p class=\"Text\">When we write our conclusion, we write out the words to communicate what it actually means, but we also include the sample\u2019s average number of callbacks, the <span class=\"italic\">z<\/span>\u00a0statistic, and the <span class=\"italic\">p<\/span> value. We don\u2019t know the exact <span class=\"italic\">p<\/span> value, but we do know that because we rejected the null, it must be less than\u00a0<img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> = .05.<\/p>\n<p><strong data-start=\"2672\" data-end=\"2684\">Social Justice Example:<\/strong><br data-start=\"2684\" data-end=\"2687\" \/>In an employment discrimination case, if our test statistic falls in the rejection region, we conclude there is strong evidence that hiring practices are not equal. This doesn\u2019t prove discrimination in an absolute sense, but it provides statistical evidence against the null hypothesis of no difference. Courts and policymakers often rely on this type of statistical reasoning.<\/p>\n<h3 class=\"H1\">Other Considerations in Hypothesis Testing<\/h3>\n<p class=\"Text-1st\">There are several other considerations we need to keep in mind when performing hypothesis testing.<\/p>\n<h4 class=\"H2\">Errors in Hypothesis Testing<\/h4>\n<p class=\"Text-1st\">In the <a href=\"#_idTextAnchor165\"><span class=\"Hyperlink-underscore\">Physicians\u2019 Reactions<\/span><\/a> case study, the probability value associated with the significance test is .0057. 
Therefore, the null hypothesis was rejected, and it was concluded that physicians intend to spend less time with obese patients. Despite the low probability value, it is possible that the null hypothesis of no true difference between obese and average-weight patients is true and that the large difference between sample means occurred by chance. If this is the case, then the conclusion that physicians intend to spend less time with obese patients is in error. This type of error is called a Type\u00a0I error. More generally, a <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_674\"><a id=\"_idTextAnchor182\"><\/a><\/a><span class=\"key-term\">Type I error<\/span> occurs when a significance test results in the rejection of a true null hypothesis.<\/p>\n<p class=\"Text\">By one common convention, if the probability value is below .05, then the null hypothesis is rejected. Another convention, although slightly less common, is to reject the null hypothesis if the probability value is below .01. The threshold for rejecting the null hypothesis is called the <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> level or simply <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>. It is also called the significance level. As discussed in the introduction to hypothesis testing, it is better to interpret the probability value as an indication of the weight of evidence against the null hypothesis than as part of a decision rule for making a reject or do-not-reject decision. 
Therefore, keep in mind that rejecting the null hypothesis is not an all-or-nothing decision.<\/p>\n<p class=\"Text\">The Type I error rate is affected by the <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> level: the lower the <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> level the lower the Type I error rate. It might seem that <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> is the probability of a Type I error. However, this is not correct. Instead, <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/> is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error.<\/p>\n<p class=\"Text\">The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_675\"><a id=\"_idTextAnchor183\"><\/a><\/a><span class=\"key-term\">Type II error<\/span>. Unlike a Type I error, a Type II error is not really an error. When a statistical test is not significant, it means that the data do not provide strong evidence that the null hypothesis is false. Lack of significance does not support the conclusion that the null hypothesis is true. Therefore, a researcher should not make the mistake of incorrectly concluding that the null hypothesis is true when a statistical test was not significant. 
Instead, the researcher should consider the test inconclusive. Contrast this with a Type I error in which the researcher erroneously concludes that the null hypothesis is false when, in fact, it is true.<\/p>\n<p class=\"Text\">A Type II error can only occur if the null hypothesis is false. If the null hypothesis is false, then the probability of a Type II error is called <span class=\"Symbol\">b<\/span> (\u201cbeta\u201d). The probability of correctly rejecting a false null hypothesis equals 1 \u2212 <span class=\"Symbol\">b<\/span> and is called <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_222_671\"><a id=\"_idTextAnchor184\"><\/a><\/a><span class=\"key-term\">statistical power<\/span>. Power is simply our ability to correctly detect an effect that exists. It is influenced by the size of the effect (larger effects are easier to detect), the significance level we set (making it easier to reject the null makes it easier to detect an effect, but increases the likelihood of a Type I error), and the sample size used (larger samples make it easier to reject the\u00a0null).<\/p>\n<p><em><strong data-start=\"3227\" data-end=\"3239\">Example:<\/strong><\/em><br data-start=\"3239\" data-end=\"3242\" \/>In social justice work, Type I and Type II errors both carry important consequences. A Type I error might mean wrongly concluding there is discrimination where none exists, potentially damaging credibility. A Type II error might mean failing to detect real discrimination, allowing injustice to persist. Researchers must balance these risks \u2014 lowering alpha reduces false positives but raises the risk of missing real inequities. 
This is why sample size and study design are so important when researching marginalized groups, where smaller numbers can make effects harder to detect.<\/p>\n<p><strong data-start=\"3966\" data-end=\"3999\">Hypothesis Testing and Equity<\/strong><br data-start=\"3999\" data-end=\"4002\" \/>Hypothesis testing provides a framework for asking whether inequalities we observe are due to chance or reflect deeper systemic patterns. By carefully defining null and alternative hypotheses, setting thresholds, and interpreting results, we can turn observations into evidence. For social justice, this is critical: it equips us to say with confidence when disparities are not random but patterned and persistent. Hypothesis testing therefore gives us a scientific foundation for challenging inequities and advocating for meaningful change.<\/p>\n<p>&nbsp;<\/p>\n<h3 class=\"H1\">Exercises<\/h3>\n<ol>\n<li class=\"Numbered-list-Exercises-1st\">In your own words, explain what the null hypothesis is.<\/li>\n<li class=\"Numbered-list-Exercises\">What are Type I and Type II errors?<\/li>\n<li class=\"Numbered-list-Exercises\">What is <img decoding=\"async\" class=\"_idGenObjectAttribute-89\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn7.1-alpha-2.png\" alt=\"alpha\" \/>?<\/li>\n<li class=\"Numbered-list-Exercises\">Why do we phrase null and alternative hypotheses with population parameters and not sample means?<\/li>\n<li class=\"Numbered-list-Exercises\">If our null hypothesis is \u201c<span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0 <\/span>: <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 40,\u201d what are the three possible alternative hypotheses?<\/li>\n<li class=\"Numbered-list-Exercises\">Determine whether you would reject or fail to reject the null hypothesis in the following 
situations:\n<ol>\n<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">z <\/span>= 1.99, two-tailed test at <span class=\"Symbol\">a<\/span> = .05<\/li>\n<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">z <\/span>= 0.34, <span class=\"italic\">z<\/span>* = 1.645<\/li>\n<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">p<\/span> = .03, <span class=\"Symbol\">a<\/span> = .05<\/li>\n<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">p<\/span> = .015, <span class=\"Symbol\">a<\/span> = .01<\/li>\n<\/ol>\n<\/li>\n<li class=\"Numbered-list-Exercises\">You are part of a trivia team and have tracked your team\u2019s performance since you started playing, so you know that your scores are normally distributed with <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 78 and <span class=\"Symbol\">s<\/span> = 12. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on 9 weeks\u2019 worth of score data where <img decoding=\"async\" class=\"_idGenObjectAttribute-32\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn3.2-upperM-4.png\" alt=\"Upper M\" \/> is 88.75.<\/li>\n<li class=\"Numbered-list-Exercises\">You get hired as a server at a local restaurant, and the manager tells you that servers\u2019 tips average $42 but vary by about $12 (<img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 42, <span class=\"Symbol\">s<\/span> = 12). 
You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don\u2019t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the <span class=\"Symbol\">a<\/span> = .05 level of significance.<\/li>\n<\/ol>\n<div class=\"textbox textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<h3 class=\"H1\">Answers to Odd-Numbered Exercises<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p>1) Your answer should include mention of the baseline assumption of no difference between the sample and the population.<\/p>\n<p>3) Alpha is the significance level. It is the criterion we use when deciding to reject or fail to reject the null hypothesis, corresponding to a given proportion of the area under the normal distribution and a probability of finding extreme scores assuming the null hypothesis is true.<\/p>\n<p>5) <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>: <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> \u2260 40, <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>: <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> &gt; 40, <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>: <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> &lt; 40<\/p>\n<p>7) <span class=\"italic\">Step 1:<\/span> <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">0 <\/span>: 
<img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> = 78 \u201cThe average score is not different after the new person joined,\u201d <span class=\"italic\">H<\/span><span class=\"subscript _idGenCharOverride-1\">A<\/span>: <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn2.14-mu-5.png\" alt=\"mu\" \/> &gt; 78 \u201cThe average score has gone up since the new person joined.\u201d<br \/>\n<span class=\"italic\">Step 2:<\/span> One-tailed test to the right, assuming <span class=\"Symbol\">a<\/span> = .05, <span class=\"italic\">z<\/span>* = 1.645<br \/>\n<span class=\"italic\">Step 3:<\/span> <span class=\"italic\">M<\/span> = 88.75, <img decoding=\"async\" class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.2-3.png\" alt=\"\" \/> = 4.00, <span class=\"italic\">z <\/span>= 2.69<br \/>\n<span class=\"italic\">Step 4:<\/span> <span class=\"italic\">z <\/span>&gt; <span class=\"italic\">z<\/span>*, reject <span class=\"italic\">H<\/span><span class=\"subscript CharOverride-17\">0<\/span>. 
Based on 9 weeks of games, we can conclude that our average score (<span class=\"italic\">M<\/span> = 88.75) is higher now that the new person is on the team, <span class=\"italic\">z <\/span>= 2.69, <span class=\"italic\">p<\/span> &lt; .05, <span class=\"italic\">d<\/span> = 0.90.<\/p>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p class=\"Text ParaOverride-21\"><img decoding=\"async\" class=\"_idGenObjectAttribute-30\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/1-4.png\" alt=\"\" \/><\/p>\n<div class=\"glossary\"><span class=\"screen-reader-text\" id=\"definition\">definition<\/span><template id=\"term_222_666\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_666\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_222_670\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_670\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_222_669\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_669\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_222_664\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_664\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_222_673\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_673\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template 
id=\"term_222_668\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_668\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_222_672\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_672\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_222_674\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_674\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_222_675\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_675\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_222_671\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_222_671\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close 
definition<\/span><\/button><\/div><\/template><\/div>","protected":false},"author":7,"menu_order":1,"template":"","meta":{"pb_show_title":"","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-222","chapter","type-chapter","status-publish","hentry"],"part":187,"_links":{"self":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters\/222","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/users\/7"}],"version-history":[{"count":10,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters\/222\/revisions"}],"predecessor-version":[{"id":972,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters\/222\/revisions\/972"}],"part":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/parts\/187"}],"metadata":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters\/222\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/media?parent=222"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapter-type?post=222"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/contributor?post=222"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/license?post=222"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}