
{"id":185,"date":"2021-12-15T18:32:30","date_gmt":"2021-12-15T18:32:30","guid":{"rendered":"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-6\/"},"modified":"2025-08-28T01:33:27","modified_gmt":"2025-08-28T01:33:27","slug":"chapter-6","status":"publish","type":"chapter","link":"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-6\/","title":{"raw":"Chapter 6: Sampling Distributions","rendered":"Chapter 6: Sampling Distributions"},"content":{"raw":"<div class=\"textbox textbox--sidebar textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<h2 class=\"Chapter-title\">Key Terms<\/h2>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n&nbsp;\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor154\"><span class=\"Hyperlink-underscore\">central limit theorem<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor150\"><span class=\"Hyperlink-underscore\">distribution of sample means<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor155\"><span class=\"Hyperlink-underscore\">law of large numbers<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor161\"><span class=\"Hyperlink-underscore\">observed effect<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor151\"><span class=\"Hyperlink-underscore\">sampling distribution<\/span><\/a><\/p>\r\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor152\"><span class=\"Hyperlink-underscore\">standard error<\/span><\/a><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p class=\"Text-1st\">We have come to the final chapter in this unit. We will now take the logic, ideas, and techniques we have developed and put them together to see how we can take a sample of data and use it to make inferences about what\u2019s truly happening in the broader population. This is the final piece of the puzzle that we need to understand in order to have the groundwork necessary for formal hypothesis testing. 
Though some of the concepts in this chapter seem strange, they are all simple extensions of what we have already learned in previous chapters, especially <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-4\/\"><span class=\"Hyperlink-underscore\">Chapter 4<\/span><\/a> and <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-5\/\"><span class=\"Hyperlink-underscore\">Chapter 5<\/span><\/a>.<\/p>\r\n<p class=\"Text\">When we move from describing data to making inferences, we are taking the step that allows us to say something larger about the world. In social justice work, this is essential: we rarely have access to everyone in a population, so we rely on samples to draw conclusions about inequality, discrimination, or opportunity. For example, a survey of student experiences with campus safety might only include a few hundred respondents, yet the results inform policy for thousands. Inferential statistics give us the tools to distinguish between natural variation in samples and real evidence of systemic disparities.<\/p>\r\n<h3 class=\"H1\">People, Samples, and Populations<\/h3>\r\n<p class=\"Text-1st\">Most of what we have dealt with so far has concerned individual scores grouped into samples, with those samples being drawn from and, hopefully, representative of a population. We saw how we can understand the location of individual scores within a sample\u2019s distribution via <span class=\"italic\">z<\/span>\u00a0scores, and how we can extend that to understand how likely it is to observe scores higher or lower than an individual score via probability.<\/p>\r\n<p class=\"Text\">Inherent in this work is the notion that an individual score will differ from the mean, which we quantify as a <span class=\"italic\">z<\/span>\u00a0score. All of the individual scores will differ from the mean in different amounts and different directions, which is natural and expected. We quantify these differences as variance and standard deviation. 
Measures of spread and the idea of variability in observations are key principles in inferential statistics. We know that any observation, whether it is a single score, a set of scores, or a particular descriptive statistic, will differ from the center of whatever distribution it belongs in.<\/p>\r\n<p class=\"Text\">This is equally true of things outside of statistics and formal data collection and analysis. Some days you hear your alarm and wake up easily, but other days you need to hit snooze a few (dozen) times. Some days traffic is light, but other days it is very heavy. Some classes you are able to focus, pay attention, and take good notes, but other days you find yourself zoning out the entire time. Each individual observation is an insight but is not, by itself, the entire story, and it takes an extreme deviation from what we expect for us to think that something strange is going on. Being a little sleepy is normal, but being completely unable to get out of bed might indicate that we are sick. Light traffic is a good thing, but almost no cars on the road might make us think we forgot it is Saturday. Zoning out occasionally is fine, but if we cannot focus at all, we might be in a stats class rather than a fun one.<\/p>\r\n<p class=\"Text\"><strong>Example:<\/strong><br \/>Imagine surveying wages for a small group of Latina women in a city. Their sample mean may not exactly match the true average wages for all Latina women in that city. Some deviation is expected; this is sampling error. But if the sample mean is far lower than expected, it may indicate structural wage inequality. Understanding that \u201cextreme deviations\u201d are meaningful is the first step toward evidence-based advocacy.<\/p>\r\n<p class=\"Text\">All of these principles carry forward from scores within samples to samples within populations. 
Just like an individual score will differ from its mean, an individual sample mean will differ from the true population mean. We encountered this principle in earlier chapters: sampling error. As mentioned way back in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-1\/\"><span class=\"Hyperlink-underscore\">Chapter 1<\/span><\/a>, sampling error is an incredibly important principle. We know ahead of time that if we collect data and compute a sample mean, the observed value of that sample mean will be at least slightly off from what we expect it to be based on our supposed population mean; this is natural and expected. However, if our sample mean is extremely different from what we expect based on the population mean, there may be something going on.<\/p>\r\n\r\n<h3 class=\"H1\">The Sampling Distribution of Sample Means<\/h3>\r\n<p class=\"Text-1st\">To see how we use sampling error, we will learn about a new, theoretical distribution known as the sampling distribution. In the same way that we can gather a lot of individual scores and put them together to form a distribution with a center and spread, if we were to take many samples, all of the same size, and calculate the mean of each of those, we could put those means together to form a distribution. This new distribution is, intuitively, known as the [pb_glossary id=\"658\"]<a id=\"_idTextAnchor150\"><\/a>[\/pb_glossary]<span class=\"key-term\">distribution of sample means<\/span>. 
It is one example of what we call a [pb_glossary id=\"661\"]<a id=\"_idTextAnchor151\"><\/a>[\/pb_glossary]<span class=\"key-term\">sampling distribution<\/span>, which can be formed from the values of any statistic, such as a mean, a test statistic, or a correlation coefficient (more on the latter two in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit\u00a02<\/span><\/a> and <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-3-additional-hypothesis-tests\/\"><span class=\"Hyperlink-underscore\">Unit 3<\/span><\/a>). For our purposes, understanding the distribution of sample means will be enough to see how all other sampling distributions work to enable and inform our inferential analyses, so these two terms will be used interchangeably from here on out. Let\u2019s take a deeper look at some of its characteristics.<\/p>\r\n<p class=\"Text\"><strong>Example:<\/strong><br \/>Consider studying voter turnout. If we draw repeated samples of 200 people in a district, each sample mean for \u201cpercentage who voted\u201d will vary. When we put those together into a sampling distribution, the center will align with the true turnout rate. From a social justice perspective, this helps us see whether turnout differences across groups (for example, between younger voters and older voters) reflect chance or real barriers, such as voter ID laws or limited polling access.<\/p>\r\n<p class=\"Text\">The sampling distribution of sample means can be described by its shape, center, and spread, just like any of the other distributions we have worked with. The shape of our sampling distribution is normal: a bell-shaped curve with a single peak and two tails extending symmetrically in either direction, just like what we saw in previous chapters. 
The center of the sampling distribution of sample means\u2014which is, itself, the mean or average of the means\u2014is the true population mean, <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn2.14-mu-4.png\" alt=\"mu\" \/>. This will sometimes be written as <img class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.1new-2.png\" alt=\"\" \/> to denote it as the mean of the sample means. The spread of the sampling distribution is called the [pb_glossary id=\"662\"]<a id=\"_idTextAnchor152\"><\/a>[\/pb_glossary]<span class=\"key-term\">standard error<\/span>, the quantification of sampling error, denoted <img class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.2-2.png\" alt=\"\" \/>. The formula for standard error is:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-75\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.3-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Notice that the sample size is in this equation. As stated above, the sampling distribution refers to samples of a specific size. That is, all sample means must be calculated from samples of the same size <span class=\"italic\">n<\/span>, such as <span class=\"italic\">n<\/span> = 10, <span class=\"italic\">n<\/span> = 30, or <span class=\"italic\">n<\/span> = 100. This sample size refers to how many people or observations are in each individual sample, <span class=\"italic\">not <\/span>how many samples are used to form the sampling distribution. This is because the sampling distribution is a theoretical distribution, not one we will ever actually calculate or observe. 
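Although we never compute the sampling distribution by hand, we can approximate one by simulation. A minimal sketch in Python, assuming an illustrative normal population with a mean of 50 and standard deviation of 10 (the values used in this chapter's examples) and samples of size 25:

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

MU, SIGMA, N = 50, 10, 25  # illustrative population parameters and sample size
NUM_SAMPLES = 10_000       # number of simulated samples

# Draw many samples of size N and record each sample's mean.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

# The distribution of sample means centers on the population mean ...
print(round(statistics.mean(sample_means), 1))   # close to 50
# ... and its spread (the standard error) is close to sigma / sqrt(N) = 10 / 5 = 2.
print(round(statistics.stdev(sample_means), 1))  # close to 2.0
```

The simulated means pile up around the population mean, and their standard deviation matches the standard error formula, which is exactly what the figure and formulas above describe.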
<a href=\"#_idTextAnchor153\"><span class=\"Fig-table-number-underscore\">Figure 6.1<\/span><\/a> displays the principles stated here in graphical form.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer237\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor153\"><\/a>Figure 6.1.<\/span> The sampling distribution of sample means. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/59\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Sampling Distribution of Sample Means<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer238\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Sampling_Distribution_of_Sample_Means-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<h4 class=\"H2\">Two Important Axioms<\/h4>\r\n<p class=\"Text-1st\">We just learned that the sampling distribution is theoretical: we never actually see it. If that is true, then how can we know it works? How can we use something we don\u2019t see? The answer lies in two very important mathematical facts: the central limit theorem and the law of large numbers. 
We will not go into the math behind how these statements were derived, but knowing what they are and what they mean is important to understanding why inferential statistics work and how we can draw conclusions about a population based on information gained from a single sample.<\/p>\r\n\r\n<h5 class=\"H3\">Central Limit Theorem<\/h5>\r\n<p class=\"Text-1st\">The [pb_glossary id=\"657\"]<a id=\"_idTextAnchor154\"><\/a>[\/pb_glossary]<span class=\"key-term\">central limit theorem<\/span> states:<\/p>\r\n<p class=\"Text-indented-2p\">For samples of a single size <span class=\"italic\">n<\/span>, drawn from a population with a given mean <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn2.14-mu-4.png\" alt=\"mu\" \/> and variance <span class=\"Symbol\">s<\/span><span class=\"superscript CharOverride-15\">2<\/span>, the sampling distribution of sample means will have a mean <img class=\"_idGenObjectAttribute-76\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.4-2.png\" alt=\"\" \/> and variance <img class=\"_idGenObjectAttribute-77\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.5-2.png\" alt=\"\" \/>. This distribution will approach normality as <span class=\"italic\">n<\/span> increases.<\/p>\r\n<p class=\"Text\">From this, we are able to find the standard deviation of our sampling distribution, the standard error. 
As you can see, just like any other standard deviation, the standard error is simply the square root of the variance of the distribution.<\/p>\r\n<p class=\"Text\">The last sentence of the central limit theorem states that the sampling distribution will be more normal as the size of the samples used to create it increases. What this means is that bigger samples will create a more normal distribution, so we are better able to use the techniques we developed for normal distributions and probabilities. So how large is large enough? In general, a sampling distribution will be normal if either of two characteristics is true: (1) the population from which the samples are drawn is normally distributed or (2) the sample size is equal to or greater than 30. This second criterion is very important because it enables us to use methods developed for normal distributions even if the true population distribution is skewed.<\/p>\r\n<p class=\"Text\"><strong>Example:<\/strong><br \/>Suppose we want to know if Black homeowners are systematically offered higher mortgage interest rates than white homeowners. If we only take a very small sample of loans, results might be noisy and inconclusive. But the central limit theorem assures us that larger, repeated samples will produce a more normal sampling distribution, making it possible to test whether observed differences are the result of chance or discrimination. In practice, this is exactly how researchers have uncovered patterns of systemic bias in lending.<\/p>\r\n<h5 class=\"H3\">Law of Large Numbers<\/h5>\r\n<p class=\"Text-1st\">The [pb_glossary id=\"659\"]<a id=\"_idTextAnchor155\"><\/a>[\/pb_glossary]<span class=\"key-term\">law of large numbers<\/span> simply states that as our sample size increases, the probability that our sample mean is an accurate representation of the true population mean also increases. 
It is the formal mathematical way to state that larger samples are more accurate.<\/p>\r\n<p class=\"Text\"><strong>Example:<\/strong><br \/>Public health researchers studying environmental justice may want to know if living near industrial sites is linked to higher asthma rates. A small survey might produce unstable results. But as sample size grows, the law of large numbers tells us that the sample mean rate of asthma will approach the true population rate. This makes findings more reliable and strengthens arguments for policy change.<\/p>\r\n<p class=\"Text\">The law of large numbers is related to the central limit theorem, specifically the formulas for variance and standard error. Notice that the sample size appears in the denominators of those formulas. A larger denominator in any fraction means that the overall value of the fraction gets smaller (e.g., 1\/2 = 0.50, 1\/3 = 0.33, 1\/4 = 0.25, and so on). Thus, larger sample sizes will create smaller standard errors. We already know that standard error is the spread of the sampling distribution and that a smaller spread creates a narrower distribution. Therefore, larger sample sizes create narrower sampling distributions, which increases the probability that a sample mean will be close to the center and decreases the probability that it will be in the tails. 
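Because the formula is so simple, the shrinking of the standard error can be checked directly. The sketch below uses a population standard deviation of 10 and the sample sizes shown in Figure 6.2:

```python
import math

SIGMA = 10  # population standard deviation, as in Figures 6.2 and 6.3

# Standard error = sigma / sqrt(n): as n grows, the denominator grows,
# so the standard error (the spread of the sampling distribution) shrinks.
for n in (10, 30, 50, 100):
    print(f"n = {n:>3}: standard error = {SIGMA / math.sqrt(n):.2f}")
# n =  10: standard error = 3.16
# n =  30: standard error = 1.83
# n =  50: standard error = 1.41
# n = 100: standard error = 1.00
```

These are the same four curves pictured in Figure 6.2: each larger sample size produces a visibly narrower distribution.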
This is illustrated in <a href=\"#_idTextAnchor156\"><span class=\"Fig-table-number-underscore\">Figure 6.2<\/span><\/a> and <a href=\"#_idTextAnchor157\"><span class=\"Fig-table-number-underscore\">Figure 6.3<\/span><\/a>.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer242\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor156\"><\/a>Figure 6.2.<\/span> Sampling distributions from the same population with <span class=\"Symbol\">m<\/span>\u00a0=\u00a050 and <span class=\"Symbol\">s<\/span> = 10 but different sample sizes (<span class=\"italic\">N <\/span>= 10, <span class=\"italic\">N <\/span>= 30, <span class=\"italic\">N\u00a0<\/span>=\u00a050, <span class=\"italic\">N <\/span>= 100). <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/60\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Sampling Distributions with Different Sample Sizes<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer243\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Sampling_Distributions_with_Different_Sample_Sizes-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer244\" class=\"Legend-below\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor157\"><\/a>Figure 6.3.<\/span> Relationship between sample size and standard error for a constant <span class=\"Symbol\">s<\/span> = 10. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/61\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Relationship between Sample Size and Standard Error<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer245\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Relationship_between_Sample_Size_and_Standard_Error-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<h3 class=\"H1\">Using Standard Error for Probability<\/h3>\r\n<p class=\"Text-1st\">In previous chapters, we saw that we can use <span class=\"italic\">z<\/span>\u00a0scores to split up a normal distribution and calculate the proportion of the area under the curve in one of the new regions, giving us the probability of randomly selecting a <span class=\"italic\">z<\/span>\u00a0score in that range. We can follow the exact same process for sample means, converting them into <span class=\"italic\">z<\/span>\u00a0scores and calculating probabilities. The only difference is that instead of dividing the distance between a raw score and the mean by the standard deviation, we divide the distance between the sample mean and the population mean by the standard error.<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-78\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.7-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Let\u2019s say we are drawing samples from a population with a mean of 50 and a standard deviation of 10 (the same values used in <a href=\"#_idTextAnchor156\"><span class=\"Fig-table-number-underscore\">Figure 6.2<\/span><\/a>). 
What is the probability that we get a random sample of size 10 with a mean greater than or equal to 55? That is, for <span class=\"italic\">n<\/span> = 10, what is the probability that <img class=\"_idGenObjectAttribute-79\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.8-2.png\" alt=\"\" \/>? First, we need to convert this sample mean score into a <span class=\"italic\">z<\/span>\u00a0score:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-80\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.9-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Now we need to shade the area under the normal curve corresponding to scores greater than <span class=\"italic\">z\u00a0<\/span>=\u00a01.58, as in <a href=\"#_idTextAnchor158\"><span class=\"Fig-table-number-underscore\">Figure 6.4<\/span><\/a>.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer249\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor158\"><\/a>Figure 6.4.<\/span> Area under the curve greater than <span class=\"italic\">z <\/span>= 1.58. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/62\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Area under the Curve Greater than z1.58<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer250\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Area_under_the_Curve_Greater_Than_z1.58-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<p class=\"Text\">Now we go to our <span class=\"italic\">z<\/span>\u00a0table and find that the area to the left of <span class=\"italic\">z <\/span>= 1.58 is .9429. Finally, because we need the area to the right (per our shaded diagram), we simply subtract this from 1 to get 1.00\u00a0\u2212\u00a0.9429 = .0571. So, the probability of randomly drawing a sample of 10 people from a population with a mean of 50 and standard deviation of 10 whose sample mean is 55 or more is <span class=\"italic\">p<\/span> =\u00a0.0571, or 5.71%. Notice that we are talking about means that are 55 <span class=\"italic\">or more<\/span>. That is because, strictly speaking, it\u2019s impossible to calculate the probability of a score taking on exactly 1 value since the \u201cshaded region\u201d would just be a line with no area to calculate.<\/p>\r\n<p class=\"Text\">Now let\u2019s do the same thing, but assume that instead of only having a sample of 10 people we took a sample of 50 people. 
First, we find <span class=\"italic\">z<\/span>:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-81\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.10-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Then we shade the appropriate region of the normal distribution, as shown in <a href=\"#_idTextAnchor159\"><span class=\"Fig-table-number-underscore\">Figure 6.5<\/span><\/a>.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer252\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor159\"><\/a>Figure 6.5.<\/span> Area under the curve greater than <span class=\"italic\">z <\/span>= 3.55. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/63\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Area under the Curve Greater Than z3.55<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer253\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Area_under_the_Curve_Greater_Than_z3.55-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<p class=\"Text\">Notice that no region of <a href=\"#_idTextAnchor159\"><span class=\"Fig-table-number-underscore\">Figure 6.5<\/span><\/a> appears to be shaded. That is because the area under the curve that far out into the tail is so small that it can\u2019t even be seen (the red line has been added to show exactly where the region starts). 
Thus, we already know that the probability must be smaller for <span class=\"italic\">n<\/span>\u00a0=\u00a050 than <span class=\"italic\">n<\/span> = 10 because the size of the area (the proportion) is much smaller.<\/p>\r\n<p class=\"Text\">The table only goes up to 4.00 because everything beyond that is almost 0 and changes so little that it\u2019s not worth printing values. The closest we can get is subtracting the largest value, .9990, from 1 to get .001. We know that, technically, the actual probability is smaller, so we say that the probability is <span class=\"italic\">p<\/span> &lt; .001, or less than 0.1%.<\/p>\r\n<p class=\"Text\">This example shows what an impact sample size can have. From the same population, looking for exactly the same thing, changing only the sample size took us from roughly a 5% chance (or about 1\/20 odds) to a less than 0.1% chance (or less than 1 in 1000). As the sample size <span class=\"italic\">n<\/span> increased, the standard error decreased, which in turn caused the value of <span class=\"italic\">z<\/span> to increase, which finally caused the <span class=\"italic\">p<\/span> value (a term for probability we will use a lot in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit 2<\/span><\/a>) to decrease. You can think of this relationship like gears: turning the first gear (sample size) clockwise causes the next gear (standard error) to turn counterclockwise, which causes the third gear (<span class=\"italic\">z<\/span>) to turn clockwise, which finally causes the last gear (probability) to turn counterclockwise. All of these pieces fit together, and the relationships will always be the same: as sample size (<span class=\"italic\">n<\/span>) increases, the standard error decreases. 
As Z increases, the probability decreases.<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-82\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.11-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Because the sample mean will naturally move around due to sampling error, our observed effect will also change naturally. In the context of our formula for <span class=\"italic\">z<\/span>, then, our standard error is how much we would naturally expect the observed effect to change. Changing by a little is completely normal, but changing by a lot might indicate something is going on. This is the basis of inferential statistics and the logic behind hypothesis testing, the subject of <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit 2<\/span><\/a>.<\/p>\r\n<p class=\"Text-1st\">We have come to the final chapter in this unit. We will now take the logic, ideas, and techniques we have developed and put them together to see how we can take a sample of data and use it to make inferences about what\u2019s truly happening in the broader population. This is the final piece of the puzzle that we need to understand in order to have the groundwork necessary for formal hypothesis testing. 
Though some of the concepts in this chapter seem strange, they are all simple extensions of what we have already learned in previous chapters, especially <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-4\/\"><span class=\"Hyperlink-underscore\">Chapter 4<\/span><\/a> and <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-5\/\"><span class=\"Hyperlink-underscore\">Chapter 5<\/span><\/a>.<\/p>\r\n\r\n<h3 class=\"H1\">People, Samples, and Populations<\/h3>\r\n<p class=\"Text-1st\">Most of what we have dealt with so far has concerned individual scores grouped into samples, with those samples being drawn from and, hopefully, representative of a population. We saw how we can understand the location of individual scores within a sample\u2019s distribution via <span class=\"italic\">z<\/span>\u00a0scores, and how we can extend that to understand how likely it is to observe scores higher or lower than an individual score via probability.<\/p>\r\n<p class=\"Text\">Inherent in this work is the notion that an individual score will differ from the mean, which we quantify as a <span class=\"italic\">z<\/span>\u00a0score. All of the individual scores will differ from the mean in different amounts and different directions, which is natural and expected. We quantify these differences as variance and standard deviation. Measures of spread and the idea of variability in observations is a key principle in inferential statistics. We know that any observation, whether it is a single score, a set of scores, or a particular descriptive statistic will differ from the center of whatever distribution it belongs in.<\/p>\r\n<p class=\"Text\">This is equally true of things outside of statistics and format data collection and analysis. Some days you hear your alarm and wake up easily, but other days you need to hit snooze a few (dozen) times. Some days traffic is light, but other days it is very heavy. 
Some classes you are able to focus, pay attention, and take good notes, but other days you find yourself zoning out the entire time. Each individual observation is an insight but is not, by itself, the entire story, and it takes an extreme deviation from what we expect for us to think that something strange is going on. Being a little sleepy is normal, but being completely unable to get out of bed might indicate that we are sick. Light traffic is a good thing, but almost no cars on the road might make us think we forgot it is Saturday. Zoning out occasionally is fine, but if we cannot focus at all, we might be in a stats class rather than a fun one.<\/p>\r\n<p class=\"Text\">All of these principles carry forward from scores within samples to samples within populations. Just like an individual score will differ from its mean, an individual sample mean will differ from the true population mean. We encountered this principle in earlier chapters: sampling error. As mentioned way back in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-1\/\"><span class=\"Hyperlink-underscore\">Chapter 1<\/span><\/a>, sampling error is an incredibly important principle. We know ahead of time that if we collect data and compute a sample, the observed value of that sample will be at least slightly off from what we expect it to be based on our supposed population mean; this is natural and expected. However, if our sample mean is extremely different from what we expect based on the population mean, there may be something going on.<\/p>\r\n\r\n<h3 class=\"H1\">The Sampling Distribution of Sample Means<\/h3>\r\n<p class=\"Text-1st\">To see how we use sampling error, we will learn about a new, theoretical distribution known as the sampling distribution. 
In the same way that we can gather a lot of individual scores and put them together to form a distribution with a center and spread, if we were to take many samples, all of the same size, and calculate the mean of each of those, we could put those means together to form a distribution. This new distribution is, intuitively, known as the [pb_glossary id=\"658\"]<a id=\"_idTextAnchor150\"><\/a>[\/pb_glossary]<span class=\"key-term\">distribution of sample means<\/span>. It is one example of what we call a [pb_glossary id=\"661\"]<a id=\"_idTextAnchor151\"><\/a>[\/pb_glossary]<span class=\"key-term\">sampling distribution<\/span>, which can be formed from any sample statistic, such as a mean, a test statistic, or a correlation coefficient (more on the latter two in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit\u00a02<\/span><\/a> and <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-3-additional-hypothesis-tests\/\"><span class=\"Hyperlink-underscore\">Unit 3<\/span><\/a>). For our purposes, understanding the distribution of sample means will be enough to see how all other sampling distributions work to enable and inform our inferential analyses, so these two terms will be used interchangeably from here on out. Let\u2019s take a deeper look at some of its characteristics.<\/p>\r\n<p class=\"Text\">The sampling distribution of sample means can be described by its shape, center, and spread, just like any of the other distributions we have worked with. The shape of our sampling distribution is normal: a bell-shaped curve with a single peak and two tails extending symmetrically in either direction, just like what we saw in previous chapters. 
The center of the sampling distribution of sample means\u2014which is, itself, the mean or average of the means\u2014is the true population mean, <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn2.14-mu-4.png\" alt=\"mu\" \/>. This will sometimes be written as <img class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.1new-2.png\" alt=\"\" \/> to denote it as the mean of the sample means. The spread of the sampling distribution is called the [pb_glossary id=\"662\"]<a id=\"_idTextAnchor152\"><\/a>[\/pb_glossary]<span class=\"key-term\">standard error<\/span>, the quantification of sampling error, denoted <img class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.2-2.png\" alt=\"\" \/>. The formula for standard error is:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-75\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.3-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Notice that the sample size is in this equation. As stated above, the sampling distribution refers to samples of a specific size. That is, all sample means must be calculated from samples of the same size <span class=\"italic\">n<\/span>, such as <span class=\"italic\">n<\/span> = 10, <span class=\"italic\">n<\/span> = 30, or <span class=\"italic\">n<\/span> = 100. This sample size refers to how many people or observations are in each individual sample, <span class=\"italic\">not <\/span>how many samples are used to form the sampling distribution. This is because the sampling distribution is a theoretical distribution, not one we will ever actually calculate or observe. 
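Even though we never construct the sampling distribution by hand, it can be approximated by simulation. The following sketch is not from the text; it assumes the NumPy library and reuses the illustrative population from Figure 6.2 (mu = 50, sigma = 10) to show that the mean of many sample means lands near the population mean while their standard deviation lands near the standard error:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 50, 10, 30     # illustrative population parameters (as in Figure 6.2)
num_samples = 100_000         # many samples approximate the theoretical distribution

# Each row is one sample of size n; take the mean of every row
sample_means = rng.normal(mu, sigma, size=(num_samples, n)).mean(axis=1)

standard_error = sigma / np.sqrt(n)

print(sample_means.mean())   # close to mu = 50
print(sample_means.std())    # close to sigma / sqrt(30), about 1.83
```

With 100,000 simulated samples the agreement is already very close, which is the sense in which the theoretical distribution describes what repeated sampling would produce.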
<a href=\"#_idTextAnchor153\"><span class=\"Fig-table-number-underscore\">Figure 6.1<\/span><\/a> displays the principles stated here in graphical form.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer237\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor153\"><\/a>Figure 6.1.<\/span> The sampling distribution of sample means. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/59\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Sampling Distribution of Sample Means<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer238\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Sampling_Distribution_of_Sample_Means-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<h4 class=\"H2\">Two Important Axioms<\/h4>\r\n<p class=\"Text-1st\">We just learned that the sampling distribution is theoretical: we never actually see it. If that is true, then how can we know it works? How can we use something we don\u2019t see? The answer lies in two very important mathematical facts: the central limit theorem and the law of large numbers. 
We will not go into the math behind how these statements were derived, but knowing what they are and what they mean is important to understanding why inferential statistics work and how we can draw conclusions about a population based on information gained from a single sample.<\/p>\r\n\r\n<h5 class=\"H3\">Central Limit Theorem<\/h5>\r\n<p class=\"Text-1st\">The [pb_glossary id=\"657\"]<a id=\"_idTextAnchor154\"><\/a>[\/pb_glossary]<span class=\"key-term\">central limit theorem<\/span> states:<\/p>\r\n<p class=\"Text-indented-2p\">For samples of a single size <span class=\"italic\">n<\/span>, drawn from a population with a given mean <img class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn2.14-mu-4.png\" alt=\"mu\" \/> and variance <span class=\"Symbol\">s<\/span><span class=\"superscript CharOverride-15\">2<\/span>, the sampling distribution of sample means will have a mean <img class=\"_idGenObjectAttribute-76\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.4-2.png\" alt=\"\" \/> and variance <img class=\"_idGenObjectAttribute-77\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.5-2.png\" alt=\"\" \/>. This distribution will approach normality as <span class=\"italic\">n<\/span> increases.<\/p>\r\n<p class=\"Text\">From this, we are able to find the standard deviation of our sampling distribution, the standard error. 
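The theorem's last claim, that the distribution of sample means approaches normality as n grows, can be spot-checked numerically. This sketch is an illustration rather than anything from the text; it assumes NumPy and uses an exponential population only because that distribution is strongly skewed:

```python
import numpy as np

rng = np.random.default_rng(1)

def skew_of_sample_means(n, num_samples=50_000):
    """Estimate the skewness of the distribution of sample means for
    samples of size n drawn from a skewed (exponential) population."""
    means = rng.exponential(scale=1.0, size=(num_samples, n)).mean(axis=1)
    centered = means - means.mean()
    return (centered**3).mean() / means.std()**3

# A normal distribution has skewness 0. As n grows, the distribution of
# sample means becomes more symmetric, i.e., closer to normal.
print(skew_of_sample_means(2))    # clearly skewed
print(skew_of_sample_means(30))   # much closer to 0
```

The exponential population itself has skewness 2, and in theory the sample means have skewness of roughly 2 divided by the square root of n, which is one way to see why n of 30 or more is usually treated as "large enough."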
As you can see, just like any other standard deviation, the standard error is simply the square root of the variance of the distribution.<\/p>\r\n<p class=\"Text\">The last sentence of the central limit theorem states that the sampling distribution will be more normal as the sample size of the samples used to create it increases. What this means is that bigger samples will create a more normal distribution, so we are better able to use the techniques we developed for normal distributions and probabilities. So how large is large enough? In general, a sampling distribution will be normal if either of two characteristics is true: (1) the population from which the samples are drawn is normally distributed or (2) the sample size is equal to or greater than 30. This second criterion is very important because it enables us to use methods developed for normal distributions even if the true population distribution is skewed.<\/p>\r\n\r\n<h5 class=\"H3\">Law of Large Numbers<\/h5>\r\n<p class=\"Text-1st\">The [pb_glossary id=\"659\"]<a id=\"_idTextAnchor155\"><\/a>[\/pb_glossary]<span class=\"key-term\">law of large numbers<\/span> simply states that as our sample size increases, the probability that our sample mean is an accurate representation of the true population mean also increases. It is the formal mathematical way to state that larger samples are more accurate.<\/p>\r\n<p class=\"Text\">The law of large numbers is related to the central limit theorem, specifically the formulas for variance and standard error. Notice that the sample size appears in the denominators of those formulas. A larger denominator in any fraction means that the overall value of the fraction gets smaller (i.e., 1\/2 = 0.50, 1\/3 = 0.33, 1\/4 = 0.25, and so on). Thus, larger sample sizes will create smaller standard errors. We already know that standard error is the spread of the sampling distribution and that a smaller spread creates a narrower distribution. 
Therefore, larger sample sizes create narrower sampling distributions, which increases the probability that a sample mean will be close to the center and decreases the probability that it will be in the tails. This is illustrated in <a href=\"#_idTextAnchor156\"><span class=\"Fig-table-number-underscore\">Figure 6.2<\/span><\/a> and <a href=\"#_idTextAnchor157\"><span class=\"Fig-table-number-underscore\">Figure 6.3<\/span><\/a>.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer242\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor156\"><\/a>Figure 6.2.<\/span> Sampling distributions from the same population with <span class=\"Symbol\">m<\/span>\u00a0=\u00a050 and <span class=\"Symbol\">s<\/span> = 10 but different sample sizes (<span class=\"italic\">N <\/span>= 10, <span class=\"italic\">N <\/span>= 30, <span class=\"italic\">N\u00a0<\/span>=\u00a050, <span class=\"italic\">N <\/span>= 100). <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/60\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Sampling Distributions with Different Sample Sizes<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer243\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Sampling_Distributions_with_Different_Sample_Sizes-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer244\" class=\"Legend-below\">\r\n<p class=\"Fig-legend\"><span 
class=\"Fig-table-number\"><a id=\"_idTextAnchor157\"><\/a>Figure 6.3.<\/span> Relationship between sample size and standard error for a constant <span class=\"Symbol\">s<\/span> = 10. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/61\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Relationship between Sample Size and Standard Error<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer245\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Relationship_between_Sample_Size_and_Standard_Error-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<h3 class=\"H1\">Using Standard Error for Probability<\/h3>\r\n<p class=\"Text-1st\">In previous chapters, we saw that we can use <span class=\"italic\">z<\/span>\u00a0scores to split up a normal distribution and calculate the proportion of the area under the curve in one of the new regions, giving us the probability of randomly selecting a <span class=\"italic\">z<\/span>\u00a0score in that range. We can follow the exact same process for sample means, converting them into <span class=\"italic\">z<\/span>\u00a0scores and calculating probabilities. 
The only difference is that instead of dividing the distance between a raw score and the population mean by the standard deviation, we divide the distance between the sample mean and the population mean by the standard error.<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-78\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.7-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Let\u2019s say we are drawing samples from a population with a mean of 50 and a standard deviation of 10 (the same values used in <a href=\"#_idTextAnchor156\"><span class=\"Fig-table-number-underscore\">Figure 6.2<\/span><\/a>). What is the probability that we get a random sample of size 10 with a mean greater than or equal to 55? That is, for <span class=\"italic\">n<\/span> = 10, what is the probability that <img class=\"_idGenObjectAttribute-79\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.8-2.png\" alt=\"\" \/>? First, we need to convert this sample mean score into a <span class=\"italic\">z<\/span>\u00a0score:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-80\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.9-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Now we need to shade the area under the normal curve corresponding to scores greater than <span class=\"italic\">z\u00a0<\/span>=\u00a01.58, as in <a href=\"#_idTextAnchor158\"><span class=\"Fig-table-number-underscore\">Figure 6.4<\/span><\/a>.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer249\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor158\"><\/a>Figure 6.4.<\/span> Area under the curve greater than <span class=\"italic\">z <\/span>= 1.58. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/62\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Area under the Curve Greater than z1.58<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer250\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Area_under_the_Curve_Greater_Than_z1.58-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<p class=\"Text\">Now we go to our <span class=\"italic\">z<\/span>\u00a0table and find that the area to the left of <span class=\"italic\">z <\/span>= 1.58 is .9429. Finally, because we need the area to the right (per our shaded diagram), we simply subtract this from 1 to get 1.00\u00a0\u2212\u00a0.9429 = .0571. So, the probability of randomly drawing a sample of 10 people from a population with a mean of 50 and standard deviation of 10 whose sample mean is 55 or more is <span class=\"italic\">p<\/span> =\u00a0.0571, or 5.71%. Notice that we are talking about means that are 55 <span class=\"italic\">or more<\/span>. That is because, strictly speaking, it\u2019s impossible to calculate the probability of a score taking on exactly 1 value since the \u201cshaded region\u201d would just be a line with no area to calculate.<\/p>\r\n<p class=\"Text\">Now let\u2019s do the same thing, but assume that instead of only having a sample of 10 people we took a sample of 50 people. 
First, we find <span class=\"italic\">z<\/span>:<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-81\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.10-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Then we shade the appropriate region of the normal distribution, as shown in <a href=\"#_idTextAnchor159\"><span class=\"Fig-table-number-underscore\">Figure 6.5<\/span><\/a>.<\/p>\r\n\r\n<div class=\"_idGenObjectLayout-2\">\r\n<div id=\"_idContainer252\" class=\"Side-legend\">\r\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor159\"><\/a>Figure 6.5.<\/span> Area under the curve greater than <span class=\"italic\">z <\/span>= 3.55. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/63\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Area under the Curve Greater Than z3.55<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"_idGenObjectLayout-1\">\r\n<div id=\"_idContainer253\" class=\"_idGenObjectStyleOverride-1\"><img class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Area_under_the_Curve_Greater_Than_z3.55-2.png\" alt=\"\" \/><\/div>\r\n<\/div>\r\n<p class=\"Text\">Notice that no region of <a href=\"#_idTextAnchor159\"><span class=\"Fig-table-number-underscore\">Figure 6.5<\/span><\/a> appears to be shaded. That is because the area under the curve that far out into the tail is so small that it can\u2019t even be seen (the red line has been added to show exactly where the region starts). 
Thus, we already know that the probability must be smaller for <span class=\"italic\">N\u00a0<\/span>=\u00a050 than <span class=\"italic\">N <\/span>= 10 because the size of the area (the proportion) is much smaller.<\/p>\r\n<p class=\"Text\">The table only goes up to 4.00 because everything beyond that is almost 0 and changes so little that it\u2019s not worth printing values. The closest we can get is subtracting the largest value, .9990, from 1 to get .001. We know that, technically, the actual probability is smaller, so we say that the probability is <span class=\"italic\">p<\/span> &lt; .001, or less than 0.1%.<\/p>\r\n<p class=\"Text\">This example shows what an impact sample size can have. From the same population, looking for exactly the same thing, changing only the sample size took us from roughly a 5% chance (or about 1\/20 odds) to a less than 0.1% chance (or less than 1 in 1000). As the sample size <span class=\"italic\">n <\/span>increased, the standard error decreased, which in turn caused the value of <span class=\"italic\">z <\/span>to increase, which finally caused the <span class=\"italic\">p<\/span> value (a term for probability we will use a lot in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit 2<\/span><\/a>) to decrease. You can think of this relationship like gears: turning the first gear (sample size) clockwise causes the next gear (standard error) to turn counterclockwise, which causes the third gear (<span class=\"italic\">z<\/span>) to turn clockwise, which finally causes the last gear (probability) to turn counterclockwise. All of these pieces fit together, and the relationships will always be the same: as sample size (<span class=\"italic\">n<\/span>) increases, the standard error decreases. 
As the standard error decreases, <span class=\"italic\">z<\/span> increases, and as <span class=\"italic\">z<\/span> increases, the probability decreases.<\/p>\r\n<p class=\"Equation\"><img class=\"_idGenObjectAttribute-82\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.11-2.png\" alt=\"\" \/><\/p>\r\n<p class=\"Text\">Because the sample mean will naturally move around due to sampling error, our <a id=\"_idTextAnchor161\"><\/a><span class=\"key-term\">observed effect<\/span> will also change naturally. In the context of our formula for <span class=\"italic\">z<\/span>, then, our standard error is how much we would naturally expect the observed effect to change. Changing by a little is completely normal, but changing by a lot might indicate something is going on. This is the basis of inferential statistics and the logic behind hypothesis testing, the subject of <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit 2<\/span><\/a>.<\/p>\r\n<strong>Example:<\/strong><br \/>Let\u2019s say we survey LGBTQ+ students about campus belonging. If we take a small sample, the probability of finding a mean response far below the population average is higher just due to sampling error. But with larger samples, the probability of such an extreme result shrinks, meaning that when we do observe very low belonging scores, it is more likely to reflect a real problem on campus rather than just chance variation. 
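The two worked probabilities from this section (n = 10 and n = 50) can also be computed directly rather than read from a printed z table. This sketch is an illustration, not part of the text; it assumes SciPy is available and uses the normal CDF, so tiny rounding differences from the table values are expected:

```python
import math
from scipy.stats import norm

mu, sigma, m = 50, 10, 55   # population parameters and the observed sample mean

def p_mean_at_least(n):
    """P(sample mean >= m) for samples of size n: convert the sample mean
    to z using the standard error, then take the area to the right of z."""
    standard_error = sigma / math.sqrt(n)
    z = (m - mu) / standard_error
    return z, 1 - norm.cdf(z)

z10, p10 = p_mean_at_least(10)   # z is about 1.58, p about .057
z50, p50 = p_mean_at_least(50)   # z is about 3.54, p well below .001
```

The unrounded z for n = 50 comes out near 3.54 rather than the 3.55 shown in the text because the text rounds the standard error to two decimals before dividing; the conclusion (p &lt; .001) is unchanged.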
Standard error helps us interpret these probabilities in context.\r\n<h3 class=\"H1\">Exercises<\/h3>\r\n<ol>\r\n \t<li class=\"Numbered-list-Exercises-1st\">What is a sampling distribution?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">What are the two mathematical facts that describe how sampling distributions work?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">What is the difference between a sampling distribution and a regular distribution?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">What effect does sample size have on the shape of a sampling distribution?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">What is standard error?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">For a population with a mean of 75 and a standard deviation of 12, what proportion of sample means of size <span class=\"italic\">n<\/span> = 16 fall above 82?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">For a population with a mean of 100 and standard deviation of 16, what is the probability that a random sample of size 4 will have a mean between 110 and 130?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">Find the <span class=\"italic\">z<\/span>\u00a0score for the following means taken from a population with mean 10 and standard deviation 2:\r\n<ol>\r\n \t<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">M<\/span> = 8, <span class=\"italic\">n<\/span> = 12<\/li>\r\n \t<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">M<\/span> = 8, <span class=\"italic\">n<\/span> = 30<\/li>\r\n \t<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">M<\/span> = 20, <span class=\"italic\">n<\/span> = 4<\/li>\r\n \t<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">M<\/span> = 20, <span class=\"italic\">n<\/span> = 16<\/li>\r\n<\/ol>\r\n<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">As the sample size increases, what happens to the <span 
class=\"italic\">p<\/span> value associated with a given sample mean?<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">For a population with a mean of 35 and standard deviation of 7, find the sample mean of size <span class=\"italic\">n<\/span>\u00a0=\u00a020 that cuts off the top 5% of the sampling distribution.<\/li>\r\n \t<li class=\"Numbered-list-Exercises\">A researcher is interested in estimating the average age when people had their first adult interaction with law enforcement. Taking a random sample of 25 adults, she determines a sample mean of 20 years and a sample standard deviation of 1.5 years. Construct a 95% confidence interval to estimate the population mean age at which adults first interacted with law enforcement. Write a concluding statement.<\/li>\r\n<\/ol>\r\n<div class=\"textbox textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<h3 class=\"H1\">Answers to Odd-Numbered Exercises<\/h3>\r\n<\/header>\r\n<p class=\"textbox__content\">1) The sampling distribution (or sampling distribution of the sample means) is the distribution formed by combining many sample means taken from the same population and of a single, consistent sample size.<\/p>\r\n<span style=\"text-align: initial;font-size: 1em\">3) A sampling distribution is made of statistics (e.g., the mean), whereas a regular distribution is made of individual scores.<\/span>\r\n\r\n5) Standard error is the spread of the sampling distribution and is the quantification of sampling error. 
It is how much we expect the sample mean to naturally change based on random chance.\r\n\r\n7) 10.46% or .1046\r\n\r\n9) As sample size increases, the p value will decrease.\r\n\r\n11) We are 95% certain that the population mean age at which people first interacted with law enforcement is between 19.37 and 20.63 years.\r\n\r\n<\/div>","rendered":"<div class=\"textbox textbox--sidebar textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<h2 class=\"Chapter-title\">Key Terms<\/h2>\n<\/header>\n<div class=\"textbox__content\">\n<p>&nbsp;<\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor154\"><span class=\"Hyperlink-underscore\">central limit theorem<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor150\"><span class=\"Hyperlink-underscore\">distribution of sample means<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor155\"><span class=\"Hyperlink-underscore\">law of large numbers<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor161\"><span class=\"Hyperlink-underscore\">observed effect<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor151\"><span class=\"Hyperlink-underscore\">sampling distribution<\/span><\/a><\/p>\n<p class=\"Key-terms\"><a href=\"#_idTextAnchor152\"><span class=\"Hyperlink-underscore\">standard error<\/span><\/a><\/p>\n<\/div>\n<\/div>\n<p class=\"Text-1st\">We have come to the final chapter in this unit. We will now take the logic, ideas, and techniques we have developed and put them together to see how we can take a sample of data and use it to make inferences about what\u2019s truly happening in the broader population. This is the final piece of the puzzle that we need to understand in order to have the groundwork necessary for formal hypothesis testing. 
Though some of the concepts in this chapter seem strange, they are all simple extensions of what we have already learned in previous chapters, especially <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-4\/\"><span class=\"Hyperlink-underscore\">Chapter 4<\/span><\/a> and <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-5\/\"><span class=\"Hyperlink-underscore\">Chapter 5<\/span><\/a>.<\/p>\n<p>When we move from describing data to making inferences, we are taking the step that allows us to say something larger about the world. In social justice work, this is essential: we rarely have access to everyone in a population, so we rely on samples to draw conclusions about inequality, discrimination, or opportunity. For example, a survey of student experiences with campus safety might only include a few hundred respondents, yet the results inform policy for thousands. Inferential statistics give us the tools to distinguish between natural variation in samples and real evidence of systemic disparities.<\/p>\n<h3 class=\"H1\">People, Samples, and Populations<\/h3>\n<p class=\"Text-1st\">Most of what we have dealt with so far has concerned individual scores grouped into samples, with those samples being drawn from and, hopefully, representative of a population. We saw how we can understand the location of individual scores within a sample\u2019s distribution via <span class=\"italic\">z<\/span>\u00a0scores, and how we can extend that to understand how likely it is to observe scores higher or lower than an individual score via probability.<\/p>\n<p class=\"Text\">Inherent in this work is the notion that an individual score will differ from the mean, which we quantify as a <span class=\"italic\">z<\/span>\u00a0score. All of the individual scores will differ from the mean in different amounts and different directions, which is natural and expected. We quantify these differences as variance and standard deviation. 
Measures of spread and the idea of variability in observations are key principles in inferential statistics. We know that any observation, whether it is a single score, a set of scores, or a particular descriptive statistic, will differ from the center of whatever distribution it belongs in.<\/p>\n<p class=\"Text\">This is equally true of things outside of statistics and formal data collection and analysis. Some days you hear your alarm and wake up easily, but other days you need to hit snooze a few (dozen) times. Some days traffic is light, but other days it is very heavy. Some classes you are able to focus, pay attention, and take good notes, but other days you find yourself zoning out the entire time. Each individual observation is an insight but is not, by itself, the entire story, and it takes an extreme deviation from what we expect for us to think that something strange is going on. Being a little sleepy is normal, but being completely unable to get out of bed might indicate that we are sick. Light traffic is a good thing, but almost no cars on the road might make us think we forgot it is Saturday. Zoning out occasionally is fine, but if we cannot focus at all, we might be in a stats class rather than a fun one.<\/p>\n<p><strong>Example:<\/strong><br \/>Imagine surveying wages for a small group of Latina women in a city. Their sample mean may not exactly match the true average wages for all Latina women in that city. Some deviation is expected; this is sampling error. But if the sample mean is far lower than expected, it may indicate structural wage inequality. Understanding that \u201cextreme deviations\u201d are meaningful is the first step toward evidence-based advocacy.<\/p>\n<p class=\"Text\">All of these principles carry forward from scores within samples to samples within populations. 
Just like an individual score will differ from its mean, an individual sample mean will differ from the true population mean. We encountered this principle in earlier chapters: sampling error. As mentioned way back in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-1\/\"><span class=\"Hyperlink-underscore\">Chapter 1<\/span><\/a>, sampling error is an incredibly important principle. We know ahead of time that if we collect data and compute a sample statistic, the observed value of that statistic will be at least slightly off from what we expect it to be based on our supposed population mean; this is natural and expected. However, if our sample mean is extremely different from what we expect based on the population mean, there may be something going on.<\/p>\n<h3 class=\"H1\">The Sampling Distribution of Sample Means<\/h3>\n<p class=\"Text-1st\">To see how we use sampling error, we will learn about a new, theoretical distribution known as the sampling distribution. In the same way that we can gather a lot of individual scores and put them together to form a distribution with a center and spread, if we were to take many samples, all of the same size, and calculate the mean of each of those, we could put those means together to form a distribution. This new distribution is, intuitively, known as the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_658\"><a id=\"_idTextAnchor150\"><\/a><\/a><span class=\"key-term\">distribution of sample means<\/span>. 
It is one example of what we call a <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_661\"><a id=\"_idTextAnchor151\"><\/a><\/a><span class=\"key-term\">sampling distribution<\/span>, which can be formed from a set of values of any statistic, such as a mean, a test statistic, or a correlation coefficient (more on the latter two in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit\u00a02<\/span><\/a> and <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-3-additional-hypothesis-tests\/\"><span class=\"Hyperlink-underscore\">Unit 3<\/span><\/a>). For our purposes, understanding the distribution of sample means will be enough to see how all other sampling distributions work to enable and inform our inferential analyses, so these two terms will be used interchangeably from here on out. Let\u2019s take a deeper look at some of its characteristics.<\/p>\n<p><strong>Example:<\/strong><br \/>Consider studying voter turnout. If we draw repeated samples of 200 people in a district, each sample mean for \u201cpercentage who voted\u201d will vary. When we put those together into a sampling distribution, the center will align with the true turnout rate. From a social justice perspective, this helps us see whether turnout differences across groups (for example, between younger voters and older voters) reflect chance or real barriers, such as voter ID laws or limited polling access.<\/p>\n<p class=\"Text\">The sampling distribution of sample means can be described by its shape, center, and spread, just like any of the other distributions we have worked with. The shape of our sampling distribution is normal: a bell-shaped curve with a single peak and two tails extending symmetrically in either direction, just like what we saw in previous chapters. 
The center of the sampling distribution of sample means\u2014which is, itself, the mean or average of the means\u2014is the true population mean, <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn2.14-mu-4.png\" alt=\"mu\" \/>. This will sometimes be written as <img decoding=\"async\" class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.1new-2.png\" alt=\"\" \/> to denote it as the mean of the sample means. The spread of the sampling distribution is called the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_662\"><a id=\"_idTextAnchor152\"><\/a><\/a><span class=\"key-term\">standard error<\/span>, the quantification of sampling error, denoted <img decoding=\"async\" class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.2-2.png\" alt=\"\" \/>. The formula for standard error is:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-75\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.3-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Notice that the sample size is in this equation. As stated above, the sampling distribution refers to samples of a specific size. That is, all sample means must be calculated from samples of the same size <span class=\"italic\">n<\/span>, such as <span class=\"italic\">n<\/span> = 10, <span class=\"italic\">n<\/span> = 30, or <span class=\"italic\">n<\/span> = 100. This sample size refers to how many people or observations are in each individual sample, <span class=\"italic\">not <\/span>how many samples are used to form the sampling distribution. This is because the sampling distribution is a theoretical distribution, not one we will ever actually calculate or observe. 
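<\/p>\n<p class=\"Text\">Because the formula involves nothing more than the population standard deviation and the sample size, you can compute the standard error directly. The short sketch below is written in Python (this book does not otherwise use code, so treat it as an optional illustration; the function name is ours) and evaluates the formula for a population standard deviation of 10 at several sample sizes:<\/p>

```python
import math

def standard_error(sigma, n):
    # Standard error of the mean: the population standard deviation
    # divided by the square root of the sample size.
    return sigma / math.sqrt(n)

# A population standard deviation of 10 with several sample sizes.
for n in (10, 30, 50, 100):
    print(n, round(standard_error(10, n), 2))
```

<p class=\"Text\">Notice that quadrupling the sample size only cuts the standard error in half, a consequence of the square root. 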
<a href=\"#_idTextAnchor153\"><span class=\"Fig-table-number-underscore\">Figure 6.1<\/span><\/a> displays the principles stated here in graphical form.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer237\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor153\"><\/a>Figure 6.1.<\/span> The sampling distribution of sample means. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/59\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Sampling Distribution of Sample Means<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer238\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Sampling_Distribution_of_Sample_Means-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<h4 class=\"H2\">Two Important Axioms<\/h4>\n<p class=\"Text-1st\">We just learned that the sampling distribution is theoretical: we never actually see it. If that is true, then how can we know it works? How can we use something we don\u2019t see? The answer lies in two very important mathematical facts: the central limit theorem and the law of large numbers. 
We will not go into the math behind how these statements were derived, but knowing what they are and what they mean is important to understanding why inferential statistics work and how we can draw conclusions about a population based on information gained from a single sample.<\/p>\n<h5 class=\"H3\">Central Limit Theorem<\/h5>\n<p class=\"Text-1st\">The <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_657\"><a id=\"_idTextAnchor154\"><\/a><\/a><span class=\"key-term\">central limit theorem<\/span> states:<\/p>\n<p class=\"Text-indented-2p\">For samples of a single size <span class=\"italic CharOverride-14\">n<\/span>, drawn from a population with a given mean <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn2.14-mu-4.png\" alt=\"mu\" \/> and variance <span class=\"Symbol\">s<\/span><span class=\"superscript CharOverride-15\">2<\/span>, the sampling distribution of sample means will have a mean <img decoding=\"async\" class=\"_idGenObjectAttribute-76\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.4-2.png\" alt=\"\" \/> and variance <img decoding=\"async\" class=\"_idGenObjectAttribute-77\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.5-2.png\" alt=\"\" \/>. 
This distribution will approach normality as <span class=\"italic\">n<\/span> increases.<\/p>\n<p class=\"Text\">From this, we are able to find the standard deviation of our sampling distribution, the standard error. As you can see, just like any other standard deviation, the standard error is simply the square root of the variance of the distribution.<\/p>\n<p class=\"Text\">The last sentence of the central limit theorem states that the sampling distribution will be more normal as the sample size of the samples used to create it increases. What this means is that bigger samples will create a more normal distribution, so we are better able to use the techniques we developed for normal distributions and probabilities. So how large is large enough? In general, a sampling distribution will be normal if either of two characteristics is true: (1) the population from which the samples are drawn is normally distributed or (2) the sample size is equal to or greater than 30. This second criterion is very important because it enables us to use methods developed for normal distributions even if the true population distribution is skewed.<\/p>\n<p><strong>Example:<\/strong><br \/>Suppose we want to know if Black homeowners are systematically offered higher mortgage interest rates than white homeowners. If we only take a very small sample of loans, results might be noisy and inconclusive. But the central limit theorem assures us that larger, repeated samples will produce a more normal sampling distribution, making it possible to test whether observed differences are the result of chance or discrimination. 
In practice, this is exactly how researchers have uncovered patterns of systemic bias in lending.<\/p>\n<h5 class=\"H3\">Law of Large Numbers<\/h5>\n<p class=\"Text-1st\">The <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_659\"><a id=\"_idTextAnchor155\"><\/a><\/a><span class=\"key-term\">law of large numbers<\/span> simply states that as our sample size increases, the probability that our sample mean is an accurate representation of the true population mean also increases. It is the formal mathematical way to state that larger samples are more accurate.<\/p>\n<p><strong>Example:<\/strong><br \/>Public health researchers studying environmental justice may want to know if living near industrial sites is linked to higher asthma rates. A small survey might produce unstable results. But as sample size grows, the law of large numbers tells us that the sample mean rate of asthma will approach the true population rate. This makes findings more reliable and strengthens arguments for policy change.<\/p>\n<p class=\"Text\">The law of large numbers is related to the central limit theorem, specifically the formulas for variance and standard error. Notice that the sample size appears in the denominators of those formulas. A larger denominator in any fraction means that the overall value of the fraction gets smaller (i.e., 1\/2 = 0.50, 1\/3 = 0.33, 1\/4 = 0.25, and so on). Thus, larger sample sizes will create smaller standard errors. We already know that standard error is the spread of the sampling distribution and that a smaller spread creates a narrower distribution. Therefore, larger sample sizes create narrower sampling distributions, which increases the probability that a sample mean will be close to the center and decreases the probability that it will be in the tails. 
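<\/p>\n<p class=\"Text\">You can watch this narrowing happen by brute force. The sketch below (Python again, offered only as an optional illustration; the function names are ours) draws many samples from a normal population with a mean of 50 and a standard deviation of 10, and compares the spread of the resulting sample means to the standard error predicted by the formula:<\/p>

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

def sample_mean(mu, sigma, n):
    # Draw one random sample of size n from a normal population
    # and return its mean.
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

def empirical_standard_error(mu, sigma, n, reps=5000):
    # Approximate the sampling distribution of sample means by
    # collecting many sample means, then measure their spread.
    means = [sample_mean(mu, sigma, n) for _ in range(reps)]
    return statistics.stdev(means)

for n in (10, 30, 100):
    observed = empirical_standard_error(50, 10, n)
    predicted = 10 / n ** 0.5
    print(n, round(observed, 2), round(predicted, 2))
```

<p class=\"Text\">For each sample size, the observed spread of the simulated sample means lands close to the value the standard error formula predicts, and both shrink as the sample size grows. 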
This is illustrated in <a href=\"#_idTextAnchor156\"><span class=\"Fig-table-number-underscore\">Figure 6.2<\/span><\/a> and <a href=\"#_idTextAnchor157\"><span class=\"Fig-table-number-underscore\">Figure 6.3<\/span><\/a>.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer242\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor156\"><\/a>Figure 6.2.<\/span> Sampling distributions from the same population with <span class=\"Symbol\">m<\/span>\u00a0=\u00a050 and <span class=\"Symbol\">s<\/span> = 10 but different sample sizes (<span class=\"italic\">N <\/span>= 10, <span class=\"italic\">N <\/span>= 30, <span class=\"italic\">N\u00a0<\/span>=\u00a050, <span class=\"italic\">N <\/span>= 100). <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/60\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Sampling Distributions with Different Sample Sizes<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer243\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Sampling_Distributions_with_Different_Sample_Sizes-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer244\" class=\"Legend-below\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor157\"><\/a>Figure 6.3.<\/span> Relationship between sample size and standard error for a constant <span class=\"Symbol\">s<\/span> = 10. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/61\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Relationship between Sample Size and Standard Error<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer245\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Relationship_between_Sample_Size_and_Standard_Error-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<h3 class=\"H1\">Using Standard Error for Probability<\/h3>\n<p class=\"Text-1st\">In previous chapters, we saw that we can use <span class=\"italic\">z<\/span>\u00a0scores to split up a normal distribution and calculate the proportion of the area under the curve in one of the new regions, giving us the probability of randomly selecting a <span class=\"italic\">z<\/span>\u00a0score in that range. We can follow the exact same process for sample means, converting them into <span class=\"italic\">z<\/span>\u00a0scores and calculating probabilities. 
The only difference is that instead of dividing the distance between a raw score and the mean by the standard deviation, we divide the distance between the sample mean and the population mean by the standard error.<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-78\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.7-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Let\u2019s say we are drawing samples from a population with a mean of 50 and a standard deviation of 10 (the same values used in <a href=\"#_idTextAnchor156\"><span class=\"Fig-table-number-underscore\">Figure 6.2<\/span><\/a>). What is the probability that we get a random sample of size 10 with a mean greater than or equal to 55? That is, for <span class=\"italic\">n<\/span> = 10, what is the probability that <img decoding=\"async\" class=\"_idGenObjectAttribute-79\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.8-2.png\" alt=\"\" \/>? First, we need to convert this sample mean score into a <span class=\"italic\">z<\/span>\u00a0score:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-80\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.9-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Now we need to shade the area under the normal curve corresponding to scores greater than <span class=\"italic\">z\u00a0<\/span>=\u00a01.58, as in <a href=\"#_idTextAnchor158\"><span class=\"Fig-table-number-underscore\">Figure 6.4<\/span><\/a>.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer249\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor158\"><\/a>Figure 6.4.<\/span> Area under the curve greater than <span class=\"italic\">z <\/span>= 1.58. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/62\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Area under the Curve Greater than z1.58<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer250\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Area_under_the_Curve_Greater_Than_z1.58-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<p class=\"Text\">Now we go to our <span class=\"italic\">z<\/span>\u00a0table and find that the area to the left of <span class=\"italic\">z <\/span>= 1.58 is .9429. Finally, because we need the area to the right (per our shaded diagram), we simply subtract this from 1 to get 1.00\u00a0\u2212\u00a0.9429 = .0571. So, the probability of randomly drawing a sample of 10 people from a population with a mean of 50 and standard deviation of 10 whose sample mean is 55 or more is <span class=\"italic\">p<\/span> =\u00a0.0571, or 5.71%. Notice that we are talking about means that are 55 <span class=\"italic\">or more<\/span>. That is because, strictly speaking, it\u2019s impossible to calculate the probability of a score taking on exactly 1 value since the \u201cshaded region\u201d would just be a line with no area to calculate.<\/p>\n<p class=\"Text\">Now let\u2019s do the same thing, but assume that instead of only having a sample of 10 people we took a sample of 50 people. 
First, we find <span class=\"italic\">z<\/span>:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-81\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.10-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Then we shade the appropriate region of the normal distribution, as shown in <a href=\"#_idTextAnchor159\"><span class=\"Fig-table-number-underscore\">Figure 6.5<\/span><\/a>.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer252\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor159\"><\/a>Figure 6.5.<\/span> Area under the curve greater than <span class=\"italic\">z <\/span>= 3.55. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/63\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Area under the Curve Greater Than z3.55<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer253\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Area_under_the_Curve_Greater_Than_z3.55-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<p class=\"Text\">Notice that no region of <a href=\"#_idTextAnchor159\"><span class=\"Fig-table-number-underscore\">Figure 6.5<\/span><\/a> appears to be shaded. That is because the area under the curve that far out into the tail is so small that it can\u2019t even be seen (the red line has been added to show exactly where the region starts). 
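<\/p>\n<p class=\"Text\">Both tail probabilities can also be checked numerically. In the sketch below (Python, an optional illustration; the helper name is ours), the normal cumulative probability is built from the error function in the standard library:<\/p>

```python
import math

def right_tail_probability(sample_mean, mu, sigma, n):
    # Convert the sample mean to a z score using the standard error,
    # then find the area under the normal curve to the right of z.
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    cumulative = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # area to the left of z
    return round(z, 2), round(1 - cumulative, 4)

print(right_tail_probability(55, 50, 10, 10))  # z of about 1.58, p of about .0569
print(right_tail_probability(55, 50, 10, 50))  # z of about 3.54, p well under .001
```

<p class=\"Text\">The computed values differ from the table-based ones in the last decimal place only because the table forces us to round <span class=\"italic\">z<\/span> and the standard error first. 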
Thus, we already know that the probability must be smaller for <span class=\"italic\">N\u00a0<\/span>=\u00a050 than <span class=\"italic\">N <\/span>= 10 because the size of the area (the proportion) is much smaller.<\/p>\n<p class=\"Text\">The <span class=\"italic\">z<\/span>\u00a0table only goes up to <span class=\"italic\">z<\/span> = 4.00 because everything beyond that is almost 0 and changes so little that it\u2019s not worth printing values. The closest we can get is subtracting the largest value, .9990, from 1 to get .001. We know that, technically, the actual probability is smaller, so we say that the probability is <span class=\"italic\">p<\/span> &lt; .001, or less than 0.1%.<\/p>\n<p class=\"Text\">This example shows what an impact sample size can have. From the same population, looking for exactly the same thing, changing only the sample size took us from roughly a 5% chance (or about 1\/20 odds) to a less than 0.1% chance (or less than 1 in 1000). As the sample size <span class=\"italic\">n <\/span>increased, the standard error decreased, which in turn caused the value of <span class=\"italic\">z <\/span>to increase, which finally caused the <span class=\"italic\">p<\/span> value (a term for probability we will use a lot in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit 2<\/span><\/a>) to decrease. You can think of this relationship like gears: turning the first gear (sample size) clockwise causes the next gear (standard error) to turn counterclockwise, which causes the third gear (<span class=\"italic\">z<\/span>) to turn clockwise, which finally causes the last gear (probability) to turn counterclockwise. All of these pieces fit together, and the relationships will always be the same: as sample size (<span class=\"italic\">n<\/span>) increases, the standard error decreases. 
As Z increases, the probability decreases.<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-82\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.11-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Because the sample mean will naturally move around due to sampling error, our observed effect will also change naturally. In the context of our formula for <span class=\"italic\">z<\/span>, then, our standard error is how much we would naturally expect the observed effect to change. Changing by a little is completely normal, but changing by a lot might indicate something is going on. This is the basis of inferential statistics and the logic behind hypothesis testing, the subject of <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit 2<\/span><\/a>.<\/p>\n<p class=\"Text-1st\">We have come to the final chapter in this unit. We will now take the logic, ideas, and techniques we have developed and put them together to see how we can take a sample of data and use it to make inferences about what\u2019s truly happening in the broader population. This is the final piece of the puzzle that we need to understand in order to have the groundwork necessary for formal hypothesis testing. 
Though some of the concepts in this chapter seem strange, they are all simple extensions of what we have already learned in previous chapters, especially <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-4\/\"><span class=\"Hyperlink-underscore\">Chapter 4<\/span><\/a> and <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-5\/\"><span class=\"Hyperlink-underscore\">Chapter 5<\/span><\/a>.<\/p>\n<h3 class=\"H1\">People, Samples, and Populations<\/h3>\n<p class=\"Text-1st\">Most of what we have dealt with so far has concerned individual scores grouped into samples, with those samples being drawn from and, hopefully, representative of a population. We saw how we can understand the location of individual scores within a sample\u2019s distribution via <span class=\"italic\">z<\/span>\u00a0scores, and how we can extend that to understand how likely it is to observe scores higher or lower than an individual score via probability.<\/p>\n<p class=\"Text\">Inherent in this work is the notion that an individual score will differ from the mean, which we quantify as a <span class=\"italic\">z<\/span>\u00a0score. All of the individual scores will differ from the mean in different amounts and different directions, which is natural and expected. We quantify these differences as variance and standard deviation. Measures of spread and the idea of variability in observations is a key principle in inferential statistics. We know that any observation, whether it is a single score, a set of scores, or a particular descriptive statistic will differ from the center of whatever distribution it belongs in.<\/p>\n<p class=\"Text\">This is equally true of things outside of statistics and format data collection and analysis. Some days you hear your alarm and wake up easily, but other days you need to hit snooze a few (dozen) times. Some days traffic is light, but other days it is very heavy. 
Some classes you are able to focus, pay attention, and take good notes, but other days you find yourself zoning out the entire time. Each individual observation is an insight but is not, by itself, the entire story, and it takes an extreme deviation from what we expect for us to think that something strange is going on. Being a little sleepy is normal, but being completely unable to get out of bed might indicate that we are sick. Light traffic is a good thing, but almost no cars on the road might make us think we forgot it is Saturday. Zoning out occasionally is fine, but if we cannot focus at all, we might be in a stats class rather than a fun one.<\/p>\n<p class=\"Text\">All of these principles carry forward from scores within samples to samples within populations. Just like an individual score will differ from its mean, an individual sample mean will differ from the true population mean. We encountered this principle in earlier chapters: sampling error. As mentioned way back in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/chapter\/chapter-1\/\"><span class=\"Hyperlink-underscore\">Chapter 1<\/span><\/a>, sampling error is an incredibly important principle. We know ahead of time that if we collect data and compute a sample, the observed value of that sample will be at least slightly off from what we expect it to be based on our supposed population mean; this is natural and expected. However, if our sample mean is extremely different from what we expect based on the population mean, there may be something going on.<\/p>\n<h3 class=\"H1\">The Sampling Distribution of Sample Means<\/h3>\n<p class=\"Text-1st\">To see how we use sampling error, we will learn about a new, theoretical distribution known as the sampling distribution. 
In the same way that we can gather a lot of individual scores and put them together to form a distribution with a center and spread, if we were to take many samples, all of the same size, and calculate the mean of each of those, we could put those means together to form a distribution. This new distribution is, intuitively, known as the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_658\"><a id=\"_idTextAnchor150\"><\/a><\/a><span class=\"key-term\">distribution of sample means<\/span>. It is one example of what we call a <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_661\"><a id=\"_idTextAnchor151\"><\/a><\/a><span class=\"key-term\">sampling distribution<\/span>, which can be formed from a set of any statistic, such as a mean, a test statistic, or a correlation coefficient (more on the latter two in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit\u00a02<\/span><\/a> and <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-3-additional-hypothesis-tests\/\"><span class=\"Hyperlink-underscore\">Unit 3<\/span><\/a>). For our purposes, understanding the distribution of sample means will be enough to see how all other sampling distributions work to enable and inform our inferential analyses, so these two terms will be used interchangeably from here on out. Let\u2019s take a deeper look at some of its characteristics.<\/p>\n<p class=\"Text\">The sampling distribution of sample means can be described by its shape, center, and spread, just like any of the other distributions we have worked with. The shape of our sampling distribution is normal: a bell-shaped curve with a single peak and two tails extending symmetrically in either direction, just like what we saw in previous chapters. 
The center of the sampling distribution of sample means\u2014which is, itself, the mean or average of the means\u2014is the true population mean, <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn2.14-mu-4.png\" alt=\"mu\" \/>. This will sometimes be written as <img decoding=\"async\" class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.1new-2.png\" alt=\"\" \/> to denote it as the mean of the sample means. The spread of the sampling distribution is called the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_662\"><a id=\"_idTextAnchor152\"><\/a><\/a><span class=\"key-term\">standard error<\/span>, the quantification of sampling error, denoted <img decoding=\"async\" class=\"_idGenObjectAttribute-74\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.2-2.png\" alt=\"\" \/>. The formula for standard error is:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-75\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn6.3-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Notice that the sample size is in this equation. As stated above, the sampling distribution refers to samples of a specific size. That is, all sample means must be calculated from samples of the same size <span class=\"italic\">n<\/span>, such as <span class=\"italic\">n<\/span> = 10, <span class=\"italic\">n<\/span> = 30, or <span class=\"italic\">n<\/span> = 100. This sample size refers to how many people or observations are in each individual sample, <span class=\"italic\">not <\/span>how many samples are used to form the sampling distribution. This is because the sampling distribution is a theoretical distribution, not one we will ever actually calculate or observe. 
<a href=\"#_idTextAnchor153\"><span class=\"Fig-table-number-underscore\">Figure 6.1<\/span><\/a> displays the principles stated here in graphical form.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer237\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor153\"><\/a>Figure 6.1.<\/span> The sampling distribution of sample means. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/59\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Sampling Distribution of Sample Means<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer238\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Sampling_Distribution_of_Sample_Means-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<h4 class=\"H2\">Two Important Axioms<\/h4>\n<p class=\"Text-1st\">We just learned that the sampling distribution is theoretical: we never actually see it. If that is true, then how can we know it works? How can we use something we don\u2019t see? The answer lies in two very important mathematical facts: the central limit theorem and the law of large numbers. 
We will not go into the math behind how these statements were derived, but knowing what they are and what they mean is important to understanding why inferential statistics work and how we can draw conclusions about a population based on information gained from a single sample.<\/p>\n<h5 class=\"H3\">Central Limit Theorem<\/h5>\n<p class=\"Text-1st\">The <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_657\"><a id=\"_idTextAnchor154\"><\/a><\/a><span class=\"key-term\">central limit theorem<\/span> states:<\/p>\n<p class=\"Text-indented-2p\">For samples of a single size <span class=\"italic CharOverride-14\">n<\/span>, drawn from a population with a given mean <img decoding=\"async\" class=\"_idGenObjectAttribute-31\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2021\/12\/Eqn2.14-mu-4.png\" alt=\"mu\" \/> and variance <span class=\"Symbol\">s<\/span><span class=\"superscript CharOverride-15\">2<\/span>, the sampling distribution of sample means will have a mean <img decoding=\"async\" class=\"_idGenObjectAttribute-76\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.4-2.png\" alt=\"\" \/> and variance <img decoding=\"async\" class=\"_idGenObjectAttribute-77\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.5-2.png\" alt=\"\" \/>. This distribution will approach normality as <span class=\"italic\">n<\/span> increases.<\/p>\n<p class=\"Text\">From this, we are able to find the standard deviation of our sampling distribution, the standard error. As you can see, just like any other standard deviation, the standard error is simply the square root of the variance of the distribution.<\/p>\n<p class=\"Text\">The last sentence of the central limit theorem states that the sampling distribution will become more normal as the size of the samples used to create it increases. What this means is that bigger samples will create a more normal distribution, so we are better able to use the techniques we developed for normal distributions and probabilities. So how large is large enough? In general, a sampling distribution will be at least approximately normal if either of two characteristics is true: (1) the population from which the samples are drawn is normally distributed or (2) the sample size is equal to or greater than 30. This second criterion is very important because it enables us to use methods developed for normal distributions even if the true population distribution is skewed.<\/p>\n<h5 class=\"H3\">Law of Large Numbers<\/h5>\n<p class=\"Text-1st\">The <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_185_659\"><a id=\"_idTextAnchor155\"><\/a><\/a><span class=\"key-term\">law of large numbers<\/span> simply states that as our sample size increases, the probability that our sample mean is an accurate representation of the true population mean also increases. 
It is the formal mathematical way to state that larger samples are more accurate.<\/p>\n<p class=\"Text\">The law of large numbers is related to the central limit theorem, specifically the formulas for variance and standard error. Notice that the sample size appears in the denominators of those formulas. A larger denominator in any fraction means that the overall value of the fraction gets smaller (i.e., 1\/2 = 0.50, 1\/3 = 0.33, 1\/4 = 0.25, and so on). Thus, larger sample sizes will create smaller standard errors. We already know that standard error is the spread of the sampling distribution and that a smaller spread creates a narrower distribution. Therefore, larger sample sizes create narrower sampling distributions, which increases the probability that a sample mean will be close to the center and decreases the probability that it will be in the tails. This is illustrated in <a href=\"#_idTextAnchor156\"><span class=\"Fig-table-number-underscore\">Figure 6.2<\/span><\/a> and <a href=\"#_idTextAnchor157\"><span class=\"Fig-table-number-underscore\">Figure 6.3<\/span><\/a>.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer242\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor156\"><\/a>Figure 6.2.<\/span> Sampling distributions from the same population with <span class=\"Symbol\">m<\/span>\u00a0=\u00a050 and <span class=\"Symbol\">s<\/span> = 10 but different sample sizes (<span class=\"italic\">N <\/span>= 10, <span class=\"italic\">N <\/span>= 30, <span class=\"italic\">N\u00a0<\/span>=\u00a050, <span class=\"italic\">N <\/span>= 100). 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/60\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Sampling Distributions with Different Sample Sizes<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer243\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Sampling_Distributions_with_Different_Sample_Sizes-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer244\" class=\"Legend-below\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor157\"><\/a>Figure 6.3.<\/span> Relationship between sample size and standard error for a constant <span class=\"Symbol\">s<\/span> = 10. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/61\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Relationship between Sample Size and Standard Error<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer245\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Relationship_between_Sample_Size_and_Standard_Error-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<h3 class=\"H1\">Using Standard Error for Probability<\/h3>\n<p class=\"Text-1st\">In this chapter, we saw that we can use <span class=\"italic\">z<\/span>\u00a0scores to split up a normal distribution and calculate the proportion of the area under the curve in one of the new regions, giving us the probability of randomly selecting a <span class=\"italic\">z<\/span>\u00a0score in that range. We can follow the exact sample process for sample means, converting them into <span class=\"italic\">z<\/span>\u00a0scores and calculating probabilities. 
The only difference is that instead of dividing the distance between a raw score and the population mean by the standard deviation, we divide the distance between the sample mean and the population mean by the standard error.<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-78\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.7-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Let\u2019s say we are drawing samples from a population with a mean of 50 and a standard deviation of 10 (the same values used in <a href=\"#_idTextAnchor156\"><span class=\"Fig-table-number-underscore\">Figure 6.2<\/span><\/a>). What is the probability that we get a random sample of size 10 with a mean greater than or equal to 55? That is, for <span class=\"italic\">n<\/span> = 10, what is the probability that <img decoding=\"async\" class=\"_idGenObjectAttribute-79\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.8-2.png\" alt=\"\" \/>? First, we need to convert this sample mean score into a <span class=\"italic\">z<\/span>\u00a0score:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-80\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.9-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Now we need to shade the area under the normal curve corresponding to scores greater than <span class=\"italic\">z\u00a0<\/span>=\u00a01.58, as in <a href=\"#_idTextAnchor158\"><span class=\"Fig-table-number-underscore\">Figure 6.4<\/span><\/a>.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer249\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor158\"><\/a>Figure 6.4.<\/span> Area under the curve greater than <span class=\"italic\">z <\/span>= 1.58. 
<span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/62\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Area under the Curve Greater than z1.58<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer250\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Area_under_the_Curve_Greater_Than_z1.58-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<p class=\"Text\">Now we go to our <span class=\"italic\">z<\/span>\u00a0table and find that the area to the left of <span class=\"italic\">z <\/span>= 1.58 is .9429. Finally, because we need the area to the right (per our shaded diagram), we simply subtract this from 1 to get 1.00\u00a0\u2212\u00a0.9429 = .0571. So, the probability of randomly drawing a sample of 10 people from a population with a mean of 50 and standard deviation of 10 whose sample mean is 55 or more is <span class=\"italic\">p<\/span> =\u00a0.0571, or 5.71%. Notice that we are talking about means that are 55 <span class=\"italic\">or more<\/span>. That is because, strictly speaking, it\u2019s impossible to calculate the probability of a score taking on exactly 1 value since the \u201cshaded region\u201d would just be a line with no area to calculate.<\/p>\n<p class=\"Text\">Now let\u2019s do the same thing, but assume that instead of only having a sample of 10 people we took a sample of 50 people. 
First, we find <span class=\"italic\">z<\/span>:<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-81\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.10-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Then we shade the appropriate region of the normal distribution, as shown in <a href=\"#_idTextAnchor159\"><span class=\"Fig-table-number-underscore\">Figure 6.5<\/span><\/a>.<\/p>\n<div class=\"_idGenObjectLayout-2\">\n<div id=\"_idContainer252\" class=\"Side-legend\">\n<p class=\"Fig-legend\"><span class=\"Fig-table-number\"><a id=\"_idTextAnchor159\"><\/a>Figure 6.5.<\/span> Area under the curve greater than <span class=\"italic\">z <\/span>= 3.55. <span class=\"Fig-source\">(\u201c<\/span><a href=\"https:\/\/irl.umsl.edu\/oer-img\/63\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">Area under the Curve Greater Than z3.55<\/span><\/span><\/a><span class=\"Fig-source\">\u201d by Judy Schmitt is licensed under <\/span><a href=\"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><span class=\"Fig-source\"><span class=\"Hyperlink-underscore\">CC BY-NC-SA 4.0<\/span><\/span><\/a><span class=\"Fig-source\">.)<\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"_idGenObjectLayout-1\">\n<div id=\"_idContainer253\" class=\"_idGenObjectStyleOverride-1\"><img decoding=\"async\" class=\"_idGenObjectAttribute-19\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Area_under_the_Curve_Greater_Than_z3.55-2.png\" alt=\"\" \/><\/div>\n<\/div>\n<p class=\"Text\">Notice that no region of <a href=\"#_idTextAnchor159\"><span class=\"Fig-table-number-underscore\">Figure 6.5<\/span><\/a> appears to be shaded. That is because the area under the curve that far out into the tail is so small that it can\u2019t even be seen (the red line has been added to show exactly where the region starts). 
Thus, we already know that the probability must be smaller for <span class=\"italic\">N\u00a0<\/span>=\u00a050 than <span class=\"italic\">N <\/span>= 10 because the size of the area (the proportion) is much smaller.<\/p>\n<p class=\"Text\">The table only goes up to 4.00 because everything beyond that is almost 0 and changes so little that it\u2019s not worth printing values. The closest we can get is subtracting the largest value, .9990, from 1 to get .001. We know that, technically, the actual probability is smaller, so we say that the probability is <span class=\"italic\">p<\/span> &lt; .001, or less than 0.1%.<\/p>\n<p class=\"Text\">This example shows what an impact sample size can have. From the same population, looking for exactly the same thing, changing only the sample size took us from roughly a 5% chance (or about 1\/20 odds) to a less than 0.1% chance (or less than 1 in 1000). As the sample size <span class=\"italic\">n <\/span>increased, the standard error decreased, which in turn caused the value of <span class=\"italic\">z <\/span>to increase, which finally caused the <span class=\"italic\">p<\/span> value (a term for probability we will use a lot in <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit 2<\/span><\/a>) to decrease. You can think of this relationship like gears: turning the first gear (sample size) clockwise causes the next gear (standard error) to turn counterclockwise, which causes the third gear (<span class=\"italic\">z<\/span>) to turn clockwise, which finally causes the last gear (probability) to turn counterclockwise. All of these pieces fit together, and the relationships will always be the same: as sample size (<span class=\"italic\">n<\/span>) increases, the standard error decreases; as the standard error decreases, <span class=\"italic\">z<\/span> increases. 
As <span class=\"italic\">z<\/span> increases, the probability decreases.<\/p>\n<p class=\"Equation\"><img decoding=\"async\" class=\"_idGenObjectAttribute-82\" src=\"https:\/\/pressbooks.palomar.edu\/wp-content\/uploads\/sites\/8\/2024\/10\/Eqn6.11-2.png\" alt=\"\" \/><\/p>\n<p class=\"Text\">Because the sample mean will naturally move around due to sampling error, our observed effect will also change naturally. In the context of our formula for <span class=\"italic\">z<\/span>, then, our standard error is how much we would naturally expect the observed effect to change. Changing by a little is completely normal, but changing by a lot might indicate something is going on. This is the basis of inferential statistics and the logic behind hypothesis testing, the subject of <a href=\"https:\/\/pressbooks.palomar.edu\/introtostats\/part\/unit-2-hypothesis-testing\/\"><span class=\"Hyperlink-underscore\">Unit 2<\/span><\/a>.<\/p>\n<p><strong>Example:<\/strong><br \/>Let\u2019s say we survey LGBTQ+ students about campus belonging. If we take a small sample, the probability of finding a mean response far below the population average is higher just due to sampling error. But with larger samples, the probability of such an extreme result shrinks, meaning that when we do observe very low belonging scores, it is more likely to reflect a real problem on campus rather than just chance variation. 
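The gear analogy can be made concrete with a short loop (Python; μ = 50, σ = 10, and a sample mean of 55 are the chapter's running example, while the list of sample sizes mirrors Figure 6.2):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, m = 50, 10, 55  # running example from this chapter

print(" n     SE      z       p")
for n in (10, 30, 50, 100):
    se = sigma / sqrt(n)            # gear 2: SE falls as n rises
    z = (m - mu) / se               # gear 3: z rises as SE falls
    p = 1 - NormalDist().cdf(z)     # gear 4: p falls as z rises
    print(f"{n:3d}  {se:5.2f}  {z:5.2f}  {p:.6f}")
```

Reading down the rows, n rises, the standard error falls, z rises, and p falls, every time.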
Standard error helps us interpret these probabilities in context.<\/p>\n<h3 class=\"H1\">Exercises<\/h3>\n<ol>\n<li class=\"Numbered-list-Exercises-1st\">What is a sampling distribution?<\/li>\n<li class=\"Numbered-list-Exercises\">What are the two mathematical facts that describe how sampling distributions work?<\/li>\n<li class=\"Numbered-list-Exercises\">What is the difference between a sampling distribution and a regular distribution?<\/li>\n<li class=\"Numbered-list-Exercises\">What effect does sample size have on the shape of a sampling distribution?<\/li>\n<li class=\"Numbered-list-Exercises\">What is standard error?<\/li>\n<li class=\"Numbered-list-Exercises\">For a population with a mean of 75 and a standard deviation of 12, what proportion of sample means of size <span class=\"italic\">n<\/span> = 16 fall above 82?<\/li>\n<li class=\"Numbered-list-Exercises\">For a population with a mean of 100 and standard deviation of 16, what is the probability that a random sample of size 4 will have a mean between 110 and 130?<\/li>\n<li class=\"Numbered-list-Exercises\">Find the <span class=\"italic\">z<\/span>\u00a0score for the following means taken from a population with mean 10 and standard deviation 2:\n<ol>\n<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">M<\/span> = 8, <span class=\"italic\">n<\/span> = 12<\/li>\n<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">M<\/span> = 8, <span class=\"italic\">n<\/span> = 30<\/li>\n<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">M<\/span> = 20, <span class=\"italic\">n<\/span> = 4<\/li>\n<li class=\"Numbered-list-Exercises-sub _idGenParaOverride-1\"><span class=\"italic\">M<\/span> = 20, <span class=\"italic\">n<\/span> = 16<\/li>\n<\/ol>\n<\/li>\n<li class=\"Numbered-list-Exercises\">As the sample size increases, what happens to the <span class=\"italic\">p<\/span> value associated with a given sample 
mean?<\/li>\n<li class=\"Numbered-list-Exercises\">For a population with a mean of 35 and standard deviation of 7, find the sample mean of size <span class=\"italic\">n<\/span>\u00a0=\u00a020 that cuts off the top 5% of the sampling distribution.<\/li>\n<li class=\"Numbered-list-Exercises\">A researcher is interested in estimating the average age when people had their first adult interaction with law enforcement. Taking a random sample of 25 adults, she determines a sample mean of 20 years and a sample standard deviation of 1.5 years. Construct a 95% confidence interval to estimate the population mean age at which adults\u2019 first interaction with law enforcement occurred. Write a concluding statement.<\/li>\n<\/ol>\n<div class=\"textbox textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<h3 class=\"H1\">Answers to Odd-Numbered Exercises<\/h3>\n<\/header>\n<p class=\"textbox__content\">1) The sampling distribution (or sampling distribution of the sample means) is the distribution formed by combining many sample means taken from the same population and of a single, consistent sample size.<\/p>\n<p>3) A sampling distribution is made of statistics (e.g., the mean), whereas a regular distribution is made of individual scores.<\/p>\n<p>5) Standard error is the spread of the sampling distribution and is the quantification of sampling error. 
It is how much we expect the sample mean to naturally change based on random chance.<\/p>\n<p>7) 10.46% or .1046<\/p>\n<p>9) As sample size increases, the p value will decrease.<\/p>\n<p>11) We are 95% certain that the population mean age at which people\u2019s first interaction with law enforcement occurred is between 19.37 and 20.63 years.<\/p>\n<\/div>\n<div class=\"glossary\"><span class=\"screen-reader-text\" id=\"definition\">definition<\/span><template id=\"term_185_658\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_185_658\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_185_661\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_185_661\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_185_662\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_185_662\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_185_657\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_185_657\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_185_659\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_185_659\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><\/div>","protected":false},"author":7,"menu_order":6,"template":"","meta":{"pb_show_title":"","pb_short_title":"Sampling 
Distributions","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-185","chapter","type-chapter","status-publish","hentry"],"part":21,"_links":{"self":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters\/185","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/users\/7"}],"version-history":[{"count":14,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters\/185\/revisions"}],"predecessor-version":[{"id":971,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters\/185\/revisions\/971"}],"part":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/parts\/21"}],"metadata":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapters\/185\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/media?parent=185"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/pressbooks\/v2\/chapter-type?post=185"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/contributor?post=185"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.palomar.edu\/introtostats\/wp-json\/wp\/v2\/license?post=185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}