A simple random sample is a subset of individuals drawn from a population in which every individual has an equal chance of being selected. Selecting individuals at random in this way aims to produce a sample that is an unbiased representation of the population. However, it is less advantageous when the population contains subgroups that differ widely from one another.
Stratified random sampling is a better method when there are different subgroups in the population. It divides a population into subgroups, or strata, and takes a simple random sample from each stratum in proportion to that stratum's share of the population. The members of each stratum have similar attributes and characteristics. This method of sampling is widely used and is very useful when the target population is heterogeneous. Stratified random sampling can be used, for example, to sample students’ grade point averages (GPA) across the nation, the overtime hours people spend at work, and life expectancy across the world.
For example, suppose a research team wants to determine the GPA of college students across the U.S. Because collecting data from all 21 million college students is impractical, the team decides to take a random sample of 4,000 students from the population.
Now assume that the team looks at the different attributes of the sample participants and wonders whether GPAs differ by students’ majors. Suppose it finds that 560 students are English majors, 1,135 are science majors, 800 are computer science majors, 1,090 are engineering majors, and 415 are math majors. The team wants to use a proportional stratified random sample, in which each stratum's share of the sample matches its share of the population.
Assume the team researches the demographics of college students in the U.S. and finds the percentage of students in each major: 12% major in English, 28% major in science, 24% major in computer science, 21% major in engineering, and 15% major in mathematics. Thus, five strata are created from the stratified random sampling process.
The team then checks whether each stratum's share of the sample matches its share of the population and finds that the proportions are not equal. It therefore needs to resample 4,000 students from the population, randomly selecting 480 English, 1,120 science, 960 computer science, 840 engineering, and 600 mathematics students. With those, it has a proportionate stratified random sample of college students, which provides a better representation of students' college majors in the U.S. The researchers can then highlight specific strata, observe the varying fields of study of U.S. college students, and observe the varying grade point averages.
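The arithmetic behind the resampling step can be sketched in a few lines of Python. The shares and first-sample counts are the figures from the example above; the variable names are illustrative only, and the rounding step is an assumption that happens to work here because every target is a whole number.

```python
# Population shares by major and the first sample's counts, taken from the example.
population_share = {
    "English": 0.12,
    "science": 0.28,
    "computer science": 0.24,
    "engineering": 0.21,
    "mathematics": 0.15,
}
first_sample = {
    "English": 560,
    "science": 1135,
    "computer science": 800,
    "engineering": 1090,
    "mathematics": 415,
}
sample_size = 4000

# Proportional allocation: each stratum receives its population share of the sample.
target = {
    major: round(share * sample_size)
    for major, share in population_share.items()
}

# Compare the first sample with the proportional targets.
for major, n_target in target.items():
    difference = first_sample[major] - n_target
    print(f"{major:17s} target {n_target:5d}  first sample {first_sample[major]:5d}  "
          f"difference {difference:+d}")
```

With other shares or sample sizes, rounded targets may not add up exactly to the sample size, so a real allocation would need a rule for distributing the remainder.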
The same method can be applied to election polling, the income of varying populations, and the income for different jobs across a nation, to list just a few of the applications.
Sampling methods can be classified into one of two categories:
- Probability Sampling: Each sample has a known probability of being selected
- Non-probability Sampling: Samples do not have a known probability of being selected, as in convenience or voluntary response surveys
In probability sampling it is possible to both determine which sampling units belong to which sample and the probability that each sample will be selected. The following sampling methods are examples of probability sampling:
- Simple Random Sampling (SRS)
- Stratified Sampling
- Cluster Sampling
- Systematic Sampling
- Multistage Sampling (in which some of the methods above are combined in stages)
Of the five methods listed above, students have the most trouble distinguishing between stratified sampling and cluster sampling.
Stratified Sampling is possible when it makes sense to partition the population into groups based on a factor that may influence the variable that is being measured. These groups are then called strata. An individual group is called a stratum. With stratified sampling one should:
- partition the population into groups (strata)
- obtain a simple random sample from each group (stratum)
- collect data on each sampling unit that was randomly sampled from each group (stratum)
Stratified sampling works best when a heterogeneous population is split into fairly homogeneous groups. Under these conditions, stratification generally produces more precise estimates of the population percents than estimates that would be found from a simple random sample. Table 3.2 shows some examples of ways to obtain a stratified sample.
Table 3.2. Examples of Stratified Samples
| ||Example 1||Example 2||Example 3|
|Population||All people in U.S.||All PSU intercollegiate athletes||All elementary students in the local school district|
|Groups (Strata)||4 Time Zones in the U.S. (Eastern, Central, Mountain, Pacific)||26 PSU intercollegiate teams||11 different elementary schools in the local school district|
|Obtain a Simple Random Sample||500 people from each of the 4 time zones||5 athletes from each of the 26 PSU teams||20 students from each of the 11 elementary schools|
|Sample||4 × 500 = 2000 selected people||26 × 5 = 130 selected athletes||11 × 20 = 220 selected students|
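As a rough illustration of Example 2 in Table 3.2, the following Python sketch draws a simple random sample of 5 athletes from each of 26 teams. The roster is hypothetical: the team and athlete labels, the roster size of 20 athletes per team, and the function name `stratified_sample` are all assumptions made for the sketch.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical roster: 26 teams (the strata), each with 20 athletes.
# The roster size of 20 is an assumption made only for illustration.
roster = {
    f"team_{t:02d}": [f"team_{t:02d}_athlete_{a:02d}" for a in range(1, 21)]
    for t in range(1, 27)
}

def stratified_sample(strata, n_per_stratum):
    """Draw a simple random sample of n_per_stratum units from every stratum."""
    return {
        name: random.sample(units, n_per_stratum)
        for name, units in strata.items()
    }

sample = stratified_sample(roster, n_per_stratum=5)
total = sum(len(units) for units in sample.values())
print(len(sample), "teams represented,", total, "athletes selected")
```

Every team appears in the sample, which is the defining feature of stratified sampling: the randomness is in which athletes are chosen within each team, not in which teams are included.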
Cluster Sampling is very different from Stratified Sampling. With cluster sampling one should:
- divide the population into groups (clusters)
- obtain a simple random sample of so many clusters from all possible clusters
- obtain data on every sampling unit in each of the randomly selected clusters
It is important to note that, unlike with the strata in stratified sampling, the clusters should be microcosms, rather than subsections, of the population. Each cluster should be heterogeneous. Additionally, the statistical analysis used with cluster sampling is not only different, but also more complicated than that used with stratified sampling.
Table 3.3. Examples of Cluster Samples
| ||Example 1||Example 2||Example 3|
|Population||All people in U.S.||All PSU intercollegiate athletes||All elementary students in a local school district|
|Groups (Clusters)||4 Time Zones in the U.S. (Eastern, Central, Mountain, Pacific)||26 PSU intercollegiate teams||11 different elementary schools in the local school district|
|Obtain a Simple Random Sample||2 time zones from the 4 possible time zones||8 teams from the 26 possible teams||4 elementary schools from the 11 possible elementary schools|
|Sample||every person in the 2 selected time zones||every athlete on the 8 selected teams||every student in the 4 selected elementary schools|
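For comparison, Example 2 in Table 3.3 can be sketched the same way in Python. Here the random draw selects 8 whole teams, and every athlete on a selected team is included. As before, the roster is hypothetical: the labels, the 20 athletes per team, and the function name `cluster_sample` are assumptions for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical roster: 26 teams (the clusters), each with 20 athletes.
# The roster size of 20 is an assumption made only for illustration.
roster = {
    f"team_{t:02d}": [f"team_{t:02d}_athlete_{a:02d}" for a in range(1, 21)]
    for t in range(1, 27)
}

def cluster_sample(clusters, n_clusters):
    """Randomly select n_clusters whole clusters and keep every unit in them."""
    chosen = random.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

sample = cluster_sample(roster, n_clusters=8)
total = sum(len(units) for units in sample.values())
print(len(sample), "teams selected,", total, "athletes surveyed")
```

Only 8 of the 26 teams appear in the sample, so the result represents the population well only if each team resembles the population as a whole.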
Each of the three examples found in Tables 3.2 and 3.3 was used to illustrate how both stratified and cluster sampling could be accomplished. However, there are obviously times when one sampling method is preferred over the other. The following explanations add some clarification about when to use which method.
- With Example 1: Stratified sampling would be preferred over cluster sampling, particularly if the questions of interest are affected by time zone. For example the percentage of people watching a live sporting event on television might be highly affected by the time zone they are in. Cluster sampling really works best when there are a reasonable number of clusters relative to the entire population. In this case, selecting 2 clusters from 4 possible clusters really does not provide much advantage over simple random sampling.
- With Example 2: Either stratified sampling or cluster sampling could be used. It would depend on what questions are being asked. For instance, consider the question "Do you agree or disagree that you receive adequate attention from the team of doctors at the Sports Medicine Clinic when injured?" The answer to this question would probably not be team dependent, so cluster sampling would be fine. In contrast, consider the question "Do you agree or disagree that weather affects your performance during an athletic event?" The answer to this question would probably be influenced by whether the sport is played outside or inside. Consequently, stratified sampling would be preferred.
- With Example 3: Cluster sampling would probably be better than stratified sampling if each individual elementary school appropriately represents the entire population, as in a school district where students from throughout the district can attend any school. Stratified sampling could be used if the elementary schools had very different locations and served only their local neighborhoods (e.g., one elementary school is located in a rural setting while another is located in an urban setting). Again, the questions of interest would affect which sampling method should be used.
The most common method of carrying out a poll today is Random Digit Dialing, in which a machine randomly dials phone numbers. Some polls go even further and have a machine conduct the interview itself rather than just dial the number! Such "robo call polls" can be very biased because they have extremely low response rates (most people don't like speaking to a machine) and because federal law prevents such calls to cell phones. Since people who have landline phone service tend to be older than people who have cell phone service only, another potential source of bias is introduced. National polling organizations that use random digit dialing in interviewer-based polls are very careful to match the mix of landline and cell phone numbers to the population they are trying to survey.
The following sampling methods that are listed in your text are types of non-probability sampling that should be avoided:
- volunteer samples
- haphazard (convenience) samples
Since such non-probability sampling methods are based on human choice rather than random selection, statistical theory cannot explain how they might behave and potential sources of bias are rampant. In your textbook, the two types of non-probability samples listed above are called "sampling disasters."
Read the article: "How Polls are Conducted" by the Gallup organization available in Canvas.
The article provides great insight into how major polls are conducted. When you are finished reading this article you may want to go to the Gallup Poll Web site, http://www.gallup.com, and see the results from recent Gallup polls. Another excellent source of public opinion polls on a wide variety of topics using solid sampling methodology is the Pew Research Center website at http://www.pewresearch.org. When you read one of the summary reports on the Pew site, there is a link (in the upper right corner) to the complete report, which gives more detailed results and a full description of the methodology, as well as a link to the actual questionnaire used in the survey so you can judge whether there might be bias in the wording of the survey.
It is important to be mindful of the margin of error as discussed in this article. We all need to remember that public opinion on a given topic cannot be appropriately measured with one question that is asked on only one poll. Such results provide only a snapshot at that moment under certain conditions. Repeating procedures over different conditions and times leads to more valuable and durable results. Within this section of the Gallup article, there is also an error: "in 95 out of those 100 polls, his rating would be between 46% and 54%." This should instead say that in an expected 95 out of those 100 polls, the true population percent would be within the confidence interval calculated. In the other 5 of those polls, the confidence interval would be expected not to contain the population percent.
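The corrected interpretation can be checked with a small simulation. The Python sketch below is only an illustration under stated assumptions: a true population percent of 50%, 600 respondents per poll (chosen because it gives a margin of error near 4 percentage points), and the usual approximate 95% interval based on the multiplier 1.96. It simulates 1,000 polls rather than 100 so that the proportion is more stable; none of these values come from the Gallup article.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_PERCENT = 0.50  # assumed true population proportion (hypothetical)
N = 600              # assumed respondents per poll; margin of error near 4 points
POLLS = 1000         # number of repeated polls to simulate
Z = 1.96             # multiplier for an approximate 95% confidence interval

covered = 0
for _ in range(POLLS):
    # Simulate one poll: each respondent approves with probability TRUE_PERCENT.
    approve = sum(random.random() < TRUE_PERCENT for _ in range(N))
    p_hat = approve / N
    margin = Z * math.sqrt(p_hat * (1 - p_hat) / N)
    # Check whether this poll's interval contains the true population percent.
    if p_hat - margin <= TRUE_PERCENT <= p_hat + margin:
        covered += 1

coverage = covered / POLLS
print(f"{coverage:.1%} of the simulated intervals contain the true percent")
```

The share of intervals that capture the true percent should come out close to 95%. Each poll produces a different interval, and it is the intervals, not the true percent, that vary from poll to poll.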