Figure: Histogram of scores on a psychology test.

The histogram makes it plain that most of the scores are in the middle of the distribution, with fewer scores in the extremes. You can also see that the distribution is not symmetric: the scores extend to the right farther than they do to the left. The distribution is therefore said to be skewed. (We'll have more to say about shapes of distributions in the chapter "Summarizing Distributions.")

In our example, the observations are whole numbers. Histograms can also be used when the scores are measured on a more continuous scale such as the length of time (in milliseconds) required to perform a task. In this case, there is no need to worry about fence-sitters since they are improbable. (It would be quite a coincidence for a task to require exactly \(7\) seconds, measured to the nearest thousandth of a second.) We are therefore free to choose whole numbers as boundaries for our class intervals, for example, \(4000,\ 5000\), etc. The class frequency is then the number of observations that are greater than or equal to the lower bound, and strictly less than the upper bound. For example, one interval might hold times from \(4000\) to \(4999\) milliseconds. Using whole numbers as boundaries avoids a cluttered appearance, and is the practice of many computer programs that create histograms. Note also that some computer programs label the middle of each interval rather than the end points.

Histograms can be based on relative frequencies instead of actual frequencies. Histograms based on relative frequencies show the proportion of scores in each interval rather than the number of scores. In this case, the \(Y\)-axis runs from \(0\) to \(1\) (or somewhere in between if there are no extreme proportions). You can change a histogram based on frequencies to one based on relative frequencies by (a) dividing each class frequency by the total number of observations, and then (b) plotting the quotients on the \(Y\)-axis (labeled as proportion).

There is more to be said about the widths of the class intervals, sometimes called bin widths. Your choice of bin width determines the number of class intervals. This decision, along with the choice of starting point for the first interval, affects the shape of the histogram.

So let's say you were to go to a restaurant and, just out of curiosity, you want to see what the makeup of the ages at the restaurant is. So you go around the restaurant and you write down everyone's age. And so these are the ages of everyone in the restaurant at that moment. And so when you just look at these numbers it really doesn't give you a good sense of it. And so you're interested in somehow presenting this, somehow visualizing the distribution of the ages, because you want to say, well, are there more young people? Are there more teenagers? Are there more middle-aged people? Are there more seniors here?

And so how could you do that? Well, one way to think about it is to put these ages into different buckets, and then to think about how many people there are in each of those buckets. Or sometimes someone might say how many in each of those bins. So let's do that. So the bucket, I like to think of it more as a bucket, the bucket and then the number in the bucket. I'll just write "number": it's the number in the bucket.

So let's say the first one is ages zero to nine. Why don't we just define all of the buckets here? So the next one is ages 10 to 19, then 20 to 29, then 30 to 39, and 40 to 49, 50 to 59, let me make sure you can read that properly, then you have 60 to 69. I don't see anyone 70 years old or older here.

So then how many people fall into the zero to nine-year-old bucket? Well, it's gonna be one, two, three, four, five, six people who fall into that bucket. How many people fall into the 10 to 19-year-old bucket? Well, let's see. What about 20 to 29? So that's one, two, three, four, five people. Alright, what about 30 to 39? We have one, and that's it. Only one person in that 30 to 39 bin or bucket or category. Alright, what about 40 to 49? We have one, two people.

So this is one way of thinking about how the ages are distributed, but let's actually make a visualization of this. And the visualization that we're gonna create, this is called a histogram. We're taking data that can take on a whole bunch of different values, we're putting them into categories, and then we're gonna plot how many folks are in each category. How big is each of those? How big is each of those categories? And actually, I wrote histogram.
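The bucketing idea in this section (put each value into a class interval, count the values per interval, and optionally divide by the total to get relative frequencies) can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `bin_counts` and `relative_frequencies` are made up for this example, and the ages are invented sample data, not the restaurant ages from the video. Each interval is treated as half-open, so a value counts when it is greater than or equal to the lower bound and strictly less than the upper bound.

```python
from collections import Counter


def bin_counts(values, width, start=0):
    """Count values per half-open interval [lower, lower + width).

    Assumes every value is >= start. Returns a dict that maps each
    interval's lower bound to its class frequency, including empty
    intervals up to the last occupied one.
    """
    if not values:
        return {}
    # Floor division places a value that equals a lower bound in the
    # interval that starts there, never in the one that ends there.
    indices = Counter(int((v - start) // width) for v in values)
    last = max(indices)
    return {start + i * width: indices.get(i, 0) for i in range(last + 1)}


def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts.values())
    return {lower: n / total for lower, n in counts.items()}


# Hypothetical ages, invented for illustration only.
ages = [3, 7, 8, 12, 15, 21, 24, 29, 35, 41, 48, 62]

counts = bin_counts(ages, width=10)
proportions = relative_frequencies(counts)

# A crude text histogram: one '#' per person in each 10-year bucket.
for lower, n in counts.items():
    print(f"{lower:>2} to {lower + 9:>2}: {'#' * n} ({n})")
```

Changing `width` or `start` changes which values share a bucket, which is why the choice of bin width and starting point affects the shape of the histogram.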