Lesson 3: Basic Descriptive Statistics
1 Overview
This lesson provides an overview of the general concepts and
calculations used in descriptive statistics. It is not intended to
provide detailed information about the entire series of calculations
one could compute to summarize data.
Upon completion of this lesson you should be able to:
2 Introduction to Descriptive Statistics
The objective of descriptive statistics is to produce numbers that describe attributes of the sample. In short, descriptive statistics allow us to summarize our data in a clear and meaningful way.
Example: Let’s assume we collected fuel loading data from 50 stands on a National Forest. We can summarize these data in one of two ways. First, we can summarize the data numerically by computing statistics such as the mean and standard deviation, which show the average amount of fuel loading and the degree to which fuel loading differs between stands. Second, we can summarize the data graphically by creating a stem and leaf diagram or a histogram, which provides information on the distribution of fuel loadings.
Remember that graphical representation is best used to show patterns within the data, whereas numerical summarization is more precise and objective. Since the two types of summarization are complementary, it is best to use both.
3 Calculating the Mean or Average
The mean or average is the sum of all the data divided by the number
of data points. Mathematically it can be represented by the
following equation:
Xavg = ∑x / n
Where Xavg is the mean or average, ∑x is the sum of all the data, and n is the number of data points. We will now go over an example.
Example: Let’s assume the following 10 numbers are the diameters of trees in a treatment unit we sampled.
10.2, 6.5, 8.2, 9.6, 24.0, 16.6, 13.4, 5.3, 6.7, 31.8
The sum of these values is 132.3, so the mean is 132.3 / 10 = 13.23.
The average or mean is a good indication of the center of the data if the data have a symmetric distribution. However, if the distribution is not symmetric, or there are extremely high or low values, the mean may not be the best choice for describing the center of the data.
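The mean calculation for the tree diameter example can be sketched in a few lines of Python. This is only an illustration; the names `mean`, `diameters`, and `x_avg` are chosen here and do not come from the lesson.

```python
# Tree diameters from the example in this section.
diameters = [10.2, 6.5, 8.2, 9.6, 24.0, 16.6, 13.4, 5.3, 6.7, 31.8]


def mean(values):
    """Return the sum of all the data divided by the number of data points."""
    return sum(values) / len(values)


x_avg = mean(diameters)
print(round(x_avg, 2))  # 13.23
```

The two largest trees (24.0 and 31.8) pull the mean well above most of the other diameters, which is the sensitivity to extreme values described above.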
4 Calculating the Median
The median is used to describe the center of the distribution of the
data. In other words, it is the score where half the data are above and
half the data are below. Let’s use the tree diameter
example from before, but for simplicity let’s assume we added another
sampling point with a diameter of 16.1 inches:
Example: Recall the ten tree diameters we sampled, to which we have now added the 16.1 inch tree for a total of eleven scores.
The first step in calculating the median is to rearrange the data so that they are in numerical order. Your data should now look like this:
5.3, 6.5, 6.7, 8.2, 9.6, 10.2, 13.4, 16.1, 16.6, 24.0, 31.8
Since the data have an odd number of samples, we simply find the number with half the sample points above it and half below. In this case the median is equal to 10.2.
We were lucky to have another data point to add, so what would we have done if we still had only ten samples? In such a case you simply average the two middle scores. So for our original data set of
10.2, 6.5, 8.2, 9.6, 24.0, 16.6, 13.4, 5.3, 6.7, 31.8
the two middle scores, once the data are sorted, are 9.6 and 10.2. If we average those we get 9.9 as our median.
The interactive demonstration from Rice University shows the properties of the mean and median.
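Both median cases described above (an odd and an even number of samples) can be sketched in Python as follows. The function and variable names are illustrative assumptions, not part of the lesson.

```python
def median(values):
    """Return the middle score of the sorted data.

    With an odd number of samples this is the single middle value;
    with an even number it is the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# The original ten tree diameters, and the same data with the 16.1 inch tree added.
ten_trees = [10.2, 6.5, 8.2, 9.6, 24.0, 16.6, 13.4, 5.3, 6.7, 31.8]
eleven_trees = ten_trees + [16.1]

print(median(eleven_trees))         # 10.2
print(round(median(ten_trees), 1))  # 9.9
```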
5 Calculating the Mode
The mode is simply the number which most frequently occurs in the
data. Let’s look at an example of how to calculate the mode.
Example: Let’s assume the following data represent the number of snags per acre in 20 proposed treatment areas.
1, 1, 1, 2, 2, 3, 5, 5, 5, 6, 6, 6, 6, 8, 9, 10, 12, 13, 25, 25
Using this example we see that 6 snags per acre occurred more frequently than any other result (four times). Therefore the mode for our data set is 6.
While the mode is simple to find, there are several disadvantages to using it to describe the central tendency of your data. One disadvantage is that some data sets have more than one mode. These sets are typically called bimodal (two modes) or multimodal (several modes). It is not recommended that you use only the mode to report the central tendency of your data.
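A minimal Python sketch of the mode calculation is shown below. It returns a list so that data sets with more than one mode are handled too. The helper name `modes` and the small second data set are hypothetical and included only for illustration.

```python
from collections import Counter

# Snags per acre in the 20 proposed treatment areas from the example.
snags = [1, 1, 1, 2, 2, 3, 5, 5, 5, 6, 6, 6, 6, 8, 9, 10, 12, 13, 25, 25]


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes(snags))            # [6]

# A made-up data set (not from the lesson) with two modes.
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```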
6 Describing the Spread of Data
We will now go over a few ways to describe the spread of data. The
easiest way to describe the spread of data is to calculate
the range. The range is the difference between the highest and
lowest values from a sample.
Example: Let’s assume that the following data points represent the number of fires over 200 acres on a forest over the last ten years.
3, 3, 4, 6, 7, 9, 10, 11, 11, 12
The range of fires over ten years is then equal to 12 – 3 = 9.
You can see that the range is very easily calculated. However, since it depends on only two scores it is very sensitive to extreme values. The range is almost never used alone to describe the spread of data. It is often used in conjunction with the variance or the standard deviation.
Example: Let’s try an example together. The following data show the total surface fuel loading for 10 stands in a watershed. We want to get an idea of how much variation there is in these stands, so we will begin by calculating the range of our data.
5.3, 6.4, 15.7, 11.8, 7.9, 26.5, 18.2, 11.3, 9.4, 10.5
The highest value is 26.5 and the lowest is 5.3, so the range is 26.5 – 5.3 = 21.2.
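The range for both data sets in this section can be checked with a short Python sketch. The function name `value_range` is an assumption made here for illustration.

```python
def value_range(values):
    """Return the difference between the highest and lowest values."""
    return max(values) - min(values)


# Fires over 200 acres in each of the last ten years.
fires = [3, 3, 4, 6, 7, 9, 10, 11, 11, 12]

# Total surface fuel loading for 10 stands in a watershed.
fuel_loading = [5.3, 6.4, 15.7, 11.8, 7.9, 26.5, 18.2, 11.3, 9.4, 10.5]

print(value_range(fires))                   # 9
print(round(value_range(fuel_loading), 1))  # 21.2
```

Because only the two extreme scores enter the calculation, changing any of the other values leaves the result unchanged.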
7 Calculating the Variance
The variance describes how spread out the data you collected are. It is computed as the average squared deviation of each number from the mean. For a sample, the sum of the squared deviations is divided by N – 1, as the worked example in this section does. Mathematically speaking the formula looks like this:
S² = ∑(X – M)² / (N – 1)
Where S² is the variance, X is each data point, M is equal to the mean of the data, and N is equal to the number of data points.
Example: To better show you how to calculate the variance we will now go over an example. Let’s assume the following 4 data points represent trees per acre at our sample points for a potential treatment area.
156, 176, 184, 209
The range of these data is 53 trees per acre, and the mean is equal to about 181 trees per acre. We now need to calculate the variance. Begin by subtracting the mean from each value and then squaring the result as shown below:
(156 – 181)² = 625
(176 – 181)² = 25
(184 – 181)² = 9
(209 – 181)² = 784
The next step is to add together the results. You should get 1443. Finally, divide the sum by N – 1 = 3:
S² = 1443 / 3 = 481
The variance is used in many other statistical calculations, so it is important that you have a basic understanding of how this value is calculated. It is also the first step in calculating the standard deviation.
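The variance example can be reproduced step by step in Python. This sketch assumes the lesson divides the sum of squared deviations by N – 1 (which is what 1443 / 3 = 481 implies) and follows the lesson in rounding the mean to 181. The variable names are illustrative only.

```python
import statistics

# Trees per acre at the four sample points.
trees_per_acre = [156, 176, 184, 209]

# The exact mean is 181.25; the lesson rounds it to 181.
m = round(sum(trees_per_acre) / len(trees_per_acre))

# Subtract the mean from each value and square the result.
squared_deviations = [(x - m) ** 2 for x in trees_per_acre]
print(squared_deviations)  # [625, 25, 9, 784]

# Add the squared deviations together, then divide by N - 1.
sum_of_squares = sum(squared_deviations)
s2 = sum_of_squares / (len(trees_per_acre) - 1)
print(sum_of_squares, s2)  # 1443 481.0

# The standard library's sample variance uses the unrounded mean,
# so it differs slightly from the hand calculation.
exact_s2 = statistics.variance(trees_per_acre)
print(round(exact_s2, 1))  # 480.9
```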
8 Calculating the Standard Deviation
The formula for calculating the standard deviation is as follows:
S = √S²
It is simply the square root of the variance. The standard deviation is the most common measure of spread and is very useful in calculating many other statistics.
Example: If we continue with the tree density example we just used, we can now calculate the standard deviation by taking the square root of 481. This gives us a standard deviation equal to 21.9.
9 Graphical Representation of Data
Although there are several fairly common ways to graphically display
descriptive statistics, we will primarily focus on two: the bar graph and the stem and leaf plot.
Example: Let’s go over an example of how to create a bar graph. To begin with, let’s look at a blank graph:
[Figure: blank graph]
The bottom line on a graph is often referred to as the x axis and the vertical line to the left is called the y axis. We will now use the following values, which represent the number of bark beetle killed trees per acre on seven sample points from a watershed.
98, 102, 106, 105, 102, 81, 24
Let’s begin by making a bar graph with these data. On the x axis we will label each site 1 through 7 and on the y axis we will plot the number of bark beetle killed trees per acre. Your graph should look like this when you are done:
[Figure: bar graph of bark beetle killed trees per acre at sites 1 through 7]
You can see that bar graphs are one way to show your data graphically. Bar graphs are especially useful when you want to show data from qualitative variables.
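For readers without graphing software, a rough text version of the bar graph can be sketched in Python. Note that this sketch turns the graph on its side: the sites run down the page and the bars extend to the right. The function name `text_bar_graph` and the scale of one `#` per 2 trees are assumptions made for illustration.

```python
# Bark beetle killed trees per acre at the seven sample points.
beetle_killed = [98, 102, 106, 105, 102, 81, 24]


def text_bar_graph(values, scale=2):
    """Return one line per site, with one '#' for every `scale` trees."""
    lines = []
    for site, value in enumerate(values, start=1):
        bar = "#" * (value // scale)
        lines.append(f"Site {site} | {bar} {value}")
    return lines


for line in text_bar_graph(beetle_killed):
    print(line)
```

Even in this crude form, the much shorter bar for site 7 stands out immediately.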
10 The Stem and Leaf Plot
The stem and leaf plot is similar to a histogram or bar chart with
the exception that it allows a reader to get more precision when
reading the results.
Example: Let’s take a look at what a stem and leaf plot looks like and go over how one is constructed. The data used to construct the stem and leaf plot are as follows:
21, 22, 22, 24, 31, 31, 31, 35, 37, 43, 44, 46, 48, 48, 49, 52, 55, 55, 56, 67, 68, 68
Let’s look at this in a little more detail. The first sample point is equal to 21, therefore we enter a 2 under the stem side of the plot and a 1 under the leaves side. Your plot should now look like this:
Stem | Leaves
2 | 1
Multiply the stem by 10 and then add the leaf to get the original value. If we add the next data point (22) the plot will then look like this:
Stem | Leaves
2 | 1 2
We continue this process for all the data points. We end up with the complete stem and leaf plot, which looks like this:
Stem | Leaves
2 | 1 2 2 4
3 | 1 1 1 5 7
4 | 3 4 6 8 8 9
5 | 2 5 5 6
6 | 7 8 8
This is just a simple example of a stem and leaf plot. You can see that not only does this type of plot show the distribution of the data, it also lets us know precisely what data were recorded.
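The construction steps above can be sketched in Python. This is a minimal illustration that assumes non-negative whole numbers, with the tens as the stem and the ones digit as the leaf; the function name `stem_and_leaf` is not from the lesson.

```python
from collections import defaultdict

data = [21, 22, 22, 24, 31, 31, 31, 35, 37, 43, 44, 46, 48, 48, 49,
        52, 55, 55, 56, 67, 68, 68]


def stem_and_leaf(values):
    """Group each value's ones digit (leaf) under its tens digit (stem)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # 21 -> stem 2, leaf 1
        groups[stem].append(leaf)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(groups.items())
    ]


for row in stem_and_leaf(data):
    print(row)
# 2 | 1 2 2 4
# 3 | 1 1 1 5 7
# 4 | 3 4 6 8 8 9
# 5 | 2 5 5 6
# 6 | 7 8 8
```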
11 Review Questions