Lesson 3: Basic Descriptive Statistics

1 Overview

This lesson provides an overview of the general concepts and calculations used in descriptive statistics. It is not intended to provide detailed information about every calculation one could use to summarize data.

Upon completion of this lesson you should be able to:

• Understand the methods for summarizing sets of data
• Compute some basic descriptive statistics
• Comprehend and produce basic graphical representations of data sets
 

2 Introduction to Descriptive Statistics


The objective of descriptive statistics is to produce numbers that describe attributes of the sample. In short, descriptive statistics allow us to summarize our data in a clear and meaningful way.

Example

Let’s assume we collected fuel loading data from 50 stands on a National Forest. We can now summarize this data in one of two ways. First, we can summarize the data numerically by computing statistics such as the mean and standard deviation, which show the average amount of fuel loading and the degree to which fuel loading differs between stands.

The second way we could summarize this data is graphically by creating a stem and leaf diagram or a histogram. This method would provide information on the distribution of fuel loadings.

Remember that graphical representation of data is best used to show patterns within the data, whereas numerical summarization is more precise and objective. Because the two types of summarization are complementary, it is best to use both.


3 Calculating the Mean or Average

The mean or average is the sum of all the data divided by the number of data points. Mathematically it can be represented by the following equation:

$$X_{avg} = \frac{\sum x}{n}$$

where $X_{avg}$ is the mean or average, $\sum x$ is the sum of all the data, and $n$ is the number of data points. We will now go over an example.

Example

Let’s assume the following 10 numbers are the diameters of trees in a treatment unit we sampled.

10.2, 6.5, 8.2, 9.6, 24.0, 16.6, 13.4, 5.3, 6.7, 31.8

Which of the following is equal to the average diameter?
A. 16.6
B. 13.2
C. 9.6
D. 24.0


The average or mean is a good indication of the center of the data if the data have a symmetric distribution. However, if the distribution is not symmetric, or if there are extremely high or low values, the mean may not be the best choice for describing the center of the data.
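To see the mean formula and the effect of an extreme value in practice, here is a minimal Python sketch using the ten tree diameters from the example. The function name `mean` is an illustrative helper, not part of any library; Python's standard `statistics.mean` would give the same result.

```python
# Ten sampled tree diameters (inches) from the example above.
diameters = [10.2, 6.5, 8.2, 9.6, 24.0, 16.6, 13.4, 5.3, 6.7, 31.8]


def mean(values):
    """Sum of all the data divided by the number of data points."""
    return sum(values) / len(values)


print(round(mean(diameters), 2))  # 13.23

# Drop the single largest tree (31.8 in) to see how far one extreme
# value moves the mean.
without_largest = sorted(diameters)[:-1]
print(round(mean(without_largest), 2))  # 11.17
```

Removing just one tree out of ten lowers the mean by more than 2 inches, which is why the mean can be a poor description of the center when there are extremely high or low values.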


4 Calculating the Median

The median is used to describe the center of the distribution of the data; in other words, it is the value with half the data above it and half the data below it. For example, let’s use the tree diameter example from before, but for simplicity let’s assume we added another sampling point with a diameter of 16.1 inches:
Example

Recall that the following ten values are the diameters of trees we sampled, now followed by the new sampling point:
10.2, 6.5, 8.2, 9.6, 24.0, 16.6, 13.4, 5.3, 6.7, 31.8, and now 16.1

The first step in calculating the median is to rearrange the data so that they are in numerical order. Your data should now look like this:

5.3, 6.5, 6.7, 8.2, 9.6, 10.2, 13.4, 16.1, 16.6, 24.0, 31.8

Since the data set has an odd number of values (11), the median is simply the middle value, with half the sample points above it and half below. Here that is the 6th value in order, so the median is equal to 10.2.

Now why don’t you try to calculate the median?

The following data shows the number of owl packs on 11 study sites.

4, 8, 2, 5, 7, 9, 12, 15, 13, 11, 3

Which of the following is the median?

A. 8.0
B. 7.9
C. 5
D. 9


We were lucky to have another data point to add. What would we have done if we still had only ten samples?

In that case, you simply average the two middle values. Our original data set, 10.2, 6.5, 8.2, 9.6, 24.0, 16.6, 13.4, 5.3, 6.7, 31.8, in numerical order is 5.3, 6.5, 6.7, 8.2, 9.6, 10.2, 13.4, 16.6, 24.0, 31.8.

The two middle values are 9.6 and 10.2, so if we average them we get (9.6 + 10.2) / 2 = 9.9 as our median.
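Both cases, an odd and an even number of values, fit in one small function. This is a sketch in Python; `median` here is a hypothetical helper written for illustration. The standard library's `statistics.median` follows the same rule, as the last line checks.

```python
import statistics


def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)  # step 1: put the data in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: half the points above, half below
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


original = [10.2, 6.5, 8.2, 9.6, 24.0, 16.6, 13.4, 5.3, 6.7, 31.8]
with_extra = original + [16.1]  # the added sampling point

print(median(with_extra))          # 10.2
print(round(median(original), 2))  # 9.9
print(median(original) == statistics.median(original))  # True
```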

Now you try one. The following data show the fire rate of spread, in chains per hour, for 6 forest fires. Which of the following values is equal to the median?

Fire rate of spread (chains per hour) for 6 forest fires: 22, 36, 48, 4, 10, 25
A. 26.8
B. 48.0
C. 4.0
D. 23.5


Try the interactive demonstration from Rice University that shows the properties of the mean and median.


5 Calculating the Mode

The mode is simply the number which most frequently occurs in the data. Let’s look at an example of how to calculate the mode.
Example

Let’s assume the following data represents the number of snags per acre in 20 proposed treatment areas.

1, 1, 1, 2, 2, 3, 5, 5, 5, 6, 6, 6, 6, 8, 9, 10, 12, 13, 25, 25

Using this example, we see that 6 snags per acre occurred more frequently than any other result (four times). Therefore, the mode for our data set is equal to 6.

Although the mode is easy to find, there are several disadvantages to using it to describe the central tendency of your data. One is that some data sets have more than one mode; such data sets are called bimodal (two modes) or multimodal (more than two). It is not recommended that you use only the mode to report the central tendency of your data.
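Because a data set can have more than one mode, a helper that returns every tied value is safer than one that returns a single number. A minimal Python sketch, with `modes` as an illustrative name (Python 3.8 and later also ship `statistics.multimode`, which does the same job):

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in ascending order."""
    counts = Counter(values)    # value -> number of occurrences
    top = max(counts.values())  # highest frequency in the data
    return sorted(v for v, c in counts.items() if c == top)


snags = [1, 1, 1, 2, 2, 3, 5, 5, 5, 6, 6, 6, 6, 8, 9, 10, 12, 13, 25, 25]
print(modes(snags))  # [6]

# Hypothetical: one more area with 1 snag per acre makes the data bimodal.
print(modes(snags + [1]))  # [1, 6]
```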

The following data show the average height of 15 aspen stands. Which of the following is the mode?

67, 64, 52, 58, 54, 54, 55, 56, 54, 56, 60, 52, 55, 54, 50

A. 56
B. 54
C. 55
D. 60



6 Describing the Spread of Data

We will now go over a few ways to describe the spread of data. The easiest way to describe the spread of data is to calculate the range. The range is the difference between the highest and lowest values from a sample.
Example

Let’s assume that the following data points represent the number of fires over 200 acres on a forest in each of the last ten years.

3, 3, 4, 6, 7, 9, 10, 11, 11, 12

The range is then 12 − 3, or 9 fires over 200 acres.

You can see that the range is very easy to calculate. However, since it depends on only two values, it is very sensitive to extreme values. The range is almost never used alone to describe the spread of data; it is usually reported together with the variance or the standard deviation.
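The range, and its sensitivity to a single extreme value, are easy to check in code. In this Python sketch, `data_range` is an illustrative name (chosen to avoid shadowing Python's built-in `range`), and the extra value of 40 fires is hypothetical, added only to show the effect of one extreme year.

```python
# Counts of fires over 200 acres from the example above.
fires = [3, 3, 4, 6, 7, 9, 10, 11, 11, 12]


def data_range(values):
    """Difference between the highest and lowest values in the sample."""
    return max(values) - min(values)


print(data_range(fires))         # 9
print(data_range(fires + [40]))  # 37 (hypothetical extreme year)
```

One added value changes the range from 9 to 37, even though the other ten values are unchanged.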

Example

Let’s try an example together. The following data show the total surface fuel loading for 10 stands in a watershed. We want to get an idea of how much variation there is among these stands, so we will begin by calculating the range of our data.

Total fuel loading on 10 sites in tons per acre:

5.3, 6.4, 15.7, 11.8, 7.9, 26.5, 18.2, 11.3, 9.4, 10.5

Which of the following values is equal to the range of total fuel loadings we found on our 10 sites?

A. 10.6
B. 18.9
C. 7.9
D. 26.5
E. 21.2



7 Calculating the Variance


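The variance describes how spread out the data you collected are. It is computed from the squared deviation of each value from the mean: square each deviation, add the squares together, and divide. For a sample, as in the example below, the sum is divided by the number of data points minus 1, so the variance is roughly (not exactly) the average squared deviation. Mathematically speaking, the formula looks like this:

$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$

where X is each data value, M is the mean of the data, and N is the number of data points.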

Example

To better show you how to calculate the variance, we will now go over an example. Let’s assume the following 4 data points represent trees per acre at the sample points for a potential treatment area.

156, 176, 184, 209

We would all agree that the range of this data is 53 trees per acre, and the mean is 181.25, or about 181, trees per acre. We now need to calculate the variance. Begin by subtracting the rounded mean from each value and then squaring the result, as shown below:

$(156 - 181)^2 = 625$
$(176 - 181)^2 = 25$
$(184 - 181)^2 = 9$
$(209 - 181)^2 = 784$

The next step is to add the results together; you should get 1443. Last, divide this sum by the number of sample points minus 1 (here 4 − 1 = 3). You should end up with the following solution:

$$s^2 = 1443 / 3 = 481$$

The variance is used in many other statistical calculations, so it is important that you have a basic understanding of how it is calculated. It is also the first step in calculating the standard deviation.
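The same steps (subtract the mean, square, add, divide by N − 1) can be written as a short Python sketch. `sample_variance` is an illustrative helper; `statistics.variance` from the standard library uses the same N − 1 divisor. One detail worth noticing: the hand calculation rounds the mean to 181, while the exact mean is 181.25, so the exact variance is about 480.92 rather than 481.

```python
import statistics

trees_per_acre = [156, 176, 184, 209]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    m = sum(values) / n                            # mean M (181.25 here)
    squared_devs = [(x - m) ** 2 for x in values]  # (X - M)^2 for each point
    return sum(squared_devs) / (n - 1)


print(round(sample_variance(trees_per_acre), 2))      # 480.92
print(round(statistics.variance(trees_per_acre), 2))  # 480.92

# Reproducing the hand calculation, which used the rounded mean of 181:
print(sum((x - 181) ** 2 for x in trees_per_acre) / 3)  # 481.0
```

If your data were an entire population rather than a sample, you would divide by N instead; the standard library provides `statistics.pvariance` for that case.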

The following three data points are the flow rates, in cubic feet per second, for three streams in a given watershed:
106, 125, 202

The mean for the data is equal to 144.3 cubic feet per second, and the number of samples is equal to 3.

What is the variance for this data set?
A. 96
B. 9216.5
C. 2584.3
D. 1124.7

8 Calculating the Standard Deviation

The formula for calculating the standard deviation is as follows:

$$s = \sqrt{s^2}$$

It is simply the square root of the variance. The standard deviation is the most common measure of spread and is used in calculating many other statistics.

Example

If we continue with the tree density example we just used, we can now calculate the standard deviation by taking the square root of 481. This gives a standard deviation of about 21.9 trees per acre.

Let’s use our last example to calculate the standard deviation.

Remember that the following three data points are the flow rates for three streams in a given watershed in cubic feet per second: 106, 125, 202

The mean for the data is equal to 144.3 cubic feet per second, the variance is equal to 2584.3, and the number of samples is equal to 3.

Which of the following is equal to the standard deviation?
A. 64.3
B. 50.8
C. 83.7
D. 48.9
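Continuing the tree density example, a short Python sketch confirms that the standard deviation is the square root of the sample variance. It uses only the standard library (`math.sqrt`, `statistics.variance`, `statistics.stdev`); the variable names are illustrative.

```python
import math
import statistics

trees_per_acre = [156, 176, 184, 209]

s2 = statistics.variance(trees_per_acre)  # sample variance, N - 1 divisor (about 480.92)
s = math.sqrt(s2)                         # standard deviation = square root of the variance

print(round(s, 1))                                 # 21.9
print(round(statistics.stdev(trees_per_acre), 1))  # 21.9
print(round(math.sqrt(481), 1))                    # 21.9, from the rounded hand result
```

Rounding the mean to 181 in the hand calculation does not change the standard deviation at one decimal place.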



9 Graphical Representation of Data

Although there are several fairly common ways to display descriptive statistics graphically, we will primarily focus on two:
  1. The Histogram or the Bar Graph
  2. Stem and Leaf Plot
Example

Let’s go over an example of how to create a bar graph. To begin, picture a blank graph made of two lines that meet at the lower left corner.

The horizontal line at the bottom of a graph is called the x-axis, and the vertical line on the left is called the y-axis.

We will now use the following values which represent the number of bark beetle killed trees per acre on seven sample points from a watershed.

98, 102, 106, 105, 102, 81, 24

Let’s begin by making a bar graph with this data. On the x-axis we will label each site 1 through 7, and on the y-axis we will plot the number of bark beetle killed trees per acre.

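When you are done, your graph should have seven bars, one for each site, with each bar's height equal to that site's number of bark beetle killed trees per acre.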

Bar graphs are one way to show your data graphically. They are especially useful when you want to show data from qualitative (categorical) variables.
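Without a plotting library you can still sketch a bar graph as text. The Python example below is illustrative only: it draws the bars sideways (the sites run down the page rather than along the x-axis), and each `#` stands for 5 trees per acre, a scale chosen arbitrarily. With a plotting library such as matplotlib, `matplotlib.pyplot.bar` would draw the usual vertical version.

```python
# Bark beetle killed trees per acre at the seven sample sites.
killed_per_acre = [98, 102, 106, 105, 102, 81, 24]


def text_bar_graph(values, scale=5):
    """Return one line per site; each '#' represents `scale` trees per acre."""
    lines = []
    for site, value in enumerate(values, start=1):
        bar = "#" * round(value / scale)  # bar length proportional to the value
        lines.append(f"Site {site}: {bar} {value}")
    return lines


for line in text_bar_graph(killed_per_acre):
    print(line)
```

Even in this rough form, site 7 stands out clearly from the other six sites.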


10 The Stem and Leaf Plot

The stem and leaf plot is similar to a histogram or bar chart, except that it lets the reader recover the exact data values rather than only their approximate distribution.
Example

Let’s take a look at what a stem and leaf plot looks like and go over how it is constructed. The data used to construct the stem and leaf plot are as follows:

21, 22, 22, 24, 31, 31, 31, 35, 37, 43, 44, 46, 48, 48, 49, 52, 55, 55, 56, 67, 68, 68

Stem | Leaves
2 | 1224
3 | 11157
4 | 346889
5 | 2556
6 | 788
Key: multiply the stem by 10 and then add the leaf to get the original value (2 | 1 = 21).

Let’s look at this in a little more detail. The first sample point is 21, so we enter a 2 on the stem side of the plot and a 1 on the leaves side. Your plot should now look like this:

Stem | Leaves
2 | 1
3 |
4 |
5 |
6 |

If we add the next data point (22), the plot will then look like this:

Stem | Leaves
2 | 12
3 |
4 |
5 |
6 |

We continue this process for all the data points and end up with the stem and leaf plot we had at the beginning:

Stem | Leaves
2 | 1224
3 | 11157
4 | 346889
5 | 2556
6 | 788

This is just a simple example of a stem and leaf plot. You can see that this type of plot not only shows the distribution of the data but also lets us know precisely what data were recorded.
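The construction steps above (split each value into a tens-digit stem and a ones-digit leaf, then list the leaves beside their stem) can be sketched in Python. This assumes non-negative, two-digit whole numbers like the example data; `stem_and_leaf` and `values_from_plot` are illustrative helper names, not library functions.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return plot rows as 'stem | leaves' strings for non-negative whole numbers."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 21 -> stem 2, leaf 1
        rows[stem].append(str(leaf))
    # Keep empty stems between the smallest and largest so gaps stay visible.
    return [f"{stem} | {''.join(rows[stem])}" for stem in range(min(rows), max(rows) + 1)]


def values_from_plot(rows):
    """Recover the original values: multiply the stem by 10 and add each leaf."""
    out = []
    for row in rows:
        stem, leaves = row.split(" | ")
        out.extend(int(stem) * 10 + int(leaf) for leaf in leaves)
    return out


data = [21, 22, 22, 24, 31, 31, 31, 35, 37, 43, 44, 46, 48, 48, 49,
        52, 55, 55, 56, 67, 68, 68]
plot = stem_and_leaf(data)
for row in plot:
    print(row)

print(values_from_plot(plot) == sorted(data))  # True: no information is lost
```

The round trip back to the original list is what distinguishes a stem and leaf plot from a histogram, where the exact values are lost.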


11 Review Questions


Match each value below with the correct statistic (mean, median, mode, range, variance, or standard deviation) for the crown base heights on the seven sites we sampled.

a. 15.0 Mean
b. 11.0 Mode
c. 15.2 Median
d. 13.9 Range
e. 3.9 Standard Deviation
f. 18.0 Variance
 
