
MORE ON CALCULATING THE VARIANCE


Previous posts introduced the concept of variance. This post looks at how to calculate it, both for a population and for a sample.

Population variance
Formula 1: $\sigma^2 = \frac{\sum_{i=1}^{n}  (X_{i} - \bar{X})^2}{n}$
Formula 2: $\sigma^2 = \frac{\sum_{i=1}^{n} {X_{i}}^2}{n} - (\bar{X})^2$
The two formulae give the same result.
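To see why, here is a short derivation sketch. It uses only the definition $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_{i}$, which implies $\sum_{i=1}^{n} X_{i} = n \bar{X}$.
$\sum_{i=1}^{n} (X_{i} - \bar{X})^2 = \sum_{i=1}^{n} {X_{i}}^2 - 2 \bar{X} \sum_{i=1}^{n} X_{i} + n (\bar{X})^2$
$\sum_{i=1}^{n} (X_{i} - \bar{X})^2 = \sum_{i=1}^{n} {X_{i}}^2 - 2n (\bar{X})^2 + n (\bar{X})^2 = \sum_{i=1}^{n} {X_{i}}^2 - n (\bar{X})^2$
Dividing both sides by n turns the left side into Formula 1 and the right side into Formula 2.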

Example 1
Mr. Ahmad had 6 cell phone counters. In July 2016, the net profits from each counter were 4, 7, 5, 3, 5, 6 (in millions of rupiah). What was the variance of the net profits of the six counters in July 2016?

Answer
In this example, we are asked to calculate the variance of the net profits of the six counters owned by Mr. Ahmad in July 2016. The data cover every counter he owned, so they are data on the whole population, and the population variance applies. To solve the problem, first calculate the mean.
$\bar{X} = Rp \frac{4+7+5+3+5+6}{6} million = Rp \: 5 \:  million$
Using Formula 1, calculate the variance.
$\sigma^2 = \frac{(4-5)^2+(7-5)^2+(5-5)^2+(3-5)^2+(5-5)^2+(6-5)^2}{6} (million \: Rupiahs)^2$
$\sigma^2 \approx 1.67 (million \: Rupiahs)^2$
Alternatively, we may use Formula 2 as follows.
$\sigma^2 = (\frac{4^2+7^2+5^2+3^2+5^2+6^2}{6} - 5^2) (million \: Rupiahs)^2$
$\sigma^2 \approx (26.67 - 25) (million \: Rupiahs)^2 = 1.67 (million \:  Rupiahs)^2$
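The hand calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the function names `population_variance_f1` and `population_variance_f2` and the variable `profits` are made up for this illustration, and `pvariance` from Python's standard `statistics` module is used as an independent check.

```python
from statistics import pvariance


def population_variance_f1(data):
    """Formula 1: mean of the squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def population_variance_f2(data):
    """Formula 2: mean of the squares minus the square of the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(x ** 2 for x in data) / n - mean ** 2


# Net profits of the six counters in Example 1 (millions of rupiah)
profits = [4, 7, 5, 3, 5, 6]

print(round(population_variance_f1(profits), 2))
print(round(population_variance_f2(profits), 2))
print(round(pvariance(profits), 2))
```

All three values should agree with the result above, about 1.67 (million Rupiahs)^2. In floating-point arithmetic, Formula 2 can lose precision when the mean is large compared with the spread of the data, so Formula 1 is usually the safer choice in code.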

Sample variance
Formula 3: $s^2 = \frac{\sum_{i=1}^{n} (X_{i} - \bar{X})^2}{n-1}$
Formula 4: $s^2 = \frac{\sum_{i=1}^{n} {X_{i}}^2}{n-1} - \frac{n}{n-1} (\bar{X})^2$
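Formula 3 and Formula 4 also give the same result. As a brief sketch, expanding the square and using $\sum_{i=1}^{n} X_{i} = n \bar{X}$ gives
$\sum_{i=1}^{n} (X_{i} - \bar{X})^2 = \sum_{i=1}^{n} {X_{i}}^2 - n (\bar{X})^2$
and dividing both sides by n-1 yields
$s^2 = \frac{\sum_{i=1}^{n} {X_{i}}^2}{n-1} - \frac{n}{n-1} (\bar{X})^2$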

Example 2
Mr. Ahmad had hundreds of cell phone counters. He wanted to know how much the net profits of all his counters varied in July 2016. To get an answer quickly, he took the net profit data of 6 counters as a sample and obtained the following results: 4, 7, 5, 3, 5, 6 (in millions of rupiah). What was the variance of these data?

Answer
In this example, the six values do not cover all the counters, only some of them. Therefore, the variance to calculate is the sample variance.
As shown in Example 1, $\bar{X} = Rp \frac{4+7+5+3+5+6}{6} million = Rp \: 5 \:  million$. Formula 3 gives:
$s^2 = \frac{(4-5)^2+(7-5)^2+(5-5)^2+(3-5)^2+(5-5)^2+(6-5)^2}{6-1} (million \: Rupiahs)^2$
$s^2 = 2 (million \: Rupiahs)^2$
Alternatively, we may use Formula 4 as follows.
$s^2 = (\frac{4^2+7^2+5^2+3^2+5^2+6^2}{6-1} - \frac{6}{6-1} \cdot 5^2) (million \: Rupiahs)^2$
$s^2 = (32 - 30) (million \: Rupiahs)^2 = 2 (million \: Rupiahs)^2$
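The sample variance in Example 2 can be checked the same way. Again, this is a minimal sketch: `sample_variance_f3`, `sample_variance_f4` and `sample_profits` are illustrative names, and `variance` from Python's standard `statistics` module (which divides by n-1) serves as an independent check.

```python
from statistics import variance


def sample_variance_f3(data):
    """Formula 3: sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_variance_f4(data):
    """Formula 4: sum of squares over n - 1, minus n/(n - 1) times the squared mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(x ** 2 for x in data) / (n - 1) - n / (n - 1) * mean ** 2


# Net profits of the six sampled counters in Example 2 (millions of rupiah)
sample_profits = [4, 7, 5, 3, 5, 6]

print(round(sample_variance_f3(sample_profits), 2))
print(round(sample_variance_f4(sample_profits), 2))
print(round(variance(sample_profits), 2))
```

All three values should match the hand calculation, 2 (million Rupiahs)^2. The only difference from the population case is the divisor: n-1 instead of n.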


EXERCISE PART I
For the following data sets, calculate the population variance using Formula 1 and Formula 2. Compare the results.
Data set A: 15, 11, 20, 27, 17
Data set B: 142, 150, 146, 145, 153, 127, 145
Data set C: 0.53, 0.92, 0.47, 0.65, 0.71, 0.44, 0.54, 0.70

EXERCISE PART II
Use the data sets in Exercise Part I, but assume that they are sample data. Calculate the sample variance using Formula 3 and Formula 4. Compare the results.

EXERCISE PART III
  1. The Quality Control Division of a company consists of a Division Head and four staff members. The following are the total working hours of the Division members in August 2016: 205, 195, 217, 243, 190. Determine the variance of the Division members’ total working hours in August 2016.
  2. The daily net profits for the first 7 days of August 2018, in thousands of rupiah, were as follows: 214, 230, 195, 204, 184, 210, 156. Calculate the variance of the daily net profits in the first week of August 2018.
  3. Mr. Andi had a small coffee shop. He wanted to estimate the variance of the daily net profits in August 2018 by calculating the sample variance. He selected 10 days at random and recorded their net profits. He got the following data, in thousands of rupiah: 230, 195, 204, 184, 210, 156, 214, 177, 160, 180. Which variance was appropriate in this case? Calculate it.



