
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS


There are many uncertainties in our lives: the monthly revenue of a business, the number of vehicles left in a parking lot every day, the number of phone calls we get every day, the money we spend on watching movies every year, the names of the customers who will join the queue for the teller, the colours of the cars that will enter a toll gate in the next hour, whether or not the next flight will be late, and so on. The first four are random variables, but the last three are not. So among the things that are uncertain, some are random variables and some are not.

So, what characterizes a random variable? Let’s see some definitions.


  1. A random variable is a function that associates a real number with each element in the sample space. (Walpole, 1993)
  2. A random variable is a function associated with an experiment whose values are real numbers and whose occurrence in the trials depends on chance. (Kreyszig, 1993)


First, the values of a random variable must be real numbers. Names are not numbers, so the names of the customers who will join the queue for the teller cannot be a random variable. Similarly, colours are not numbers, so the colours of the cars that will enter a toll gate in the next hour cannot be a random variable. Whether the next flight is late is likewise a yes/no outcome rather than a number. Second, the numbers occur by chance: we do not know which value will occur next. The monthly revenue of a business is a random variable. Its values are real numbers such as (in dollars) 30 thousand, 45 thousand, 100 thousand, etc., and they occur by chance: we do not know whether next month's revenue will be $30,000, $45,000 or some other value. In the same way, the number of phone calls we get every day is a random variable. Its values can be 0 (no call), 1, 2, 3, etc. They are real numbers and occur by chance, since we do not know how many calls we will get tomorrow.

Every random variable has a probability distribution associated with it. This is natural, because the values of the random variable occur by chance. Let's see an introductory example. Consider a simple bet in which two dice are rolled simultaneously. If both dice show the same number of spots, we win and receive 60 dollars. Otherwise, we lose and pay 12 dollars. Let X be the amount of money we earn from the bet. X is a random variable. Why? Its values are real numbers, 60 dollars and (-12) dollars, and which one occurs depends on chance. [The negative sign indicates that we lose.] There is a $\frac{1}{6}$ probability of winning 60 dollars and a $\frac{5}{6}$ probability of losing 12 dollars. [See Example 2P in https://edcommstatistics.blogspot.com/2019/09/the-theoretical-probabilities.html] Here, we see that each possible value of X has a certain probability associated with it. The ordered pairs $(60,\frac{1}{6})$ and $(-12,\frac{5}{6})$ together form the probability distribution of X. The probability distribution can be presented in a table as follows.

$\begin{array}{c|cc} x & 60 & -12 \\ \hline P(X = x) & \frac{1}{6} & \frac{5}{6} \end{array}$
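The probabilities $\frac{1}{6}$ and $\frac{5}{6}$ can also be checked by listing the 36 equally likely outcomes of two fair dice. The Python snippet below is only a minimal sketch of that check. The function name `dice_bet_distribution` and its parameters `win` and `loss` are illustrative choices, not part of the original example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_bet_distribution(win=60, loss=-12):
    """Return {payoff: probability} for the two-dice bet, assuming fair dice."""
    # All 36 ordered pairs (first die, second die), each equally likely.
    outcomes = list(product(range(1, 7), repeat=2))
    # The random variable X maps each outcome to a payoff (a real number).
    counts = Counter(win if a == b else loss for a, b in outcomes)
    return {x: Fraction(n, len(outcomes)) for x, n in counts.items()}


dist_x = dice_bet_distribution()
for x, p in sorted(dist_x.items(), reverse=True):
    print(x, p)
```

Here the random variable is literally a function: it takes an outcome such as (3, 3) or (2, 5) and returns the number 60 or -12.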


Let's see another example. In a lottery, two coins are tossed simultaneously, once. The amount of money we receive depends on the number of coins showing the Head (H) side. If no H appears, we get no money. If one H appears, we get 10 dollars. If two H's appear, we receive 20 dollars. Let Y be the amount of money we earn from the lottery. Assuming the coins are fair, we have the following probability distribution.

$\begin{array}{c|ccc} y & 0 & 10 & 20 \\ \hline P(Y = y) & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \end{array}$


(Please refer to Example 3P in https://edcommstatistics.blogspot.com/2019/09/the-theoretical-probabilities.html to get the probability values)
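The distribution of Y can be reproduced in the same spirit by enumerating the four equally likely outcomes HH, HT, TH and TT. This is a sketch under the stated assumption of fair coins. The name `coin_lottery_distribution` and the parameter `prize_per_head` are hypothetical and used only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def coin_lottery_distribution(prize_per_head=10):
    """Return {payoff: probability} for the two-coin lottery, assuming fair coins."""
    # The four equally likely outcomes: (H, H), (H, T), (T, H), (T, T).
    outcomes = list(product("HT", repeat=2))
    # Y pays 10 dollars for every coin that shows H.
    counts = Counter(prize_per_head * pair.count("H") for pair in outcomes)
    return {y: Fraction(n, len(outcomes)) for y, n in sorted(counts.items())}


dist_y = coin_lottery_distribution()
for y, p in dist_y.items():
    print(y, p)
```

One H is twice as likely as no H or two H's because it can happen in two ways (HT or TH).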


THE MEAN OF A RANDOM VARIABLE
Let X be a discrete random variable with probability distribution f. The mean of X, denoted by E[X], is defined as:
$E[X] = \displaystyle \sum_{x} x \cdot f(x)$
where the sum runs over all possible values x of X.
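As a worked illustration, the definition can be applied to the two examples above: the two-dice bet X and the two-coin lottery Y. The snippet is a minimal sketch. The helper name `expected_value` is an illustrative choice, and the distributions are typed in by hand from the probabilities given earlier.

```python
from fractions import Fraction


def expected_value(dist):
    """Compute E[X] = sum of x * f(x), where dist maps each value x to f(x)."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())


# Two-dice bet: win 60 with probability 1/6, lose 12 with probability 5/6.
dist_x = {60: Fraction(1, 6), -12: Fraction(5, 6)}
# Two-coin lottery: 0, 10 or 20 dollars with probabilities 1/4, 1/2, 1/4.
dist_y = {0: Fraction(1, 4), 10: Fraction(1, 2), 20: Fraction(1, 4)}

mean_x = expected_value(dist_x)  # 60*(1/6) + (-12)*(5/6) = 10 - 10 = 0
mean_y = expected_value(dist_y)  # 0*(1/4) + 10*(1/2) + 20*(1/4) = 10
print(mean_x, mean_y)
```

So E[X] = 0, meaning the dice bet neither gains nor loses money on average, while E[Y] = 10 dollars.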

