Posts

SIMPLE LINEAR REGRESSION

Influence, or effect, is a special case of association and another important concept in statistics. Instead of testing whether there is a relationship between communication quality and customer satisfaction, one may be more interested in investigating whether communication quality influences customer satisfaction. Similarly, rather than asking whether public perception and the destination image of Jakarta are correlated, a researcher may ask whether public perception has an effect on the destination image of Jakarta. One of the many statistical tools for addressing such questions is regression analysis. Regression analysis comes in many types and has many uses. What we discuss here is the simplest form of regression model, called simple linear regression. To perform the regression analysis, we first have to classify the variables of interest into a dependent variable and an independent (explanatory) variable. The variable which is hypothesized to have influence on another
Recent posts

NORMAL DISTRIBUTION

Statistics always deals with variability; without variability there would be no statistics. In daily life, variability is ubiquitous. Today, we are going to study a kind of variability widely found in measurements. Consider the manufacturing of bottled water. Suppose the labels on the bottles say that the volume is 220 cc. Does this mean that every bottle contains exactly 220 cc of water? In fact, if we measure the volumes accurately with a sophisticated measuring device, not all of them are exactly 220 cc; rather, the volumes are around 220 cc. Some bottles contain 219 cc, some 222 cc, others 218 cc, 219.5 cc, and so on. Statistically speaking, the volumes of water in the bottles have a mean and, since the volumes vary, a standard deviation. Now, can we estimate the percentage of bottles whose volume is less than 217 cc? This is the question this post addresses. If we know that the volume is normally distributed with a certain mean and st
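
The question raised in the teaser above (what share of bottles falls below 217 cc) can be sketched in a few lines of Python. The excerpt does not state the mean or the standard deviation, so both values below are assumptions chosen purely for illustration: the mean is taken to be the 220 cc on the label, and the standard deviation of 2 cc is hypothetical.

```python
from statistics import NormalDist

# Both parameters are assumptions for illustration; the post's actual
# values are not shown in the excerpt.
mean_cc = 220.0  # assumed equal to the labelled volume
sd_cc = 2.0      # hypothetical standard deviation

volume = NormalDist(mu=mean_cc, sigma=sd_cc)

# P(volume < 217) is the cumulative distribution function evaluated at 217.
share_below_217 = volume.cdf(217.0)
print(f"Share of bottles below 217 cc: {share_below_217:.2%}")
```

With a different standard deviation the share changes considerably, which is why the mean and standard deviation must be known before the percentage can be estimated.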

RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

There are lots of uncertainties in our life: the monthly revenues of a business, the number of vehicles left in a parking lot every day, the number of phone calls we get every day, the money we spend on watching movies every year, the names of customers who will join the queue for the teller, the colours of the cars that are going to enter a toll gate in the next hour, whether or not the next flight is going to be late, etc. The first four are random variables, but the last three are not. Among the things that are uncertain, some are random variables and some are not. So, what characterizes a random variable? Let’s see some definitions. A random variable is a function that associates a real number with each element in the sample space. (Walpole, 1993) A random variable is a function associated with an experiment whose values are real numbers and their occurrence in the trials depends on chance. (Kreyszig, 1993) First, the value of a random variable should be real numbers. Names ar

SAMPLE PROBLEMS ON THEORETICAL PROBABILITIES

Problem Set I: The Probability of a Single Event Sample Problem #1 An experiment consists of tossing 4 coins simultaneously, once. Find the probability that at least two heads (H) appear. Answer The sample space is S = {HHHH, HHHT, HHTH, HTHH, THHH, HHTT, HTHT, THHT, HTTH, THTH, TTHH, TTTH, TTHT, THTT, HTTT, TTTT}. $\mid S \mid = 16$ The event is E = {HHHH, HHHT, HHTH, HTHH, THHH, HHTT, HTHT, THHT, HTTH, THTH, TTHH} $\mid E \mid = 11$ $P(E) = \frac{\mid E \mid}{\mid S \mid} = \frac{11}{16} = 0.6875$ So, the probability that at least two heads appear is 0.6875. Sample Problem #2 Fifteen cards are numbered from 1 to 15. The experiment consists of picking a card at random from the set of cards. Find the probability of getting a card whose number is a multiple of 3. Answer The sample space is S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} $\mid S \mid = 15$ The event is E = {3, 6, 9, 12, 15} $\mid E \mid = 5$ $P(E) = \frac{\mid E \mid}{\mid S \mid} = \frac{
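
The counting in the two sample problems above can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original post; the variable names are illustrative. It lists every outcome of the sample space, keeps those belonging to the event, and divides the two counts.

```python
from fractions import Fraction
from itertools import product

# Sample Problem #1: toss 4 coins, event "at least two heads".
outcomes = list(product("HT", repeat=4))            # sample space S
event = [o for o in outcomes if o.count("H") >= 2]  # event E
prob_at_least_two_heads = Fraction(len(event), len(outcomes))
print(len(outcomes), len(event), float(prob_at_least_two_heads))

# Sample Problem #2: cards numbered 1 to 15, event "multiple of 3".
cards = list(range(1, 16))
multiples_of_3 = [c for c in cards if c % 3 == 0]
prob_multiple_of_3 = Fraction(len(multiples_of_3), len(cards))
print(len(cards), len(multiples_of_3), prob_multiple_of_3)
```

Using `Fraction` keeps the probabilities exact, so the result can be compared directly with the hand-computed ratio of |E| to |S|.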

EXPERIMENT CONCERNING BERNOULLI-LIKE PROCESS

The Bernoulli process must possess the following properties: 1. The experiment consists of n repeated trials. [Each trial is called a Bernoulli trial.] 2. Each trial results in an outcome that may be classified as a success or a failure. 3. The probability of success, denoted by p, remains constant from trial to trial. 4. The repeated trials are independent. Now we are going to do an experiment that approximates a Bernoulli process, i.e. a Bernoulli-like process. (If you have a deep understanding of the Bernoulli process, you know why the experiment is not an exact Bernoulli process but only an approximation to it.) We are going to carry out 20 Bernoulli-like processes. In each process, there are 4 Bernoulli trials (n = 4). Ideally, the same single die is rolled 4 times by the same person. But for the sake of time efficiency, instead of rolling a die 4 times in each process, each of the 4 students in a group rolls a die simultaneously. Let’s define that a success o

CONSTRUCTING THE FREQUENCY DISTRIBUTION TABLE

The frequency distribution table is a table that divides data into groups (classes) and shows how many data values occur in each group/class. Below is an example of a frequency distribution table. Now we are learning how to create a frequency distribution table. Suppose we have a collection of ungrouped data on last year’s advertising expenditures of 40 logistics companies, recorded in millions of rupiah. To construct a frequency distribution table from the ungrouped data, apply the following steps. Step 1: Find the range of the data The range (R) is defined as the difference between the largest and the smallest data values. In this case, R = 307 - 242 = 65. Step 2: Determine the number of categories/classes (k) Applying Sturges' rule (k = 1 + 3.322 log n, where n = the number of data values), we have: $k = 1 + 3.322 \: \log \: 40 \approx 6.32$ As the value of k must be a natural number, 6.32 is rounded up to 7, so k = 7. Step 3: Determine the class width (c) To find c, use $
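
Steps 1 and 2 above can be reproduced with a short Python sketch. It uses only the numbers quoted in the excerpt (largest value 307, smallest value 242, n = 40) and assumes that "log" in Sturges' rule means the base-10 logarithm, which is consistent with the quoted result of about 6.32. The class-width step is not included because its formula is cut off in the excerpt.

```python
import math

# Figures quoted in the excerpt.
largest, smallest = 307, 242
n = 40

# Step 1: range of the data.
data_range = largest - smallest

# Step 2: Sturges' rule, assuming a base-10 logarithm.
k_raw = 1 + 3.322 * math.log10(n)

# The number of classes must be a natural number, so round up.
k = math.ceil(k_raw)

print(data_range, round(k_raw, 2), k)
```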

THE QUARTILES AND MEDIAN OF GROUPED DATA

In this post, we will learn how to determine the quartiles when quantitative data are presented in a frequency distribution table. For example, we have the following data, showing the Flesch readability scores of 80 monthly bulletin articles published by Britt and Co. Ltd. Find the quartiles of these readability scores. To answer this, first augment the table with a new column to the right of the frequency column, namely a Data Numbers column. There are 5 data values in the first class, so the class contains data no. 1 to no. 5. There are 7 data values in the second class, so the class contains data no. 6 to no. 12. There are 13 data values in the third class, so the class contains data no. 13 to no. 25. Continuing this way, we get the following: In this case, finding the first quartile means finding the 20th data value, after the data have been ordered from the smallest to the largest (20 = ¼ × 80). Note that the 20th data value is in the third class (20 is in the range of 13 - 25, as s