Taken from here.

A manufacturing company keeps records of the number of defective items it produces per day. A random sample of days was selected. From their records, the company calculated the proportion of defective items produced per day. The frequency distribution of the proportions defective is as follows:

| Proportion Defective | Number of Days |
|---|---|
| 0 – 1% | 66 |
| 1 – 2% | 44 |
| 2 – 3% | 32 |
| 3 – 4% | 19 |
| 4 – 5% | 8 |
| 5 – 6% | 5 |
| 6 – 8% | 4 |
| 8% or more | 2 |

### (a)

Draw a histogram for the distribution of proportions defective. Comment on its shape.

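The histogram is skewed to the right (positively skewed): most days are concentrated near 0%, with a long tail stretching towards the higher proportions.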

### (b)

Explain how you determined the heights of the bars for the last two frequency classes in preparing the histogram.

A histogram is constructed such that the area of each bar is equal to the number of observations in the class interval. For the 6 – 8% class interval the base is 2 and the number in the class interval is 4, so we need height $\times$ base $= 4$; i.e. the height equals 2. For the 8 – 100% class the height is effectively zero: by the same analysis, with base $100-8=92$, we have height $=2/92\approx 0.02$.
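The area rule above can be sketched in a few lines of Python. This is only an illustration: the names `edges`, `counts` and `heights` are made up here, and closing the open-ended "8% or more" class at 100% is the same assumption the answer makes.

```python
# Class boundaries (in %) and day counts from the frequency table.
# Assumption: the open-ended "8% or more" class is closed at 100%.
edges = [0, 1, 2, 3, 4, 5, 6, 8, 100]
counts = [66, 44, 32, 19, 8, 5, 4, 2]

# Height = frequency / class width, so that bar area = frequency.
heights = [f / (hi - lo) for f, lo, hi in zip(counts, edges[:-1], edges[1:])]

for lo, hi, h in zip(edges[:-1], edges[1:], heights):
    print(f"{lo}-{hi}%: width {hi - lo}, height {h:.3f}")
```

For the unit-width classes the height is just the frequency; only the last two bars need rescaling.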

### (c)

Provide a numerical measure to describe the proportions defective. Justify your choice of numerical measure.

Using the formula for the mean, $\bar{x}$, of a frequency distribution:

$\bar{x}=\frac{\sum_{i}f_ix_i^*}{\sum_{i}f_i}$,

where the sum is over all class intervals, $f_i$ is the number of elements in the $i$th class interval and $x_i^*$ is the value of the midpoint of the $i$th class interval.

We will, however, crop the 8 – 100% class to 8 – 10% so that its midpoint (54%) does not massively bias the result. This is reasonable because any day above 10% would clearly be an outlier, and the mean is sensitive to outliers (see below).

Thus we get

$\bar{x}=1.972\%$.
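As a check on the arithmetic, here is a minimal sketch of the grouped-mean formula in Python. The variable names are illustrative, and the last class is cropped to 8 – 10% as described above.

```python
# Class boundaries (in %) with the last class cropped to 8-10%.
edges = [0, 1, 2, 3, 4, 5, 6, 8, 10]
counts = [66, 44, 32, 19, 8, 5, 4, 2]

# Midpoint of each class stands in for every day in that class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]

# Weighted sum of midpoints divided by the total number of days.
weighted_sum = sum(f * x for f, x in zip(counts, midpoints))
mean = weighted_sum / sum(counts)

print(f"sum f*x = {weighted_sum}, n = {sum(counts)}, mean = {mean:.3f}%")
```

The weighted sum is 355 over 180 days, which gives the 1.972% quoted above.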

We use the mean because there are not too many outliers and, while the data is skewed, it is relatively spread out.

### (d)

Calculate the first quartile and interpret its value. State any assumptions you make. Assess whether these assumptions are valid.

There are 180 days in total, so one quarter is 45. The lowest 45 days lie in the class interval 0 – 1%, which contains 66 days. Hence we take just $45/66\approx0.682$ of the first bar, measured from its left edge.

So the lowest quarter of days lies between 0 and 0.682%; therefore the first quartile is approximately 0.682%. The interpretation is that on the best 25% of days, fewer than 0.682% of items were defective.

To do this we assume that the distribution is uniform across the class interval 0 – 1%.

While this answer may well be accurate, we can’t be too sure, particularly when the model suggests there are days with very few defectives. For example, this assumption demands that there is a day when about 0.015% of items are defective, i.e. roughly 1 in 6,667, but does the manufacturer ever make this many items a day?
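The interpolation used for the first quartile can be written as a small general function. This is a sketch under the same uniform-within-class assumption; `grouped_quantile` is a name invented for this illustration, not a library function.

```python
def grouped_quantile(p, edges, counts):
    """Estimate the p-quantile of grouped data, assuming values are
    spread uniformly within each class interval."""
    target = p * sum(counts)  # how many days lie below the quantile
    cumulative = 0
    for f, lo, hi in zip(counts, edges[:-1], edges[1:]):
        if cumulative + f >= target:
            # Go the required fraction of the way across this class.
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    return edges[-1]

# Assumption: the open-ended top class is closed at 100%.
edges = [0, 1, 2, 3, 4, 5, 6, 8, 100]
counts = [66, 44, 32, 19, 8, 5, 4, 2]

q1 = grouped_quantile(0.25, edges, counts)
print(f"first quartile = {q1:.3f}%")
```

Because the 45th day falls inside the first class, the result depends only on the 0 – 1% bar, which is why the uniformity assumption there matters so much.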

### (e)

The manufacturing company operates a policy which specifies that if the proportion defective exceeds 4.75% on any given day, an investigation of the manufacturing process must be undertaken. Estimate the percentage of days on which such an investigation would be undertaken.

Looking at the histogram, there were $5+4+2=11$ days when the proportion defective exceeded 5%. The 4.75 – 5% sub-interval corresponds to 1/4 of the 4 – 5% class interval, which, assuming the days are spread uniformly across that class, is 2 of its 8 days. Hence $11+2=13$ of the 180 days had a proportion defective exceeding 4.75%, so we estimate that an investigation would be undertaken on $13/180\approx7.2\%$ of days.
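The same count can be reproduced with a short function that adds up whole classes above the threshold plus a proportional share of the class the threshold cuts through. This is a sketch assuming days are uniform within each class; `share_above` is a hypothetical helper name.

```python
def share_above(threshold, edges, counts):
    """Estimate the fraction of days whose proportion defective exceeds
    `threshold`, assuming uniformity within each class interval."""
    days = 0.0
    for f, lo, hi in zip(counts, edges[:-1], edges[1:]):
        if lo >= threshold:
            days += f  # whole class lies above the threshold
        elif hi > threshold:
            days += f * (hi - threshold) / (hi - lo)  # partial class
    return days / sum(counts)

# Assumption: the open-ended top class is closed at 100%.
edges = [0, 1, 2, 3, 4, 5, 6, 8, 100]
counts = [66, 44, 32, 19, 8, 5, 4, 2]

share = share_above(4.75, edges, counts)
print(f"estimated share of days investigated: {share:.1%}")
```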

### (f)

The manufacturing company reviewed the policy referred to in part (e). They want to change the cut-off for the proportion defective that necessitates the investigation of the manufacturing process (currently 4.75%). Estimate the cut-off that should be used to ensure that the percentage of days on which the investigations would be undertaken is at most 5%.

We want to reduce the 7.2% above to 5%. 5% of 180 days is nine days. So which were the nine worst days? We have six days worse than 6%, so we take 3 of the five days in the 5 – 6% class interval to make up the worst nine. Assuming the days are spread uniformly across that class, these are the days in the sub-interval 5.4 – 6%, which is 3/5 of that interval. Hence the new threshold is 5.4% (which leads to an estimated investigation on 5% of days).
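Working down from the worst days can also be sketched in code. The function below walks the classes from the top, again assuming uniformity within a class; `cutoff_for_share` is an illustrative name only.

```python
def cutoff_for_share(max_share, edges, counts):
    """Find the threshold above which an estimated `max_share` of days
    fall, assuming uniformity within each class interval."""
    allowed = max_share * sum(counts)  # number of days allowed above cut-off
    taken = 0.0
    classes = list(zip(counts, edges[:-1], edges[1:]))
    for f, lo, hi in reversed(classes):  # start from the worst days
        if taken + f >= allowed:
            # Take only the upper part of this class that is still needed.
            return hi - (allowed - taken) / f * (hi - lo)
        taken += f
    return edges[0]

# Assumption: the open-ended top class is closed at 100%.
edges = [0, 1, 2, 3, 4, 5, 6, 8, 100]
counts = [66, 44, 32, 19, 8, 5, 4, 2]

cutoff = cutoff_for_share(0.05, edges, counts)
print(f"new cut-off = {cutoff:.2f}%")
```

The top two classes supply six days, so the remaining three come from the five days in the 5 – 6% class, which places the cut-off 3/5 of the way down from 6%.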