
Quantiles In 30 Seconds

Or Percentiles for Dummies...



Quartiles, deciles and percentiles (which are all examples of quantiles) are standard descriptive statistics used to divide a set of data points into equally sized subsets. Quartiles divide the sample into four groups, with the lower quartile at the 25% mark, the median at 50% and the upper quartile at 75%. In the same way, the median corresponds to the 50th percentile, and the lower decile is equivalent to the 10th percentile. Quantiles are essentially ranking mechanisms.

The 90th percentile is that position in a data set which has 90% of data points below it, and 10% above it. The upper quartile is that position in the data set which has 75% of values below it and 25% above it. While percentiles and quartiles are standard terms in statistics, there is no universally agreed definition of how to calculate them.

The starting point for finding a quantile value is to rank the data points so that they are in numerical order. For example, if our data set is:

{4.5, 4, 6, 7.8, 9, 9.4, 3.6, 5.9, 7.1, 6.5, 8.1, 5.7, 4.5, 9.2, 8}

then we have to rank them in ascending numerical sequence:
{3.6, 4, 4.5, 4.5, 5.7, 5.9, 6, 6.5, 7.1, 7.8, 8, 8.1, 9, 9.2, 9.4}
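
In code, the ranking step is just a sort. A minimal Python sketch (the variable names `data` and `ranked` are chosen here purely for illustration):

```python
# The sample from the text, in its original order.
data = [4.5, 4, 6, 7.8, 9, 9.4, 3.6, 5.9, 7.1, 6.5, 8.1, 5.7, 4.5, 9.2, 8]

# sorted() returns a new list in ascending order; tied values
# (the two 4.5s) simply end up next to each other.
ranked = sorted(data)

print(ranked)
```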

If we want to find the value for the 90th percentile, we need to work out the position in the ordered sequence that it corresponds to. We can apply the following formula:

k = p(n+1)/100

where p is the percentile and n is the number of data points.

Applying the above formula to our sample of 15 values gives us:

k = 90(15+1)/100 = 14.4
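
The position formula can be sketched as a one-line Python function. The name `percentile_position` is an illustrative choice, not something from a library:

```python
def percentile_position(p, n):
    """Return the 1-based position k = p(n+1)/100 for percentile p in n ranked points."""
    return p * (n + 1) / 100


# 90th percentile in a sample of 15 values.
k = percentile_position(90, 15)
print(k)
```

For the median (p = 50) the same formula gives position 8 for this sample, a whole number, so the value can be read straight off the ordered list without any rounding or interpolation.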

Obviously there isn't a 14.4th number in the sequence! At this point there are a number of alternative ways of calculating the result.

Method 1:

The simplest option is to round k down to the nearest integer (whole number), which in this case is 14. So the 90th percentile corresponds to the value of the 14th data point. Reading across the ordered list, that gives us the value 9.2.
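
A minimal Python sketch of Method 1 follows. The function name `percentile_round_down` is hypothetical, and clamping the position to the range 1..n is an added assumption, since the text does not say what to do when k falls outside the list:

```python
import math

# The ranked sample from the text.
ranked = [3.6, 4, 4.5, 4.5, 5.7, 5.9, 6, 6.5, 7.1, 7.8, 8, 8.1, 9, 9.2, 9.4]


def percentile_round_down(ranked, p):
    """Method 1: round the position k down and return the value found there."""
    n = len(ranked)
    k = p * (n + 1) / 100
    # Assumption: keep the position inside 1..n for very low or very high p.
    position = min(max(math.floor(k), 1), n)
    # Positions are 1-based, Python lists are 0-based.
    return ranked[position - 1]


p90_method1 = percentile_round_down(ranked, 90)
print(p90_method1)
```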

Method 2:

An alternative approach is not to round the value of 14.4, but to take the values on either side of it (the values in positions 14 and 15) and derive some intermediate value from those. The simplest way of doing this is to take the simple average of the two values, so in our example the 90th percentile would be 9.3 (the mean of 9.2 and 9.4).
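
Method 2 might be sketched in Python as below. The name `percentile_midpoint` is hypothetical. Two details are assumptions not covered by the text: when k is a whole number the value at that position is returned as is, and positions outside 1..n are clamped to the ends of the list.

```python
import math

# The ranked sample from the text.
ranked = [3.6, 4, 4.5, 4.5, 5.7, 5.9, 6, 6.5, 7.1, 7.8, 8, 8.1, 9, 9.2, 9.4]


def percentile_midpoint(ranked, p):
    """Method 2: average the two values on either side of position k."""
    n = len(ranked)
    k = p * (n + 1) / 100
    lower = math.floor(k)
    # Assumption: clamp to the first or last value if k is off the ends.
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    # Assumption: a whole-number k points directly at one value.
    if k == lower:
        return ranked[lower - 1]
    # 1-based positions lower and lower+1 are indexes lower-1 and lower.
    return (ranked[lower - 1] + ranked[lower]) / 2


p90_method2 = percentile_midpoint(ranked, 90)
print(p90_method2)  # approximately 9.3
```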

Method 3:

However, a more common approach is to do a 'linear interpolation' - in other words we assume that there's a straight line between the two values and calculate a point on it accordingly. So, for example if the value of k is 14.9, then the value will be closer to the 15th value than the 14th.

We calculate this value by finding the difference between the two values on either side of k, multiplying this by the fractional part of k (the bit after the decimal point) and then adding the result to the lower value. Going back to our example, we have a value of k=14.4, which sits between the 14th and 15th values from the sample. The fractional part of k is therefore 0.4, and the values in the 14th and 15th positions are 9.2 and 9.4 respectively. We therefore apply the formula:

9.2 + (0.4(9.4 - 9.2)) = 9.2 + 0.08 = 9.28

So using this method the 90th percentile value is 9.28, which in this case wasn't that far removed from the rough and ready calculation of 9.3. But in cases where the gap between values on either side of k is very large, the different methods of calculating the percentile value can give very different results.
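
The linear interpolation of Method 3 can be sketched in Python as follows. The function name `percentile_interpolated` is hypothetical, and clamping to the first or last value when k falls outside 1..n is an added assumption rather than something the text specifies:

```python
import math

# The ranked sample from the text.
ranked = [3.6, 4, 4.5, 4.5, 5.7, 5.9, 6, 6.5, 7.1, 7.8, 8, 8.1, 9, 9.2, 9.4]


def percentile_interpolated(ranked, p):
    """Method 3: interpolate linearly between the values either side of k."""
    n = len(ranked)
    k = p * (n + 1) / 100
    lower = math.floor(k)
    # Assumption: clamp to the ends of the list if k is out of range.
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    fraction = k - lower                 # the bit after the decimal point
    below = ranked[lower - 1]            # value at 1-based position lower
    above = ranked[lower]                # value at 1-based position lower+1
    return below + fraction * (above - below)


p90_method3 = percentile_interpolated(ranked, 90)
print(round(p90_method3, 2))
```

Because floating-point arithmetic is inexact, the raw result may differ from 9.28 in the last few decimal places, which is why the example rounds before printing.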

Conclusions:

Note that these different methods of calculating percentiles are mirrored in software: different applications, such as Excel, OpenOffice.org or Gnumeric, may give different results for the same data set. In general, the bigger the data set, the more likely you are to get similar results; with smaller data sets there is greater variation in the results.
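
The same effect can be seen inside a single tool. As an illustration (Python is not one of the applications named above), the standard library's `statistics.quantiles` function offers two methods. Its "exclusive" method interpolates at a position based on n+1, much like Method 3, while its "inclusive" method uses a different position rule:

```python
import statistics

# The sample from the text; quantiles() sorts the data itself.
data = [4.5, 4, 6, 7.8, 9, 9.4, 3.6, 5.9, 7.1, 6.5, 8.1, 5.7, 4.5, 9.2, 8]

# n=10 asks for deciles: nine cut points, the last of which
# is the 90th percentile.
p90_exclusive = statistics.quantiles(data, n=10, method="exclusive")[-1]
p90_inclusive = statistics.quantiles(data, n=10, method="inclusive")[-1]

print(round(p90_exclusive, 2), round(p90_inclusive, 2))
```

On this sample the two methods come out at roughly 9.28 and 9.12, a visible gap from nothing more than a change of definition.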


Contents copyright of Pan Pantziarka. If you like it link it, don't lift it. No copying for commercial use allowed. Site © 2007.